* [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
@ 2026-03-03 19:27 Paul Chaignon
2026-03-05 0:48 ` Ihor Solodrai
2026-03-06 4:14 ` Shung-Hsi Yu
0 siblings, 2 replies; 12+ messages in thread
From: Paul Chaignon @ 2026-03-03 19:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Eduard Zingerman, Harishankar Vishwanathan
In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
added a new round of bounds deduction because two rounds were not enough
to converge to a fixed point. This commit slightly refactors the bounds
deduction logic such that two rounds are enough.
In [1], Eduard noticed that after we improved the refinement logic, a
third call to the bounds deduction (__reg_deduce_bounds) was needed to
converge to a fixed point. More specifically, we needed this third call
to improve the s64 range using the s32 range. We added the third call
and postponed a more detailed analysis of the refinement logic.
I've been looking into this more recently. To help, I wrote a high level
sequence of all the refinements carried out in reg_bounds_sync. u64 ->
s32 means we used the u64 ranges to improve the s32 ranges.
/* __update_reg_bounds: */
tnum -> {s32, u32, s64, u64}
/* __reg_deduce_bounds: */
for (3 times) {
/* __reg32_deduce_bounds: */
u64 -> {u32, s32}
s64 -> {u32, s32}
u64 -> s32
s64 -> s32
u32 -> s32
s32 -> u32
/* __reg64_deduce_bounds: */
u64 -> s64
s64 -> u64
{u64, s64} -> {u64, s64}
/* __reg_deduce_mixed_bounds: */
u32 -> u64
u32 -> s64
{s32, s64} -> {s64, u64, tnum}
}
/* __reg_bound_offset: */
{u64, u32} -> tnum
/* __update_reg_bounds: */
tnum -> {s32, u32, s64, u64}
From this, we can observe that we first improve the 32-bit ranges from
the 64-bit ranges in __reg32_deduce_bounds, then improve the 64-bit
ranges on their own in __reg64_deduce_bounds. Intuitively, if we were to
improve the 64-bit ranges on their own *before* we use them to improve
the 32-bit ranges, we may reach a fixed point earlier. Put differently,
we would need to move the *64 -> *32 refinements to the beginning of
__reg_deduce_mixed_bounds. (The logic would then also better match the
function names, but that's secondary.)
After testing, this small refactoring does allow us to lose one call to
__reg_deduce_bounds. Without this refactoring, the test
"verifier_bounds/bounds deduction cross sign boundary, negative
overlap" fails when removing one call to __reg_deduce_bounds. This
patch therefore implements that bit of refactoring and reverts commit
5dbb19b16ac49 ("bpf: Add third round of bounds deduction").
As expected, this change didn't have any impact on the number of
instructions processed when running it through the Cilium complexity
test suite [2].
Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Link: https://pchaigno.github.io/test-verifier-complexity.html [2]
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
---
kernel/bpf/verifier.c | 138 +++++++++++++++++++++---------------------
1 file changed, 69 insertions(+), 69 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fc4ccd1de569..55575f66aad5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2427,74 +2427,6 @@ static void __update_reg_bounds(struct bpf_reg_state *reg)
/* Uses signed min/max values to inform unsigned, and vice-versa */
static void __reg32_deduce_bounds(struct bpf_reg_state *reg)
{
- /* If upper 32 bits of u64/s64 range don't change, we can use lower 32
- * bits to improve our u32/s32 boundaries.
- *
- * E.g., the case where we have upper 32 bits as zero ([10, 20] in
- * u64) is pretty trivial, it's obvious that in u32 we'll also have
- * [10, 20] range. But this property holds for any 64-bit range as
- * long as upper 32 bits in that entire range of values stay the same.
- *
- * E.g., u64 range [0x10000000A, 0x10000000F] ([4294967306, 4294967311]
- * in decimal) has the same upper 32 bits throughout all the values in
- * that range. As such, lower 32 bits form a valid [0xA, 0xF] ([10, 15])
- * range.
- *
- * Note also, that [0xA, 0xF] is a valid range both in u32 and in s32,
- * following the rules outlined below about u64/s64 correspondence
- * (which equally applies to u32 vs s32 correspondence). In general it
- * depends on actual hexadecimal values of 32-bit range. They can form
- * only valid u32, or only valid s32 ranges in some cases.
- *
- * So we use all these insights to derive bounds for subregisters here.
- */
- if ((reg->umin_value >> 32) == (reg->umax_value >> 32)) {
- /* u64 to u32 casting preserves validity of low 32 bits as
- * a range, if upper 32 bits are the same
- */
- reg->u32_min_value = max_t(u32, reg->u32_min_value, (u32)reg->umin_value);
- reg->u32_max_value = min_t(u32, reg->u32_max_value, (u32)reg->umax_value);
-
- if ((s32)reg->umin_value <= (s32)reg->umax_value) {
- reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->umin_value);
- reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->umax_value);
- }
- }
- if ((reg->smin_value >> 32) == (reg->smax_value >> 32)) {
- /* low 32 bits should form a proper u32 range */
- if ((u32)reg->smin_value <= (u32)reg->smax_value) {
- reg->u32_min_value = max_t(u32, reg->u32_min_value, (u32)reg->smin_value);
- reg->u32_max_value = min_t(u32, reg->u32_max_value, (u32)reg->smax_value);
- }
- /* low 32 bits should form a proper s32 range */
- if ((s32)reg->smin_value <= (s32)reg->smax_value) {
- reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->smin_value);
- reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->smax_value);
- }
- }
- /* Special case where upper bits form a small sequence of two
- * sequential numbers (in 32-bit unsigned space, so 0xffffffff to
- * 0x00000000 is also valid), while lower bits form a proper s32 range
- * going from negative numbers to positive numbers. E.g., let's say we
- * have s64 range [-1, 1] ([0xffffffffffffffff, 0x0000000000000001]).
- * Possible s64 values are {-1, 0, 1} ({0xffffffffffffffff,
- * 0x0000000000000000, 0x00000000000001}). Ignoring upper 32 bits,
- * we still get a valid s32 range [-1, 1] ([0xffffffff, 0x00000001]).
- * Note that it doesn't have to be 0xffffffff going to 0x00000000 in
- * upper 32 bits. As a random example, s64 range
- * [0xfffffff0fffffff0; 0xfffffff100000010], forms a valid s32 range
- * [-16, 16] ([0xfffffff0; 0x00000010]) in its 32 bit subregister.
- */
- if ((u32)(reg->umin_value >> 32) + 1 == (u32)(reg->umax_value >> 32) &&
- (s32)reg->umin_value < 0 && (s32)reg->umax_value >= 0) {
- reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->umin_value);
- reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->umax_value);
- }
- if ((u32)(reg->smin_value >> 32) + 1 == (u32)(reg->smax_value >> 32) &&
- (s32)reg->smin_value < 0 && (s32)reg->smax_value >= 0) {
- reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->smin_value);
- reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->smax_value);
- }
/* if u32 range forms a valid s32 range (due to matching sign bit),
* try to learn from that
*/
@@ -2649,6 +2581,75 @@ static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
{
+ /* If upper 32 bits of u64/s64 range don't change, we can use lower 32
+ * bits to improve our u32/s32 boundaries.
+ *
+ * E.g., the case where we have upper 32 bits as zero ([10, 20] in
+ * u64) is pretty trivial, it's obvious that in u32 we'll also have
+ * [10, 20] range. But this property holds for any 64-bit range as
+ * long as upper 32 bits in that entire range of values stay the same.
+ *
+ * E.g., u64 range [0x10000000A, 0x10000000F] ([4294967306, 4294967311]
+ * in decimal) has the same upper 32 bits throughout all the values in
+ * that range. As such, lower 32 bits form a valid [0xA, 0xF] ([10, 15])
+ * range.
+ *
+ * Note also, that [0xA, 0xF] is a valid range both in u32 and in s32,
+ * following the rules outlined below about u64/s64 correspondence
+ * (which equally applies to u32 vs s32 correspondence). In general it
+ * depends on actual hexadecimal values of 32-bit range. They can form
+ * only valid u32, or only valid s32 ranges in some cases.
+ *
+ * So we use all these insights to derive bounds for subregisters here.
+ */
+ if ((reg->umin_value >> 32) == (reg->umax_value >> 32)) {
+ /* u64 to u32 casting preserves validity of low 32 bits as
+ * a range, if upper 32 bits are the same
+ */
+ reg->u32_min_value = max_t(u32, reg->u32_min_value, (u32)reg->umin_value);
+ reg->u32_max_value = min_t(u32, reg->u32_max_value, (u32)reg->umax_value);
+
+ if ((s32)reg->umin_value <= (s32)reg->umax_value) {
+ reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->umin_value);
+ reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->umax_value);
+ }
+ }
+ if ((reg->smin_value >> 32) == (reg->smax_value >> 32)) {
+ /* low 32 bits should form a proper u32 range */
+ if ((u32)reg->smin_value <= (u32)reg->smax_value) {
+ reg->u32_min_value = max_t(u32, reg->u32_min_value, (u32)reg->smin_value);
+ reg->u32_max_value = min_t(u32, reg->u32_max_value, (u32)reg->smax_value);
+ }
+ /* low 32 bits should form a proper s32 range */
+ if ((s32)reg->smin_value <= (s32)reg->smax_value) {
+ reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->smin_value);
+ reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->smax_value);
+ }
+ }
+ /* Special case where upper bits form a small sequence of two
+ * sequential numbers (in 32-bit unsigned space, so 0xffffffff to
+ * 0x00000000 is also valid), while lower bits form a proper s32 range
+ * going from negative numbers to positive numbers. E.g., let's say we
+ * have s64 range [-1, 1] ([0xffffffffffffffff, 0x0000000000000001]).
+ * Possible s64 values are {-1, 0, 1} ({0xffffffffffffffff,
+ * 0x0000000000000000, 0x00000000000001}). Ignoring upper 32 bits,
+ * we still get a valid s32 range [-1, 1] ([0xffffffff, 0x00000001]).
+ * Note that it doesn't have to be 0xffffffff going to 0x00000000 in
+ * upper 32 bits. As a random example, s64 range
+ * [0xfffffff0fffffff0; 0xfffffff100000010], forms a valid s32 range
+ * [-16, 16] ([0xfffffff0; 0x00000010]) in its 32 bit subregister.
+ */
+ if ((u32)(reg->umin_value >> 32) + 1 == (u32)(reg->umax_value >> 32) &&
+ (s32)reg->umin_value < 0 && (s32)reg->umax_value >= 0) {
+ reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->umin_value);
+ reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->umax_value);
+ }
+ if ((u32)(reg->smin_value >> 32) + 1 == (u32)(reg->smax_value >> 32) &&
+ (s32)reg->smin_value < 0 && (s32)reg->smax_value >= 0) {
+ reg->s32_min_value = max_t(s32, reg->s32_min_value, (s32)reg->smin_value);
+ reg->s32_max_value = min_t(s32, reg->s32_max_value, (s32)reg->smax_value);
+ }
+
/* Try to tighten 64-bit bounds from 32-bit knowledge, using 32-bit
* values on both sides of 64-bit range in hope to have tighter range.
* E.g., if r1 is [0x1'00000000, 0x3'80000000], and we learn from
@@ -2741,7 +2742,6 @@ static void reg_bounds_sync(struct bpf_reg_state *reg)
/* We might have learned something about the sign bit. */
__reg_deduce_bounds(reg);
__reg_deduce_bounds(reg);
- __reg_deduce_bounds(reg);
/* We might have learned some bits from the bounds. */
__reg_bound_offset(reg);
/* Intersecting with the old var_off might have improved our bounds
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-03 19:27 [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction Paul Chaignon
@ 2026-03-05 0:48 ` Ihor Solodrai
2026-03-05 6:54 ` Shung-Hsi Yu
2026-03-06 4:14 ` Shung-Hsi Yu
1 sibling, 1 reply; 12+ messages in thread
From: Ihor Solodrai @ 2026-03-05 0:48 UTC (permalink / raw)
To: Paul Chaignon, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Eduard Zingerman, Harishankar Vishwanathan, Kernel Team
On 3/3/26 11:27 AM, Paul Chaignon wrote:
> In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
> added a new round of bounds deduction because two rounds were not enough
> to converge to a fixed point. This commit slightly refactors the bounds
> deduction logic such that two rounds are enough.
>
> In [1], Eduard noticed that after we improved the refinement logic, a
> third call to the bounds deduction (__reg_deduce_bounds) was needed to
> converge to a fixed point. More specifically, we needed this third call
> to improve the s64 range using the s32 range. We added the third call
> and postponed a more detailed analysis of the refinement logic.
>
> I've been looking into this more recently. To help, I wrote a high level
> sequence of all the refinements carried out in reg_bounds_sync. u64 ->
> s32 means we used the u64 ranges to improve the s32 ranges.
>
> /* __update_reg_bounds: */
> tnum -> {s32, u32, s64, u64}
> /* __reg_deduce_bounds: */
> for (3 times) {
> /* __reg32_deduce_bounds: */
> u64 -> {u32, s32}
> s64 -> {u32, s32}
> u64 -> s32
> s64 -> s32
> u32 -> s32
> s32 -> u32
> /* __reg64_deduce_bounds: */
> u64 -> s64
> s64 -> u64
> {u64, s64} -> {u64, s64}
> /* __reg_deduce_mixed_bounds: */
> u32 -> u64
> u32 -> s64
> {s32, s64} -> {s64, u64, tnum}
> }
> /* __reg_bound_offset: */
> {u64, u32} -> tnum
> /* __update_reg_bounds: */
> tnum -> {s32, u32, s64, u64}
>
> From this, we can observe that we first improve the 32bit ranges from
> the 64bit ranges in __reg32_deduce_bounds, then improve the 64bit
> ranges on their own in __reg64_deduce_bounds. Intuitively, if we were to
> improve the 64bit ranges on their own *before* we use them to improve
> the 32bit ranges, we may reach a fixed point earlier. Said otherwise, we
> would need to move the *64 -> *32 refinements to the beginning of
> __reg_deduce_mixed_bounds. (The logic would then also better match the
> function names, but that's secondary.)
>
> After testing, this small refactoring does allow us to lose one call to
> __reg_deduce_bounds. Without this refactoring, the test
> "verifier_bounds/bounds deduction cross sign boundary, negative
> overlap" fails when removing one call to __reg_deduce_bounds. This
> patch therefore implements that bit of refactoring and reverts commit
> 5dbb19b16ac49 ("bpf: Add third round of bounds deduction").
>
> As expected, this change didn't have any impact on the number of
> instructions processed when running it through the Cilium complexity
> test suite [2].
>
> Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
> Link: https://pchaigno.github.io/test-verifier-complexity.html [2]
> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Hi Paul,
(trying to do my part with the code reviews and learn in the process)
Side note: you might be interested to know that Eduard is working on
consolidating signed and unsigned domains [1].
If I understand correctly from the pseudocode, there is a dependency
loop between *64 -> *32 -> *64 -> *32 -> ... and so on.
This got me curious: why don't we actually try to compute a fixed
point here with a loop? If we did, we wouldn't have to guess how many
iterations are needed, or the order of the narrowing steps.
So I implemented a dumb loop in reg_bounds_sync() with a limit, and
added global counters to collect stats. Pasting a (mostly vibe-coded,
sorry) patch at the bottom.
Then I ran veristat on a suite of Meta BPF programs against a kernel
with this loop (without your patch), here are the results:
$ grep bounds_sync tools/testing/selftests/bpf/veristat.log | tail -n 10
bounds_sync calls 12400636 total_iters 1349388 max_iters 2 avg_iters 0.10
bounds_sync calls 12451162 total_iters 1355717 max_iters 2 avg_iters 0.10
bounds_sync calls 12501688 total_iters 1362046 max_iters 2 avg_iters 0.10
bounds_sync calls 12618333 total_iters 1377655 max_iters 2 avg_iters 0.10
bounds_sync calls 12734978 total_iters 1393264 max_iters 2 avg_iters 0.10
bounds_sync calls 12790555 total_iters 1398860 max_iters 2 avg_iters 0.10
bounds_sync calls 12846132 total_iters 1404456 max_iters 2 avg_iters 0.10
bounds_sync calls 12846174 total_iters 1404459 max_iters 2 avg_iters 0.10
bounds_sync calls 12846320 total_iters 1404465 max_iters 2 avg_iters 0.10
bounds_sync calls 12846321 total_iters 1404465 max_iters 2 avg_iters 0.10
$ grep bounds_sync tools/testing/selftests/bpf/veristat.log | grep -o 'max_iters.*' | sort -u
max_iters 2 avg_iters 0.09
max_iters 2 avg_iters 0.10
max_iters 2 avg_iters 0.11
This is across 400+ progs of various sizes. The vast majority of
reg_bounds_sync() calls don't change anything (0 iterations), and
those that do converge in max 2 iterations.
My interpretation of this (assuming I haven't messed up anything while
experimenting) is that the situation you hit in [2] is rare. And I think
it is safe to implement a loop in reg_bounds_sync() with a reasonable
iteration limit (10 maybe?).
Am I missing anything that makes it a bad idea?
[1] https://github.com/eddyz87/bpf/tree/cnum
[2] https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d92cf2821657..dc1082590f04 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -369,6 +369,11 @@ static const char *btf_type_name(const struct btf *btf, u32 id)
static DEFINE_MUTEX(bpf_verifier_lock);
static DEFINE_MUTEX(bpf_percpu_ma_lock);
+/* Global counters for reg_bounds_sync fixed-point convergence stats */
+static atomic64_t bounds_sync_total_iters = ATOMIC64_INIT(0);
+static atomic64_t bounds_sync_call_count = ATOMIC64_INIT(0);
+static atomic_t bounds_sync_max_iters = ATOMIC_INIT(0);
+
__printf(2, 3) static void verbose(void *private_data, const char *fmt, ...)
{
struct bpf_verifier_env *env = private_data;
@@ -2757,14 +2762,34 @@ static void __reg_bound_offset(struct bpf_reg_state *reg)
reg->var_off = tnum_or(tnum_clear_subreg(var64_off), var32_off);
}
+#define REG_BOUNDS_MAX_ITERS 1000
+
static void reg_bounds_sync(struct bpf_reg_state *reg)
{
+ struct bpf_reg_state prev;
+ int iters = 0;
+ int cur_max;
+
/* We might have learned new bounds from the var_off. */
__update_reg_bounds(reg);
- /* We might have learned something about the sign bit. */
- __reg_deduce_bounds(reg);
- __reg_deduce_bounds(reg);
- __reg_deduce_bounds(reg);
+ /* We might have learned something about the sign bit.
+ * Loop until bounds reach a fixed point.
+ */
+ do {
+ prev = *reg;
+ __reg_deduce_bounds(reg);
+ } while (memcmp(&prev, reg, sizeof(prev)) && ++iters < REG_BOUNDS_MAX_ITERS);
+
+ WARN_ON_ONCE(iters >= REG_BOUNDS_MAX_ITERS);
+
+ atomic64_add(iters, &bounds_sync_total_iters);
+ atomic64_inc(&bounds_sync_call_count);
+ cur_max = atomic_read(&bounds_sync_max_iters);
+ while (iters > cur_max) {
+ if (atomic_try_cmpxchg(&bounds_sync_max_iters, &cur_max, iters))
+ break;
+ }
+
/* We might have learned some bits from the bounds. */
__reg_bound_offset(reg);
/* Intersecting with the old var_off might have improved our bounds
@@ -24799,6 +24824,20 @@ static void print_verification_stats(struct bpf_verifier_env *env)
env->insn_processed, BPF_COMPLEXITY_LIMIT_INSNS,
env->max_states_per_insn, env->total_states,
env->peak_states, env->longest_mark_read_walk);
+
+ {
+ u64 total = atomic64_read(&bounds_sync_total_iters);
+ u64 calls = atomic64_read(&bounds_sync_call_count);
+ u32 max = atomic_read(&bounds_sync_max_iters);
+
+ verbose(env, "bounds_sync calls %llu total_iters %llu max_iters %u",
+ calls, total, max);
+ if (calls)
+ verbose(env, " avg_iters %llu.%02llu",
+ div_u64(total, calls),
+ div_u64(total * 100, calls) % 100);
+ verbose(env, "\n");
+ }
}
int bpf_prog_ctx_arg_info_init(struct bpf_prog *prog,
> ---
> kernel/bpf/verifier.c | 138 +++++++++++++++++++++---------------------
> 1 file changed, 69 insertions(+), 69 deletions(-)
>
> [...]
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-05 0:48 ` Ihor Solodrai
@ 2026-03-05 6:54 ` Shung-Hsi Yu
2026-03-05 11:10 ` Eduard Zingerman
2026-03-05 12:50 ` Paul Chaignon
0 siblings, 2 replies; 12+ messages in thread
From: Shung-Hsi Yu @ 2026-03-05 6:54 UTC (permalink / raw)
To: Ihor Solodrai
Cc: Paul Chaignon, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Eduard Zingerman, Harishankar Vishwanathan,
Kernel Team
On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
> On 3/3/26 11:27 AM, Paul Chaignon wrote:
> > In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
> > added a new round of bounds deduction because two rounds were not enough
> > to converge to a fixed point. This commit slightly refactors the bounds
> > deduction logic such that two rounds are enough.
> >
> > In [1], Eduard noticed that after we improved the refinement logic, a
> > third call to the bounds deduction (__reg_deduce_bounds) was needed to
> > converge to a fixed point. More specifically, we needed this third call
> > to improve the s64 range using the s32 range. We added the third call
> > and postponed a more detailed analysis of the refinement logic.
> >
> > I've been looking into this more recently. To help, I wrote a high level
> > sequence of all the refinements carried out in reg_bounds_sync. u64 ->
> > s32 means we used the u64 ranges to improve the s32 ranges.
> >
> > /* __update_reg_bounds: */
> > tnum -> {s32, u32, s64, u64}
> > /* __reg_deduce_bounds: */
> > for (3 times) {
> > /* __reg32_deduce_bounds: */
> > u64 -> {u32, s32}
> > s64 -> {u32, s32}
> > u64 -> s32
> > s64 -> s32
> > u32 -> s32
> > s32 -> u32
> > /* __reg64_deduce_bounds: */
> > u64 -> s64
> > s64 -> u64
> > {u64, s64} -> {u64, s64}
> > /* __reg_deduce_mixed_bounds: */
> > u32 -> u64
> > u32 -> s64
> > {s32, s64} -> {s64, u64, tnum}
> > }
> > /* __reg_bound_offset: */
> > {u64, u32} -> tnum
> > /* __update_reg_bounds: */
> > tnum -> {s32, u32, s64, u64}
[...]
> Hi Paul,
>
> (trying to do my part with the code reviews and learn in the process)
+1, still
> Side note: you might be interested to know that Eduard is working on
> consolidating signed and unsigned domains [1].
\o/
When I last looked into this[1], there was the "Interval Analysis and
Machine Arithmetic: Why Signedness Ignorance Is Bliss" paper[2], which
looked rather promising. More recently, there was also "Program Analysis
Combining Generalized Bit-Level and Word-Level Abstractions"[3] (haven't
read it yet), which seems more specific to the BPF verifier.
> If I understand correctly from the pseudocode, there is a dependency
> loop between *64 -> *32 -> *64 -> *32 -> ... and so on.
>
> This got me curious: why don't we actually try to compute a fixed
> point here with a loop? If we did, we wouldn't have to guess how many
> iterations are needed, or the order of the narrowing steps.
>
> So I implemented a dumb loop in reg_bounds_sync() with a limit, and
> added global counters to collect stats. Pasting a (mostly vibe-coded,
> sorry) patch at the bottom.
[...]
> $ grep bounds_sync tools/testing/selftests/bpf/veristat.log | grep -o 'max_iters.*' | sort -u
> max_iters 2 avg_iters 0.09
> max_iters 2 avg_iters 0.10
> max_iters 2 avg_iters 0.11
>
> This is across 400+ progs of various sizes. The vast majority of
> reg_bounds_sync() calls don't change anything (0 iterations), and
> those that do converge in max 2 iterations.
>
> My interpretation of this (assuming I haven't messed up anything while
> experimenting) is that the situation you hit in [2] is rare. And I think
> it is safe to implement a loop in reg_bounds_sync() with a reasonable
> iteration limit (10 maybe?).
>
> Am I missing anything that makes it a bad idea?
My (limited) understanding is that this would make verification of the
BPF verifier with Agni[4] much harder, because Agni has to unroll all
loops and branch based on the result. Having a known number of calls
and no loop limit makes verification easier. Paul and Harishankar have
a much better understanding there.
From a conceptual view, we're trading away the understanding of
fixed-point convergence (or the potential to understand it); with a
loop, subtle differences will be hidden away. It might be reasonable as
a security-hardening tactic, but it seems to hurt the codebase in the
long run. That's my 2 cents.
1: https://lore.kernel.org/all/ZTZxoDJJbX9mrQ9w@u94a/
2: https://doi.org/10.1145/2651360
3: https://doi.org/10.1145/3728905
4: https://github.com/bpfverif/agni
[...]
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-05 6:54 ` Shung-Hsi Yu
@ 2026-03-05 11:10 ` Eduard Zingerman
2026-03-05 13:15 ` Paul Chaignon
2026-03-09 4:28 ` Shung-Hsi Yu
2026-03-05 12:50 ` Paul Chaignon
1 sibling, 2 replies; 12+ messages in thread
From: Eduard Zingerman @ 2026-03-05 11:10 UTC (permalink / raw)
To: Shung-Hsi Yu, Ihor Solodrai
Cc: Paul Chaignon, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Harishankar Vishwanathan, Kernel Team
On Thu, 2026-03-05 at 14:54 +0800, Shung-Hsi Yu wrote:
> On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
> > On 3/3/26 11:27 AM, Paul Chaignon wrote:
> > > In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
> > > added a new round of bounds deduction because two rounds were not enough
> > > to converge to a fixed point. This commit slightly refactors the bounds
> > > deduction logic such that two rounds are enough.
> > >
> > > In [1], Eduard noticed that after we improved the refinement logic, a
> > > third call to the bounds deduction (__reg_deduce_bounds) was needed to
> > > converge to a fixed point. More specifically, we needed this third call
> > > to improve the s64 range using the s32 range. We added the third call
> > > and postponed a more detailed analysis of the refinement logic.
> > >
> > > I've been looking into this more recently. To help, I wrote a high level
> > > sequence of all the refinements carried out in reg_bounds_sync. u64 ->
> > > s32 means we used the u64 ranges to improve the s32 ranges.
> > >
> > > /* __update_reg_bounds: */
> > > tnum -> {s32, u32, s64, u64}
> > > /* __reg_deduce_bounds: */
> > > for (3 times) {
> > > /* __reg32_deduce_bounds: */
> > > u64 -> {u32, s32}
> > > s64 -> {u32, s32}
> > > u64 -> s32
> > > s64 -> s32
> > > u32 -> s32
> > > s32 -> u32
> > > /* __reg64_deduce_bounds: */
> > > u64 -> s64
> > > s64 -> u64
> > > {u64, s64} -> {u64, s64}
> > > /* __reg_deduce_mixed_bounds: */
> > > u32 -> u64
> > > u32 -> s64
> > > {s32, s64} -> {s64, u64, tnum}
> > > }
> > > /* __reg_bound_offset: */
> > > {u64, u32} -> tnum
> > > /* __update_reg_bounds: */
> > > tnum -> {s32, u32, s64, u64}
> [...]
> > Hi Paul,
> >
> > (trying to do my part with the code reviews and learn in the process)
>
> +1, still
>
> > Side note: you might be interested to know that Eduard is working on
> > consolidating signed and unsigned domains [1].
>
> \o/
>
> When I last looked into this[1], there was the "Interval Analysis and
> Machine Arithmetic: Why Signedness Ignorance Is Bliss" paper[2], which
> looked rather promising. More recently, there was also "Program Analysis
> Combining Generalized Bit-Level and Word-Level Abstractions"[3] (haven't
> read it yet), which seems more specific to the BPF verifier.
Thank you for the links, I'd only seen [1].
I was about to send [a], feeling really smug about the cbmc tests.
But then a much simpler solution [b] occurred to me while figuring out
why a 64-bit test can't be written.
I'll probably continue playing with cnums at a leisurely pace.
[a] https://github.com/eddyz87/bpf/tree/cnum-sync-bounds
[b] https://github.com/eddyz87/bpf/tree/32-bit-range-overflow
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-05 6:54 ` Shung-Hsi Yu
2026-03-05 11:10 ` Eduard Zingerman
@ 2026-03-05 12:50 ` Paul Chaignon
1 sibling, 0 replies; 12+ messages in thread
From: Paul Chaignon @ 2026-03-05 12:50 UTC (permalink / raw)
To: Shung-Hsi Yu, Ihor Solodrai
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Eduard Zingerman, Harishankar Vishwanathan, Kernel Team
On Thu, Mar 05, 2026 at 02:54:07PM +0800, Shung-Hsi Yu wrote:
> On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
> > On 3/3/26 11:27 AM, Paul Chaignon wrote:
> > > In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
> > > added a new round of bounds deduction because two rounds were not enough
> > > to converge to a fixed point. This commit slightly refactors the bounds
> > > deduction logic such that two rounds are enough.
> > >
> > > In [1], Eduard noticed that after we improved the refinement logic, a
> > > third call to the bounds deduction (__reg_deduce_bounds) was needed to
> > > converge to a fixed point. More specifically, we needed this third call
> > > to improve the s64 range using the s32 range. We added the third call
> > > and postponed a more detailed analysis of the refinement logic.
> > >
> > > I've been looking into this more recently. To help, I wrote a high level
> > > sequence of all the refinements carried out in reg_bounds_sync. u64 ->
> > > s32 means we used the u64 ranges to improve the s32 ranges.
> > >
> > > /* __update_reg_bounds: */
> > > tnum -> {s32, u32, s64, u64}
> > > /* __reg_deduce_bounds: */
> > > for (3 times) {
> > > /* __reg32_deduce_bounds: */
> > > u64 -> {u32, s32}
> > > s64 -> {u32, s32}
> > > u64 -> s32
> > > s64 -> s32
> > > u32 -> s32
> > > s32 -> u32
> > > /* __reg64_deduce_bounds: */
> > > u64 -> s64
> > > s64 -> u64
> > > {u64, s64} -> {u64, s64}
> > > /* __reg_deduce_mixed_bounds: */
> > > u32 -> u64
> > > u32 -> s64
> > > {s32, s64} -> {s64, u64, tnum}
> > > }
> > > /* __reg_bound_offset: */
> > > {u64, u32} -> tnum
> > > /* __update_reg_bounds: */
> > > tnum -> {s32, u32, s64, u64}
[...]
> > If I understand correctly from the pseudocode, there is a dependency
> > loop between *64 -> *32 -> *64 -> *32 -> ... and so on.
> >
> > This got me curious: why don't we actually try to compute a fixed
> > point here with a loop? If we did, we wouldn't have to guess how many
> > iterations are needed, or the order of the narrowing steps.
> >
> > So I implemented a dumb loop in reg_bounds_sync() with a limit, and
> > added global counters to collect stats. Pasting a (mostly vibe-coded,
> > sorry) patch at the bottom.
> [...]
> > $ grep bounds_sync tools/testing/selftests/bpf/veristat.log | grep -o 'max_iters.*' | sort -u
> > max_iters 2 avg_iters 0.09
> > max_iters 2 avg_iters 0.10
> > max_iters 2 avg_iters 0.11
> >
> > This is across 400+ progs of various sizes. The vast majority of
> > reg_bounds_sync() calls don't change anything (0 iterations), and
> > those that do converge in max 2 iterations.
> >
> > My interpretation of this (assuming I haven't messed up anything while
> > experimenting) is that the situation you hit in [2] is rare. And I think
> > it is safe to implement a loop in reg_bounds_sync() with a reasonable
> > iteration limit (10 maybe?).
> >
> > Am I missing anything that makes it a bad idea?
>
> My (limited) understanding is that this would make verification of the
> BPF verifier with Agni[4] much harder, because Agni has to unroll all
> loops and branch based on the result. Having a known number of calls
> and no loop limit makes verification easier. Paul and Harishankar have
> a much better understanding there.
That should be fine, actually. Agni doesn't verify reg_bounds_sync as a
whole anymore. It follows a divide-and-conquer approach and verifies
each subfunction independently [1]. So as long as the subfunctions are
individually sound, we should be fine, and having a loop shouldn't even
impact Agni's runtime.
1 - https://lpc.events/event/18/contributions/1937/
>
> From a conceptual view, we're trading away the understanding of
> fixed-point convergence (or the potential to understand it); with a
> loop, subtle differences will be hidden away. It might be reasonable as
> a security-hardening tactic, but it seems to hurt the codebase in the
> long run. That's my 2 cents.
I think I agree with Shung-Hsi here. I'm not against using a loop to
ensure we always have the right number of iterations, but I wonder if
that would be over-engineering it a bit. And as Shung-Hsi pointed out,
once we have a loop, we lose the opportunity to identify optimizations
as I did here.
The one upside I see to the loop is avoiding unnecessary refinements
(the 0 iteration cases). But maybe there's an easier way to detect those
cases? And even then, do we care that much? I noticed the impact of one
more iteration is very small—I couldn't measure it on the runtime.
>
> 1: https://lore.kernel.org/all/ZTZxoDJJbX9mrQ9w@u94a/
> 2: https://doi.org/10.1145/2651360
> 3: https://doi.org/10.1145/3728905
> 4: https://github.com/bpfverif/agni
>
> [...]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-05 11:10 ` Eduard Zingerman
@ 2026-03-05 13:15 ` Paul Chaignon
2026-03-09 5:52 ` Shung-Hsi Yu
2026-03-09 4:28 ` Shung-Hsi Yu
1 sibling, 1 reply; 12+ messages in thread
From: Paul Chaignon @ 2026-03-05 13:15 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Shung-Hsi Yu, Ihor Solodrai, bpf, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Harishankar Vishwanathan,
Kernel Team
On Thu, Mar 05, 2026 at 03:10:00AM -0800, Eduard Zingerman wrote:
> On Thu, 2026-03-05 at 14:54 +0800, Shung-Hsi Yu wrote:
> > On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
> > > On 3/3/26 11:27 AM, Paul Chaignon wrote:
[...]
> > > Side note: you might be interested to know that Eduard is working on
> > > consolidating signed and unsigned domains [1].
> >
> > \o/
> >
> When I last looked into this[1], there was the "Interval Analysis and
> Machine Arithmetic: Why Signedness Ignorance Is Bliss" paper[2] that
> looked rather promising. And lately there was also "Program Analysis
> Combining Generalized Bit-Level and Word-Level Abstractions"[3]
> (haven't read it), which seems more specific to the BPF verifier.
>
> Thank you for the links, I've only seen [1].
> I was about to send [a], feeling really smug about cbmc tests.
> But then a much simpler solution [b] occurred when figuring out why
> 64-bit test can't be written.
If I understand [b] correctly, it's the 32-bit equivalent of
00bf8d0c6c9b ("bpf: Improve bounds when s64 crosses sign boundary"). I
didn't make the 32-bit change back then, but it makes sense that we
would eventually need it :)
If you end up sending [b] as a patchset, note that we still have a
selftest for which we skip the invariant violation check:
$ git grep -hB3 "\!BPF_F_TEST_REG_INVARIANTS" tools/testing/selftests/bpf/progs/verifier_bounds.c
SEC("xdp")
__description("bound check with JMP32_JSLT for crossing 32-bit signed boundary")
__success __retval(0)
__flag(!BPF_F_TEST_REG_INVARIANTS) /* known invariants violation */
I believe, with your patch, we would be able to invert that test flag.
>
> I'll probably continue playing with cnums at leisure pace.
>
> [a] https://github.com/eddyz87/bpf/tree/cnum-sync-bounds
> [b] https://github.com/eddyz87/bpf/tree/32-bit-range-overflow
IIUC, your current patch doesn't maintain a cnum domain alongside the
existing abstract domain, but instead builds it only when needed, in
is_scalar_branch_taken. Would your long-term goal still be to replace
the existing four ranges with two cnum domains?
Either way, I believe it will require significant changes in Agni so
I'm interested to see where this goes :)
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-03 19:27 [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction Paul Chaignon
2026-03-05 0:48 ` Ihor Solodrai
@ 2026-03-06 4:14 ` Shung-Hsi Yu
2026-03-06 23:49 ` Paul Chaignon
1 sibling, 1 reply; 12+ messages in thread
From: Shung-Hsi Yu @ 2026-03-06 4:14 UTC (permalink / raw)
To: Paul Chaignon
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Eduard Zingerman, Harishankar Vishwanathan, Ihor Solodrai
On Tue, Mar 03, 2026 at 08:27:20PM +0100, Paul Chaignon wrote:
> In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
> added a new round of bounds deduction because two rounds were not enough
> to converge to a fixed point. This commit slightly refactor the bounds
> deduction logic such that two rounds are enough.
>
> In [1], Eduard noticed that after we improved the refinement logic, a
> third call to the bounds deduction (__reg_deduce_bounds) was needed to
> converge to a fixed point. More specifically, we needed this third call
> to improve the s64 range using the s32 range. We added the third call
> and postponed a more detailed analysis of the refinement logic.
>
> I've been looking into this more recently. To help, I wrote a high level
> sequence of all the refinements carried out in reg_bounds_sync. u64 ->
> s32 means we used the u64 ranges to improve the s32 ranges.
>
> /* __update_reg_bounds: */
> tnum -> {s32, u32, s64, u64}
> /* __reg_deduce_bounds: */
> for (3 times) {
> /* __reg32_deduce_bounds: */
> u64 -> {u32, s32}
> s64 -> {u32, s32}
> u64 -> s32
> s64 -> s32
> u32 -> s32
> s32 -> u32
> /* __reg64_deduce_bounds: */
> u64 -> s64
> s64 -> u64
> {u64, s64} -> {u64, s64}
Nitpicking: my mind read the above with bash brace expansion in mind
and at first thought it expands to the following:
/* __reg64_deduce_bounds: */
u64 -> s64
s64 -> u64
u64 -> u64
u64 -> s64
s64 -> u64
s64 -> s64
But on a second look I do get that {u64, s64} means we're combining
knowledge from both u64 and s64.
> /* __reg_deduce_mixed_bounds: */
> u32 -> u64
> u32 -> s64
> {s32, s64} -> {s64, u64, tnum}
> }
> /* __reg_bound_offset: */
> {u64, u32} -> tnum
> /* __update_reg_bounds: */
> tnum -> {s32, u32, s64, u64}
>
> From this, we can observe that we first improve the 32bit ranges from
> the 64bit ranges in __reg32_deduce_bounds, then improve the 64bit
> ranges on their own in __reg64_deduce_bounds. Intuitively, if we were to
> improve the 64bit ranges on their own *before* we use them to improve
> the 32bit ranges, we may reach a fixed point earlier. Said otherwise, we
> would need to move the *64 -> *32 refinements to the beginning of
> __reg_deduce_mixed_bounds. (The logic would then also better match the
> function names, but that's secondary.)
>
> After testing, this small refactoring does allow us to lose one call to
> __reg_deduce_bounds.
[...]
I poked at this patch with CBMC[1] because I was curious how much worse
off it might be compared to the old one, but the first counter-example
CBMC gave me actually shows that it is possible for the new version to
perform better despite losing one call to __reg_deduce_bounds.
Given that the new version can perform better (at least for one case,
with the caveat that such a case may not be representative) with one
less call to __reg_deduce_bounds, and that we used to do just two
rounds of __reg_deduce_bounds anyway, I believe this patch is good.
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
The improvement seems to come from making sure *64 is the tightest
possible first (with __reg64_deduce_bounds) before using the knowledge
there to improve *32.
The counter-example that I got enters reg_bounds_sync with:
(I have no idea if this represents an actual register state that is
possible to encounter, and it is rather suspicious that smin is a value
that is not within tnum/var_off)
reg.var_off.value=2
reg.var_off.mask=0x8000000000000001 (10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001)
reg.smin_value=0xfffffffd80000000 (11111111 11111111 11111111 11111101 10000000 00000000 00000000 00000000)
reg.smax_value=2
reg.umin_value=2
reg.umax_value=19
reg.s32_min_value=2
reg.s32_max_value=3
reg.u32_min_value=2
reg.u32_max_value=3
Here the combination of umin & smax gives the tightest possible range of
u64=[2, 2]. Both the old and new versions were able to arrive at this
conclusion in __reg64_deduce_bounds, but in the old version that range
gets thrown away in the final block of __reg_deduce_mixed_bounds:
if (reg->s32_min_value >= 0 && reg->smin_value >= S32_MIN && reg->smax_value <= S32_MAX) {
reg->smin_value = reg->s32_min_value;
reg->smax_value = reg->s32_max_value;
reg->umin_value = reg->s32_min_value;
reg->umax_value = reg->s32_max_value;
reg->var_off = tnum_intersect(reg->var_off,
tnum_range(reg->smin_value, reg->smax_value));
}
Here the old version will put s32=[2, 3] into *64, whereas the new
version is not affected because s32 was already at [2, 2] thanks to
__reg64_deduce_bounds coming before u64 -> {u32, s32}. The end result is
that in the old version we get:
old_reg.var_off.value=2
old_reg.var_off.mask=1
old_reg.s32_min_value=2
old_reg.s32_max_value=3
old_reg.u32_min_value=2
old_reg.u32_max_value=3
old_reg.smin_value=2
old_reg.smax_value=3
old_reg.umin_value=2
old_reg.umax_value=3
While after your patch is applied we get:
new_reg.var_off.value=2ul
new_reg.var_off.mask=0ul
new_reg.s32_min_value=2
new_reg.s32_max_value=2
new_reg.u32_min_value=2
new_reg.u32_max_value=2
new_reg.smin_value=2
new_reg.smax_value=2
new_reg.umin_value=2
new_reg.umax_value=2
I haven't checked whether Eduard's latest u32/s32 bounds refinement
patch makes any difference. Likely not.
We also likely want to retain the *64 knowledge in the final block of
__reg_deduce_mixed_bounds with min_t/max_t as a precautionary measure,
but that is orthogonal to your patch here.
1: https://github.com/shunghsiyu/reg_bounds_sync-review/
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-06 4:14 ` Shung-Hsi Yu
@ 2026-03-06 23:49 ` Paul Chaignon
2026-03-09 5:27 ` Shung-Hsi Yu
0 siblings, 1 reply; 12+ messages in thread
From: Paul Chaignon @ 2026-03-06 23:49 UTC (permalink / raw)
To: Shung-Hsi Yu
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Eduard Zingerman, Harishankar Vishwanathan, Ihor Solodrai
On Fri, Mar 06, 2026 at 12:14:01PM +0800, Shung-Hsi Yu wrote:
> On Tue, Mar 03, 2026 at 08:27:20PM +0100, Paul Chaignon wrote:
> > In commit 5dbb19b16ac49 ("bpf: Add third round of bounds deduction"), I
> > added a new round of bounds deduction because two rounds were not enough
> > to converge to a fixed point. This commit slightly refactor the bounds
> > deduction logic such that two rounds are enough.
> >
> > In [1], Eduard noticed that after we improved the refinement logic, a
> > third call to the bounds deduction (__reg_deduce_bounds) was needed to
> > converge to a fixed point. More specifically, we needed this third call
> > to improve the s64 range using the s32 range. We added the third call
> > and postponed a more detailed analysis of the refinement logic.
> >
> > I've been looking into this more recently. To help, I wrote a high level
> > sequence of all the refinements carried out in reg_bounds_sync. u64 ->
> > s32 means we used the u64 ranges to improve the s32 ranges.
> >
> > /* __update_reg_bounds: */
> > tnum -> {s32, u32, s64, u64}
> > /* __reg_deduce_bounds: */
> > for (3 times) {
> > /* __reg32_deduce_bounds: */
> > u64 -> {u32, s32}
> > s64 -> {u32, s32}
> > u64 -> s32
> > s64 -> s32
> > u32 -> s32
> > s32 -> u32
> > /* __reg64_deduce_bounds: */
> > u64 -> s64
> > s64 -> u64
> > {u64, s64} -> {u64, s64}
>
> Nitpicking: my mind read the above with bash brace expansion in mind
> and at first thought it expands to the following:
>
> /* __reg64_deduce_bounds: */
> u64 -> s64
> s64 -> u64
> u64 -> u64
> u64 -> s64
> s64 -> u64
> s64 -> s64
>
> But on a second look I do get that {u64, s64} means we're combining
> knowledge from both u64 and s64.
Yeah, it's not a very formal write-up. I just used it to get a
high-level view of the sequence of refinements. I'll switch this to
u64+s64 in the v2; hopefully it's a bit clearer.
>
> > /* __reg_deduce_mixed_bounds: */
> > u32 -> u64
> > u32 -> s64
> > {s32, s64} -> {s64, u64, tnum}
> > }
> > /* __reg_bound_offset: */
> > {u64, u32} -> tnum
> > /* __update_reg_bounds: */
> > tnum -> {s32, u32, s64, u64}
> >
> > From this, we can observe that we first improve the 32bit ranges from
> > the 64bit ranges in __reg32_deduce_bounds, then improve the 64bit
> > ranges on their own in __reg64_deduce_bounds. Intuitively, if we were to
> > improve the 64bit ranges on their own *before* we use them to improve
> > the 32bit ranges, we may reach a fixed point earlier. Said otherwise, we
> > would need to move the *64 -> *32 refinements to the beginning of
> > __reg_deduce_mixed_bounds. (The logic would then also better match the
> > function names, but that's secondary.)
> >
> > After testing, this small refactoring does allow us to lose one call to
> > __reg_deduce_bounds.
> [...]
>
> I poked at this patch with CBMC[1] because I was curious how much worse
> off it might be compared to the old one, but the first counter-example
> CBMC gave me actually shows that it is possible for the new version to
> perform better despite losing one call to __reg_deduce_bounds.
>
> Given that the new version can perform better (at least for one case,
> with the caveat that such a case may not be representative) with one
> less call to __reg_deduce_bounds, and that we used to do just two
> rounds of __reg_deduce_bounds anyway, I believe this patch is good.
>
> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
>
>
> The improvement seems to come from making sure *64 is the tightest
> possible first (with __reg64_deduce_bounds) before using the knowledge
> there to improve *32.
>
> The counter-example that I got enters reg_bounds_sync with:
> (I have no idea if this represents an actual register state that is
> possible to encounter, and it is rather suspicious that smin is a value
> that is not within tnum/var_off)
>
> reg.var_off.value=2
> reg.var_off.mask=0x8000000000000001 (10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001)
> reg.smin_value=0xfffffffd80000000 (11111111 11111111 11111111 11111101 10000000 00000000 00000000 00000000)
> reg.smax_value=2
> reg.umin_value=2
> reg.umax_value=19
> reg.s32_min_value=2
> reg.s32_max_value=3
> reg.u32_min_value=2
> reg.u32_max_value=3
Thanks a lot for the review and the detailed analysis! I also initially
doubted that it could be reproduced in a program, but gave it a try
anyway. The v2 includes a selftest that reproduces this (with a
slightly different value on smin). I added you as a co-author; please
give me a shout if I shouldn't have.
The tnum does look suspicious, but my intuition was that it doesn't
actually matter for the reproduction. So it's just a matter of reaching
those *32 and *64 values. The v2 selftest details how to get there, but
the difficulty was mostly finding the right sequence of bounds checks
that doesn't trigger refinement too early.
So this demonstrates that the patch not only avoids one round of bounds
deduction, but does so while improving precision in some cases! I wasn't
expecting that. Thanks!
[...]
> I haven't checked whether Eduard's latest u32/s32 bounds refinement
> patch makes any difference. Likely not.
Eduard's patch shouldn't make any difference as the u32 and s32 ranges
are equal and therefore one can't be used to improve the other.
>
>
> We also likely want to retain the *64 knowledge in the final block of
> __reg_deduce_mixed_bounds with min_t/max_t as a precautionary measure,
> but that is orthogonal to your patch here.
>
>
> 1: https://github.com/shunghsiyu/reg_bounds_sync-review/
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-05 11:10 ` Eduard Zingerman
2026-03-05 13:15 ` Paul Chaignon
@ 2026-03-09 4:28 ` Shung-Hsi Yu
1 sibling, 0 replies; 12+ messages in thread
From: Shung-Hsi Yu @ 2026-03-09 4:28 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Ihor Solodrai, Paul Chaignon, bpf, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Harishankar Vishwanathan,
Kernel Team
On Thu, Mar 05, 2026 at 03:10:00AM -0800, Eduard Zingerman wrote:
> On Thu, 2026-03-05 at 14:54 +0800, Shung-Hsi Yu wrote:
> > On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
[...]
> > > Side note: you might be interested to know that Eduard is working on
> > > consolidating signed and unsigned domains [1].
> >
> > \o/
> >
> > When I last looked into this[1], there was the "Interval Analysis and
> > Machine Arithmetic: Why Signedness Ignorance Is Bliss" paper[2] that
> > looked rather promising. And lately there was also "Program Analysis
> > Combining Generalized Bit-Level and Word-Level Abstractions"[3]
> > (haven't read it), which seems more specific to the BPF verifier.
>
> Thank you for the links, I've only seen [1].
> I was about to send [a], feeling really smug about cbmc tests.
> But then a much simpler solution [b] occurred when figuring out why
> 64-bit test can't be written.
>
> I'll probably continue playing with cnums at leisure pace.
LGTM. I like the cnum name, and cnum_intersect returning the more
precise of the two looks like exactly what should be done; I wasn't able
to figure that out.
I do hope the cbmc tests could make it into the kernel codebase, too[2].
srcu used to do that, but it got removed some time ago.
I asked Paul McKenney about the removal back at LPC 2025, and IIRC the
problem wasn't with cbmc itself[1], but rather that the code-extraction
script wasn't able to keep up with the source code. I think that should
be less of a problem for cnum.
1: It was mentioned that cbmc would take a very long time to run for
complex code, which is innate to anything SAT/SMT-based
2: The elephant in the room being that Agni is already doing very well;
but being able to write these kinds of tests in plain C still seems
beneficial
> [a] https://github.com/eddyz87/bpf/tree/cnum-sync-bounds
> [b] https://github.com/eddyz87/bpf/tree/32-bit-range-overflow
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-06 23:49 ` Paul Chaignon
@ 2026-03-09 5:27 ` Shung-Hsi Yu
0 siblings, 0 replies; 12+ messages in thread
From: Shung-Hsi Yu @ 2026-03-09 5:27 UTC (permalink / raw)
To: Paul Chaignon
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Eduard Zingerman, Harishankar Vishwanathan, Ihor Solodrai
On Sat, Mar 07, 2026 at 12:49:42AM +0100, Paul Chaignon wrote:
> On Fri, Mar 06, 2026 at 12:14:01PM +0800, Shung-Hsi Yu wrote:
> > On Tue, Mar 03, 2026 at 08:27:20PM +0100, Paul Chaignon wrote:
[...]
> Thanks a lot for the review and the detailed analysis! I also initially
> doubted that it could be reproduced in a program, but gave it a try
> anyway. The v2 includes a selftest that reproduces this (with a
> slightly different value on smin). I added you as a co-author; please
> give me a shout if I shouldn't have.
Glad to be added!
> The tnum does look suspicious, but my intuition was that it doesn't
> actually matter for the reproduction. So it's just a matter of reaching
> those *32 and *64 values. The v2 selftest details how to get there, but
> the difficulty was mostly finding the right sequence of bounds checks
> that doesn't trigger refinement too early.
>
> So this demonstrates that the patch not only avoids one round of bounds
> deduction, but does so while improving precision in some cases! I wasn't
> expecting that. Thanks!
:D
> [...]
>
> > I haven't checked whether Eduard's latest u32/s32 bounds refinement
> > patch makes any difference. Likely not.
>
> Eduard's patch shouldn't make any difference as the u32 and s32 ranges
> are equal and therefore one can't be used to improve the other.
Oh indeed. That was a surprisingly good counter-example I got.
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-05 13:15 ` Paul Chaignon
@ 2026-03-09 5:52 ` Shung-Hsi Yu
2026-03-09 11:09 ` Paul Chaignon
0 siblings, 1 reply; 12+ messages in thread
From: Shung-Hsi Yu @ 2026-03-09 5:52 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Paul Chaignon, Andrea Righi, Emil Tsalapatis, Ihor Solodrai, bpf,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Harishankar Vishwanathan, Kernel Team
Cc Andrea and Emil from the other thread:
https://lore.kernel.org/all/20260220190736.230329-1-arighi@nvidia.com/t/#u
On Thu, Mar 05, 2026 at 02:15:10PM +0100, Paul Chaignon wrote:
> On Thu, Mar 05, 2026 at 03:10:00AM -0800, Eduard Zingerman wrote:
> > On Thu, 2026-03-05 at 14:54 +0800, Shung-Hsi Yu wrote:
> > > On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
> > > > On 3/3/26 11:27 AM, Paul Chaignon wrote:
[...]
> > I'll probably continue playing with cnums at leisure pace.
> >
> > [a] https://github.com/eddyz87/bpf/tree/cnum-sync-bounds
> > [b] https://github.com/eddyz87/bpf/tree/32-bit-range-overflow
>
> IIUC, your current patch doesn't maintain a cnum domain alongside the
> existing abstract domain, but instead builds it only when needed, in
> is_scalar_branch_taken. Would your long-term goal still be to replace
> the existing four ranges with two cnum domains?
Building on the above, I would like to ask about the work on "avoiding
inconsistent state with is_scalar_branch_taken vs reg_set_min_max
difference". Is the current plan to address that once we have the cnum
domains?
* Re: [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction
2026-03-09 5:52 ` Shung-Hsi Yu
@ 2026-03-09 11:09 ` Paul Chaignon
0 siblings, 0 replies; 12+ messages in thread
From: Paul Chaignon @ 2026-03-09 11:09 UTC (permalink / raw)
To: Shung-Hsi Yu
Cc: Eduard Zingerman, Andrea Righi, Emil Tsalapatis, Ihor Solodrai,
bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Harishankar Vishwanathan, Kernel Team
On Mon, Mar 09, 2026 at 01:52:47PM +0800, Shung-Hsi Yu wrote:
> Cc Andrea and Emil from the other thread:
>
> https://lore.kernel.org/all/20260220190736.230329-1-arighi@nvidia.com/t/#u
>
> On Thu, Mar 05, 2026 at 02:15:10PM +0100, Paul Chaignon wrote:
> > On Thu, Mar 05, 2026 at 03:10:00AM -0800, Eduard Zingerman wrote:
> > > On Thu, 2026-03-05 at 14:54 +0800, Shung-Hsi Yu wrote:
> > > > On Wed, Mar 04, 2026 at 04:48:43PM -0800, Ihor Solodrai wrote:
> > > > > On 3/3/26 11:27 AM, Paul Chaignon wrote:
> [...]
> > > I'll probably continue playing with cnums at leisure pace.
> > >
> > > [a] https://github.com/eddyz87/bpf/tree/cnum-sync-bounds
> > > [b] https://github.com/eddyz87/bpf/tree/32-bit-range-overflow
> >
> > IIUC, your current patch doesn't maintain a cnum domain alongside the
> > existing abstract domain, but instead builds it only when needed, in
> > is_scalar_branch_taken. Would your long-term goal still be to replace
> > the existing four ranges with two cnum domains?
>
> Building on the above, I would like to ask about the work on "avoiding
> inconsistent state with is_scalar_branch_taken vs reg_set_min_max
> difference". Is the current plan to address that once we have the cnum
> domains?
IMO, the two are fairly orthogonal. If cnum is merged first, then it may
help reduce invariant violations a bit (?). If it's merged second, it
will probably improve is_branch_taken's accuracy a bit.
We're preparing a patchset with Hari (cc'ed) to use regs_refine_cond_op
for is_branch_taken. It should fix the invariant violations once and
for all—they're a pretty big syzbot sink at the moment for BPF :(
I'm currently collecting test cases from new syzbot reproducers and will
send it afterwards.
Thread overview: 12+ messages
2026-03-03 19:27 [PATCH bpf-next 1/1] bpf: Avoid one round of bounds deduction Paul Chaignon
2026-03-05 0:48 ` Ihor Solodrai
2026-03-05 6:54 ` Shung-Hsi Yu
2026-03-05 11:10 ` Eduard Zingerman
2026-03-05 13:15 ` Paul Chaignon
2026-03-09 5:52 ` Shung-Hsi Yu
2026-03-09 11:09 ` Paul Chaignon
2026-03-09 4:28 ` Shung-Hsi Yu
2026-03-05 12:50 ` Paul Chaignon
2026-03-06 4:14 ` Shung-Hsi Yu
2026-03-06 23:49 ` Paul Chaignon
2026-03-09 5:27 ` Shung-Hsi Yu