From: Yonghong Song <yonghong.song@linux.dev>
To: Yazhou Tang <tangyazhou@zju.edu.cn>, bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
song@kernel.org, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, tangyazhou518@outlook.com,
shenghaoyuan0928@163.com, ziye@zju.edu.cn
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add interval and tnum analysis for signed and unsigned BPF_DIV
Date: Tue, 23 Dec 2025 12:58:58 -0800
Message-ID: <db829499-1ad3-4d9f-8a89-6246938a45aa@linux.dev>
In-Reply-To: <20251223091120.2413435-2-tangyazhou@zju.edu.cn>
On 12/23/25 1:10 AM, Yazhou Tang wrote:
> From: Yazhou Tang <tangyazhou518@outlook.com>
>
> This patch introduces interval analysis (range tracking) and tnum
> analysis (bitwise tracking) for both signed and unsigned division
> operations in the BPF verifier.
>
> The BPF verifier currently lacks support for value tracking on BPF_DIV
> instructions, which can lead to false positives during verification of
> BPF programs that utilize division instructions.
>
> According to the BPF instruction set[1], the instruction's offset field
> (`insn->off`) is used to distinguish between signed (`off == 1`) and
> unsigned division (`off == 0`). We also follow the BPF division
> semantics to handle special cases such as division by zero and signed
> division overflow.
>
> - UDIV: dst = (src != 0) ? (dst / src) : 0
> - SDIV: dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst / src))
>
> Here is the overview of the changes made in this patch:
>
> 1. For interval analysis:
> - Added `scalar_min_max_udiv` and `scalar32_min_max_udiv` to update
> umin/umax bounds, which is straightforward.
> - Added `scalar_min_max_sdiv` and `scalar32_min_max_sdiv` to update
> smin/smax bounds. It handles non-monotonic intervals by decomposing
> the divisor range into negative, zero, and positive sub-ranges, and
> computing the result range for each sub-range separately. Finally,
> it combines the results to get the final smin/smax bounds.
> 2. For tnum analysis, we referred to LLVM's KnownBits implementation[2]
> and the recent research on abstract interpretation of division[3]:
> - Added `tnum_udiv` to compute the tnum for unsigned division. It
> calculates the maximum possible result based on the maximum values
> of the dividend tnum and the minimum values of the divisor tnum,
> then constructs the resulting tnum accordingly. We have proved its
> soundness using the Rocq Prover[4].
> - Added `tnum_sdiv` to compute the tnum for signed division. It splits
> the operands into positive and negative components, then performs
> calculation on absolute values using `tnum_udiv`, finally unions
> the results to ensure soundness.
>
> Introduced `tnum_empty` to represent the bottom element (⊥) of the
> tnum lattice to support the split-tnum analysis for signed division.
> Mathematically, tnum_empty represents an empty set of possible values,
> which is crucial when a sub-tnum (e.g., the negative component of a
> strictly positive tnum) yields no valid values.
> 3. Also updated existing selftests based on the expected BPF_DIV behavior.
In my own experience, typical division (signed or unsigned) has remainder 0.
For example, (ptr1 - ptr2)/sizeof(*ptr1).
Do you have other production examples that need more complex div/sdiv
handling in tnum and the verifier? For example, see:
https://lore.kernel.org/bpf/aRYSlGmmQM1kfF_b@mail.gmail.com/
Without digging into details, a few comments below.
>
> [1] https://www.kernel.org/doc/Documentation/bpf/standardization/instruction-set.rst
> [2] https://llvm.org/doxygen/KnownBits_8cpp_source.html
> [3] https://dl.acm.org/doi/10.1145/3728905
> [4] https://github.com/shenghaoyuan/open-verified-artifacts/tree/main/tnum
>
> Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
> Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
> Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
> Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
> Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
> ---
> Hello everyone,
>
> Thanks for reviewing our patch! This patch adds interval and tnum analysis
> for both signed and unsigned BPF_DIV instructions in the BPF verifier.
>
> We have also implemented interval and tnum analysis for the BPF_MOD
> instruction, which is closely related to division. However, to keep the
> patch size manageable and facilitate easier review, we have decided to
> submit the BPF_MOD related changes in a separate patch following this one.
>
> Best,
>
> Yazhou Tang
>
> include/linux/tnum.h | 4 +
> kernel/bpf/tnum.c | 159 ++++++++++++-
> kernel/bpf/verifier.c | 225 ++++++++++++++++++
> .../bpf/progs/verifier_value_illegal_alu.c | 7 +-
> 4 files changed, 391 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/tnum.h b/include/linux/tnum.h
> index c52b862dad45..fd00deb2cb88 100644
> --- a/include/linux/tnum.h
> +++ b/include/linux/tnum.h
> @@ -50,6 +50,10 @@ struct tnum tnum_or(struct tnum a, struct tnum b);
> struct tnum tnum_xor(struct tnum a, struct tnum b);
> /* Multiply two tnums, return @a * @b */
> struct tnum tnum_mul(struct tnum a, struct tnum b);
> +/* Unsigned division, return @a / @b */
> +struct tnum tnum_udiv(struct tnum a, struct tnum b);
> +/* Signed division, return @a / @b */
> +struct tnum tnum_sdiv(struct tnum a, struct tnum b, bool alu32);
>
> /* Return true if the known bits of both tnums have the same value */
> bool tnum_overlap(struct tnum a, struct tnum b);
> diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> index f8e70e9c3998..20cd023709bf 100644
> --- a/kernel/bpf/tnum.c
> +++ b/kernel/bpf/tnum.c
> @@ -12,6 +12,13 @@
> #define TNUM(_v, _m) (struct tnum){.value = _v, .mask = _m}
> /* A completely unknown value */
> const struct tnum tnum_unknown = { .value = 0, .mask = -1 };
> +/* Not a well-formed tnum; its set of concrete values is empty. */
> +const struct tnum tnum_empty = { .value = -1, .mask = -1 };
Maybe rename tnum_empty to tnum_poison? That would make it
explicit that it is not well-formed.
> +
> +static bool __tnum_eqb(struct tnum a, struct tnum b)
> +{
> + return a.value == b.value && a.mask == b.mask;
> +}
>
> struct tnum tnum_const(u64 value)
> {
> @@ -83,9 +90,23 @@ struct tnum tnum_sub(struct tnum a, struct tnum b)
> return TNUM(dv & ~mu, mu);
> }
>
> +/* __tnum_neg_width: tnum negation with given bit width.
> + * @a: the tnum to be negated.
> + * @width: the bit width to perform negation, 32 or 64.
> + */
> +static struct tnum __tnum_neg_width(struct tnum a, int width)
> +{
> + if (width == 32)
> + return tnum_subreg(tnum_sub(TNUM(0, 0), tnum_subreg(a)));
> + else if (width == 64)
> + return tnum_sub(TNUM(0, 0), a);
> + else
> + return tnum_unknown;
> +}
> +
> struct tnum tnum_neg(struct tnum a)
> {
> - return tnum_sub(TNUM(0, 0), a);
> + return __tnum_neg_width(a, 64);
> }
>
> struct tnum tnum_and(struct tnum a, struct tnum b)
> @@ -167,6 +188,138 @@ bool tnum_overlap(struct tnum a, struct tnum b)
> return (a.value & mu) == (b.value & mu);
> }
>
> +/* __get_mask: get a mask that covers all bits up to the highest set bit in x.
> + * For example:
> + * x = 0b0000...0000 -> return 0b0000...0000
> + * x = 0b0000...0001 -> return 0b0000...0001
> + * x = 0b0000...1001 -> return 0b0000...1111
> + * x = 0b1111...1111 -> return 0b1111...1111
> + */
> +static u64 __get_mask(u64 x)
> +{
> + int width = 0;
> +
> + if (x > 0)
> + width = 64 - __builtin_clzll(x);
Maybe 'width = fls64(x)'?
> + if (width == 0)
> + return 0;
> + else if (width == 64)
> + return U64_MAX;
> + else
> + return (1ULL << width) - 1;
> +}
> +
> +struct tnum tnum_udiv(struct tnum a, struct tnum b)
> +{
> + if (tnum_is_const(b)) {
> + /* BPF div specification: x / 0 = 0 */
> + if (b.value == 0)
> + return TNUM(0, 0);
> + if (tnum_is_const(a))
> + return TNUM(a.value / b.value, 0);
> + }
> +
> + if (b.value == 0)
> + return tnum_unknown;
> +
> + u64 a_max = a.value + a.mask;
> + u64 b_min = b.value;
> + u64 max_res = a_max / b_min;
> + return TNUM(0, __get_mask(max_res));
> +}
> +
> +static u64 __msb(u64 x, int width)
> +{
> + return x & (1ULL << (width - 1));
> +}
> +
> +static struct tnum __tnum_get_positive(struct tnum x, int width)
> +{
> + if (__msb(x.value, width))
> + return tnum_empty;
> + if (__msb(x.mask, width))
> + return TNUM(x.value, x.mask & ~(1ULL << (width - 1)));
> + return x;
> +}
> +
> +static struct tnum __tnum_get_negative(struct tnum x, int width)
> +{
> + if (__msb(x.value, width))
> + return x;
> + if (__msb(x.mask, width))
> + return TNUM(x.value | (1ULL << (width - 1)), x.mask & ~(1ULL << (width - 1)));
> + return tnum_empty;
> +}
> +
> +static struct tnum __tnum_abs(struct tnum x, int width)
> +{
> + if (__msb(x.value, width))
> + return __tnum_neg_width(x, width);
> + else
> + return x;
> +}
> +
> +/* __tnum_sdiv, a helper for tnum_sdiv.
> + * @a: tnum a, a's sign is fixed, __msb(a.mask) == 0
> + * @b: tnum b, b's sign is fixed, __msb(b.mask) == 0
> + *
> + * This function reuses tnum_udiv by operating on the absolute values of a and b,
> + * and then adjusting the sign of the result based on C's division rules.
> + * Here we don't need to specially handle the case of [S64_MIN / -1], because
> + * after __tnum_abs, S64_MIN becomes (S64_MAX + 1), and the behavior of
> + * unsigned [(S64_MAX + 1) / 1] is normal.
> + */
> +static struct tnum __tnum_sdiv(struct tnum a, struct tnum b, int width)
> +{
> + if (__tnum_eqb(a, tnum_empty) || __tnum_eqb(b, tnum_empty))
> + return tnum_empty;
> +
> + struct tnum a_abs = __tnum_abs(a, width);
> + struct tnum b_abs = __tnum_abs(b, width);
> + struct tnum res_abs = tnum_udiv(a_abs, b_abs);
> +
> + if (__msb(a.value, width) == __msb(b.value, width))
> + return res_abs;
> + else
> + return __tnum_neg_width(res_abs, width);
> +}
> +
> +struct tnum tnum_sdiv(struct tnum a, struct tnum b, bool alu32)
> +{
> + if (tnum_is_const(b)) {
> + /* BPF div specification: x / 0 = 0 */
> + if (b.value == 0)
> + return TNUM(0, 0);
> + if (tnum_is_const(a)) {
> + /* BPF div specification: S32_MIN / -1 = S32_MIN */
> + if (alu32 && (u32)a.value == (u32)S32_MIN && (u32)b.value == (u32)-1)
> + return TNUM((u32)S32_MIN, 0);
> + /* BPF div specification: S64_MIN / -1 = S64_MIN */
> + if (!alu32 && a.value == S64_MIN && b.value == (u64)-1)
> + return TNUM((u64)S64_MIN, 0);
> + s64 sval = (s64)a.value / (s64)b.value;
> + return TNUM((u64)sval, 0);
> + }
> + }
> +
> + if (b.value == 0)
> + return tnum_unknown;
> +
> + int width = alu32 ? 32 : 64;
> + struct tnum a_pos = __tnum_get_positive(a, width);
> + struct tnum a_neg = __tnum_get_negative(a, width);
> + struct tnum b_pos = __tnum_get_positive(b, width);
> + struct tnum b_neg = __tnum_get_negative(b, width);
> +
> + struct tnum res_pos = __tnum_sdiv(a_pos, b_pos, width);
> + struct tnum res_neg = __tnum_sdiv(a_neg, b_neg, width);
> + struct tnum res_mix1 = __tnum_sdiv(a_pos, b_neg, width);
> + struct tnum res_mix2 = __tnum_sdiv(a_neg, b_pos, width);
> +
> + return tnum_union(tnum_union(res_pos, res_neg),
> + tnum_union(res_mix1, res_mix2));
> +}
The above is very complex. It would be great if you could add some
examples to show the logic. For example, see this tnum patch:
https://lore.kernel.org/bpf/20250826034524.2159515-1-nandakumar@nandakumar.co.in
> +
> /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
> * a 'known 0' - this will return a 'known 1' for that bit.
> */
> @@ -186,6 +339,10 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
> */
> struct tnum tnum_union(struct tnum a, struct tnum b)
> {
> + if (__tnum_eqb(a, tnum_empty))
> + return b;
> + if (__tnum_eqb(b, tnum_empty))
> + return a;
> u64 v = a.value & b.value;
> u64 mu = (a.value ^ b.value) | a.mask | b.mask;
[...]
2025-12-23 9:10 [PATCH bpf-next v2 0/2] bpf: Add value tracking for BPF_DIV Yazhou Tang
2025-12-23 9:10 ` [PATCH bpf-next v2 1/2] bpf: Add interval and tnum analysis for signed and unsigned BPF_DIV Yazhou Tang
2025-12-23 20:58 ` Yonghong Song [this message]
2025-12-25 9:17 ` Yazhou Tang
2026-01-03 0:21 ` Alexei Starovoitov
2026-01-03 8:40 ` Yazhou Tang
2025-12-24 0:32 ` kernel test robot
2025-12-24 12:41 ` kernel test robot
2025-12-24 15:30 ` kernel test robot
2025-12-23 9:10 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add tests for BPF_DIV analysis Yazhou Tang
2025-12-23 13:50 ` [syzbot ci] Re: bpf: Add value tracking for BPF_DIV syzbot ci