bpf.vger.kernel.org archive mirror
* [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
@ 2025-08-22 17:08 Nandakumar Edamana
  2025-08-22 17:08 ` [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication Nandakumar Edamana
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Nandakumar Edamana @ 2025-08-22 17:08 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan, Nandakumar Edamana

This commit addresses a challenge explained in an open question ("How
can we incorporate correlation in unknown bits across partial
products?") left by Harishankar et al. in their paper:
https://arxiv.org/abs/2105.05398

When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
so we can compute the two possible partial products and take their
union. Experiments show that applying this technique in long
multiplication improves the precision in a significant number of
cases (at the cost of losing precision in a comparatively smaller
number of cases).
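
For example (with x denoting an unknown bit), take a = x1, i.e.
TNUM(1, 2), and b = 11, i.e. TNUM(3, 0). When the loop reaches the
unknown bit of a, the two candidate accumulators are 0011 (for
LSB(a) = 0) and 0011 + 0110 = 1001 (for LSB(a) = 1), whose union is
x0x1. The previous algorithm returns the strictly less precise xxx1
for the same operands.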

This commit also removes the value-mask decomposition technique
employed by Harishankar et al., as its direct incorporation did not
result in any improvements for the new algorithm.

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
---
 include/linux/tnum.h |  3 +++
 kernel/bpf/tnum.c    | 47 ++++++++++++++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/include/linux/tnum.h b/include/linux/tnum.h
index 57ed3035cc30..68e9cdd0a2ab 100644
--- a/include/linux/tnum.h
+++ b/include/linux/tnum.h
@@ -54,6 +54,9 @@ struct tnum tnum_mul(struct tnum a, struct tnum b);
 /* Return a tnum representing numbers satisfying both @a and @b */
 struct tnum tnum_intersect(struct tnum a, struct tnum b);
 
+/* Return a tnum representing numbers satisfying either @a or @b */
+struct tnum tnum_union(struct tnum a, struct tnum b);
+
 /* Return @a with all but the lowest @size bytes cleared */
 struct tnum tnum_cast(struct tnum a, u8 size);
 
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
index fa353c5d550f..50e4d34d4774 100644
--- a/kernel/bpf/tnum.c
+++ b/kernel/bpf/tnum.c
@@ -116,31 +116,39 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
 	return TNUM(v & ~mu, mu);
 }
 
-/* Generate partial products by multiplying each bit in the multiplier (tnum a)
- * with the multiplicand (tnum b), and add the partial products after
- * appropriately bit-shifting them. Instead of directly performing tnum addition
- * on the generated partial products, equivalenty, decompose each partial
- * product into two tnums, consisting of the value-sum (acc_v) and the
- * mask-sum (acc_m) and then perform tnum addition on them. The following paper
- * explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
+/* Perform long multiplication, iterating through the trits in a.
+ * Inside `else if (a.mask & 1)`, instead of simply multiplying b with LSB(a)'s
+ * uncertainty and accumulating directly, we find two possible partial products
+ * (one for LSB(a) = 0 and another for LSB(a) = 1), and add their union to the
+ * accumulator. This addresses an issue pointed out in an open question ("How
+ * can we incorporate correlation in unknown bits across partial products?")
+ * left by Harishankar et al. (https://arxiv.org/abs/2105.05398), improving
+ * the general precision significantly.
  */
 struct tnum tnum_mul(struct tnum a, struct tnum b)
 {
-	u64 acc_v = a.value * b.value;
-	struct tnum acc_m = TNUM(0, 0);
+	struct tnum acc = TNUM(0, 0);
 
 	while (a.value || a.mask) {
 		/* LSB of tnum a is a certain 1 */
 		if (a.value & 1)
-			acc_m = tnum_add(acc_m, TNUM(0, b.mask));
+			acc = tnum_add(acc, b);
 		/* LSB of tnum a is uncertain */
-		else if (a.mask & 1)
-			acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
+		else if (a.mask & 1) {
+			/* acc = tnum_union(acc_0, acc_1), where acc_0 and
+			 * acc_1 are partial accumulators for cases
+			 * LSB(a) = certain 0 and LSB(a) = certain 1.
+			 * acc_0 = acc + 0 * b = acc.
+			 * acc_1 = acc + 1 * b = tnum_add(acc, b).
+			 */
+
+			acc = tnum_union(acc, tnum_add(acc, b));
+		}
 		/* Note: no case for LSB is certain 0 */
 		a = tnum_rshift(a, 1);
 		b = tnum_lshift(b, 1);
 	}
-	return tnum_add(TNUM(acc_v, 0), acc_m);
+	return acc;
 }
 
 /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
@@ -155,6 +163,19 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
 	return TNUM(v & ~mu, mu);
 }
 
+/* Return a tnum with the uncertainty from both a and b, and in addition, new
+ * uncertainty at any position where a and b disagree. The result is a
+ * superset of the union of the concrete sets of both a and b. Despite the
+ * overapproximation, it is optimal: no tighter tnum covers both inputs.
+ */
+struct tnum tnum_union(struct tnum a, struct tnum b)
+{
+	u64 v = a.value & b.value;
+	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
+
+	return TNUM(v & ~mu, mu);
+}
+
 struct tnum tnum_cast(struct tnum a, u8 size)
 {
 	a.value &= (1ULL << (size * 8)) - 1;
-- 
2.39.5



* [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication
  2025-08-22 17:08 [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Nandakumar Edamana
@ 2025-08-22 17:08 ` Nandakumar Edamana
  2025-08-22 18:59   ` Eduard Zingerman
  2025-08-22 18:37 ` [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Eduard Zingerman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Nandakumar Edamana @ 2025-08-22 17:08 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan, Nandakumar Edamana

This commit adds a selftest to check the abstract multiplication
technique(s) used by the verifier, following the recent improvement
in tnum multiplication (tnum_mul). One of the newly added programs,
verifier_mul/mul_precise, results in a false positive with the old
tnum_mul, while it passes with the new one.

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
---
 .../selftests/bpf/prog_tests/verifier.c       |  2 +
 .../selftests/bpf/progs/verifier_mul.c        | 75 +++++++++++++++++++
 2 files changed, 77 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_mul.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 77ec95d4ffaa..e35c216dbaf2 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -59,6 +59,7 @@
 #include "verifier_meta_access.skel.h"
 #include "verifier_movsx.skel.h"
 #include "verifier_mtu.skel.h"
+#include "verifier_mul.skel.h"
 #include "verifier_netfilter_ctx.skel.h"
 #include "verifier_netfilter_retcode.skel.h"
 #include "verifier_bpf_fastcall.skel.h"
@@ -194,6 +195,7 @@ void test_verifier_may_goto_1(void)           { RUN(verifier_may_goto_1); }
 void test_verifier_may_goto_2(void)           { RUN(verifier_may_goto_2); }
 void test_verifier_meta_access(void)          { RUN(verifier_meta_access); }
 void test_verifier_movsx(void)                 { RUN(verifier_movsx); }
+void test_verifier_mul(void)                  { RUN(verifier_mul); }
 void test_verifier_netfilter_ctx(void)        { RUN(verifier_netfilter_ctx); }
 void test_verifier_netfilter_retcode(void)    { RUN(verifier_netfilter_retcode); }
 void test_verifier_bpf_fastcall(void)         { RUN(verifier_bpf_fastcall); }
diff --git a/tools/testing/selftests/bpf/progs/verifier_mul.c b/tools/testing/selftests/bpf/progs/verifier_mul.c
new file mode 100644
index 000000000000..e7ccf19c7461
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_mul.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Nandakumar Edamana */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+
+/* The programs here test the abstract multiplication technique(s)
+ * used by the verifier. They are written in BPF assembly to keep
+ * the compiler from optimizing the multiplications away.
+ */
+
+SEC("fentry/bpf_fentry_test1")
+void BPF_PROG(mul_0, int x)
+{
+	asm volatile ("\
+	call %[bpf_get_prandom_u32];\
+	r0 *= 0;\
+	if r0 != 0 goto l0_%=;\
+	r0 = 0;\
+	goto l1_%=;\
+l0_%=:\
+	r0 = 1;\
+l1_%=:\
+"	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
+
+SEC("fentry/bpf_fentry_test1")
+__failure __msg("At program exit the register R0 has smin=1 smax=1 should have been in [0, 0]")
+void BPF_PROG(mul_uncertain, int x)
+{
+	asm volatile ("\
+	call %[bpf_get_prandom_u32];\
+	r0 *= 0x3;\
+	if r0 != 0 goto l0_%=;\
+	r0 = 0;\
+	goto l1_%=;\
+l0_%=:\
+	r0 = 1;\
+l1_%=:\
+"	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
+
+SEC("fentry/bpf_fentry_test1")
+void BPF_PROG(mul_precise, int x)
+{
+	/* First, force the verifier to be uncertain about the value:
+	 *     unsigned int a = (bpf_get_prandom_u32() & 0x2) | 0x1;
+	 *
+	 * Assuming the verifier is using tnum, a must be tnum{.v=0x1, .m=0x2}.
+	 * Then a * 0x3 would be m0m1 (m for uncertain). Added imprecision
+	 * would cause the following to fail, because the required return value
+	 * is 0:
+	 *     return (a * 0x3) & 0x4;
+	 */
+	asm volatile ("\
+	call %[bpf_get_prandom_u32];\
+	r0 &= 0x2;\
+	r0 |= 0x1;\
+	r0 *= 0x3;\
+	r0 &= 0x4;\
+	if r0 != 0 goto l0_%=;\
+	r0 = 0;\
+	goto l1_%=;\
+l0_%=:\
+	r0 = 1;\
+l1_%=:\
+"	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
-- 
2.39.5



* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 17:08 [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Nandakumar Edamana
  2025-08-22 17:08 ` [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication Nandakumar Edamana
@ 2025-08-22 18:37 ` Eduard Zingerman
  2025-08-22 18:58   ` Nandakumar Edamana
  2025-08-22 21:14 ` Harishankar Vishwanathan
  2025-08-22 21:50 ` Eduard Zingerman
  3 siblings, 1 reply; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-22 18:37 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Fri, 2025-08-22 at 22:38 +0530, Nandakumar Edamana wrote:
> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398
> 
> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> so we can compute the two possible partial products and take their
> union. Experiments show that applying this technique in long
> multiplication improves the precision in a significant number of
> cases (at the cost of losing precision in a comparatively smaller
> number of cases).
> 
> This commit also removes the value-mask decomposition technique
> employed by Harishankar et al., as its direct incorporation did not
> result in any improvements for the new algorithm.
> 
> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>


[...]

> diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> index fa353c5d550f..50e4d34d4774 100644
> --- a/kernel/bpf/tnum.c
> +++ b/kernel/bpf/tnum.c
> @@ -116,31 +116,39 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
>  	return TNUM(v & ~mu, mu);
>  }
>  
> -/* Generate partial products by multiplying each bit in the multiplier (tnum a)
> - * with the multiplicand (tnum b), and add the partial products after
> - * appropriately bit-shifting them. Instead of directly performing tnum addition
> - * on the generated partial products, equivalenty, decompose each partial
> - * product into two tnums, consisting of the value-sum (acc_v) and the
> - * mask-sum (acc_m) and then perform tnum addition on them. The following paper
> - * explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
> +/* Perform long multiplication, iterating through the trits in a.
> + * Inside `else if (a.mask & 1)`, instead of simply multiplying b with LSB(a)'s
> + * uncertainty and accumulating directly, we find two possible partial products
> + * (one for LSB(a) = 0 and another for LSB(a) = 1), and add their union to the
> + * accumulator. This addresses an issue pointed out in an open question ("How
> + * can we incorporate correlation in unknown bits across partial products?")
> + * left by Harishankar et al. (https://arxiv.org/abs/2105.05398), improving
> + * the general precision significantly.
>   */

Nit: The above comment is good for a commit message but not for the code itself.
     Imo, the comment in the code should concentrate on the concrete mechanics.
     E.g. you had a nice example in the readme for improved-tnum-mul.
     So, maybe change to something like below?

     /*
      * Perform long multiplication, iterating through the trits in a:
      * - if LSB(a) is a known 0, keep current accumulator
      * - if LSB(a) is a known 1, add b to current accumulator
      * - if LSB(a) is unknown, take a union of the above cases.
      *
      * For example:
      *
      *                  acc_0:          acc_1:
      *
      *      11 *   ->       11 *   ->       11 *   ->  union(0011, 1001) == x0x1
      *      x1              01              11
      *  ------          ------          ------
      *      11              11              11
      *     xx              00              11
      *  ------          ------          ------
      *    ????            0011            1001
      */

>  struct tnum tnum_mul(struct tnum a, struct tnum b)
>  {
> -	u64 acc_v = a.value * b.value;
> -	struct tnum acc_m = TNUM(0, 0);
> +	struct tnum acc = TNUM(0, 0);
>  
>  	while (a.value || a.mask) {
>  		/* LSB of tnum a is a certain 1 */
>  		if (a.value & 1)
> -			acc_m = tnum_add(acc_m, TNUM(0, b.mask));
> +			acc = tnum_add(acc, b);
>  		/* LSB of tnum a is uncertain */
> -		else if (a.mask & 1)
> -			acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
> +		else if (a.mask & 1) {
> +			/* acc = tnum_union(acc_0, acc_1), where acc_0 and
> +			 * acc_1 are partial accumulators for cases
> +			 * LSB(a) = certain 0 and LSB(a) = certain 1.
> +			 * acc_0 = acc + 0 * b = acc.
> +			 * acc_1 = acc + 1 * b = tnum_add(acc, b).
> +			 */
> +
> +			acc = tnum_union(acc, tnum_add(acc, b));
> +		}
>  		/* Note: no case for LSB is certain 0 */
>  		a = tnum_rshift(a, 1);
>  		b = tnum_lshift(b, 1);
>  	}
> -	return tnum_add(TNUM(acc_v, 0), acc_m);
> +	return acc;
>  }
>  
>  /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has

[...]


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 18:37 ` [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Eduard Zingerman
@ 2025-08-22 18:58   ` Nandakumar Edamana
  0 siblings, 0 replies; 15+ messages in thread
From: Nandakumar Edamana @ 2025-08-22 18:58 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On 23/08/25 00:07, Eduard Zingerman wrote:
> Nit: The above comment is good for a commit message but not for the code itself.
>       Imo, the comment in the code should concentrate on the concrete mechanics.
>       E.g. you had a nice example in the readme for improved-tnum-mul.
>       So, maybe change to something like below?
>
>       /*
>        * Perform long multiplication, iterating through the trits in a:
>        * - if LSB(a) is a known 0, keep current accumulator
>        * - if LSB(a) is a known 1, add b to current accumulator
>        * - if LSB(a) is unknown, take a union of the above cases.
>        *
>        * For example:
>        *
>        *                  acc_0:          acc_1:
>        *
>        *      11 *   ->       11 *   ->       11 *   ->  union(0011, 1001) == x0x1
>        *      x1              01              11
>        *  ------          ------          ------
>        *      11              11              11
>        *     xx              00              11
>        *  ------          ------          ------
>        *    ????            0011            1001
>        */
I really like this. Feel free to make this change, or just let me know 
if I should do it.

-- 
Nandakumar



* Re: [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication
  2025-08-22 17:08 ` [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication Nandakumar Edamana
@ 2025-08-22 18:59   ` Eduard Zingerman
  0 siblings, 0 replies; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-22 18:59 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Fri, 2025-08-22 at 22:38 +0530, Nandakumar Edamana wrote:
> This commit adds a selftest to check the abstract multiplication
> technique(s) used by the verifier, following the recent improvement
> in tnum multiplication (tnum_mul). One of the newly added programs,
> verifier_mul/mul_precise, results in a false positive with the old
> tnum_mul, while it passes with the new one.
> 
> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
> ---

Thank you for adding the test cases.

[...]

Given that the other two cases pass with both the old and the new
algorithm and don't look particularly interesting, I'd keep only this one.

> +SEC("fentry/bpf_fentry_test1")
> +void BPF_PROG(mul_precise, int x)
> +{
> +	/* First, force the verifier to be uncertain about the value:
> +	 *     unsigned int a = (bpf_get_prandom_u32() & 0x2) | 0x1;
> +	 *
> +	 * Assuming the verifier is using tnum, a must be tnum{.v=0x1, .m=0x2}.
> +	 * Then a * 0x3 would be m0m1 (m for uncertain). Added imprecision
> +	 * would cause the following to fail, because the required return value
> +	 * is 0:
> +	 *     return (a * 0x3) & 0x4;
> +	 */
> +	asm volatile ("\
> +	call %[bpf_get_prandom_u32];\
> +	r0 &= 0x2;\
> +	r0 |= 0x1;\
> +	r0 *= 0x3;\
> +	r0 &= 0x4;\
> +	if r0 != 0 goto l0_%=;\
> +	r0 = 0;\
> +	goto l1_%=;\
> +l0_%=:\
> +	r0 = 1;\
> +l1_%=:\
> +"	:
> +	: __imm(bpf_get_prandom_u32)
> +	: __clobber_all);
> +}


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 17:08 [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Nandakumar Edamana
  2025-08-22 17:08 ` [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication Nandakumar Edamana
  2025-08-22 18:37 ` [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Eduard Zingerman
@ 2025-08-22 21:14 ` Harishankar Vishwanathan
  2025-08-22 21:50 ` Eduard Zingerman
  3 siblings, 0 replies; 15+ messages in thread
From: Harishankar Vishwanathan @ 2025-08-22 21:14 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: Eduard Zingerman, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Sitnicki

On Fri, Aug 22, 2025 at 1:08 PM Nandakumar Edamana
<nandakumar@nandakumar.co.in> wrote:
>
> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398
>
> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> so we can compute the two possible partial products and take their
> union. Experiments show that applying this technique in long
> multiplication improves the precision in a significant number of
> cases (at the cost of losing precision in a comparatively smaller
> number of cases).
>
> This commit also removes the value-mask decomposition technique
> employed by Harishankar et al., as its direct incorporation did not
> result in any improvements for the new algorithm.
>
> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
> ---
>  include/linux/tnum.h |  3 +++
>  kernel/bpf/tnum.c    | 47 ++++++++++++++++++++++++++++++++------------
>  2 files changed, 37 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/tnum.h b/include/linux/tnum.h
> index 57ed3035cc30..68e9cdd0a2ab 100644
> --- a/include/linux/tnum.h
> +++ b/include/linux/tnum.h
> @@ -54,6 +54,9 @@ struct tnum tnum_mul(struct tnum a, struct tnum b);
>  /* Return a tnum representing numbers satisfying both @a and @b */
>  struct tnum tnum_intersect(struct tnum a, struct tnum b);
>
> +/* Return a tnum representing numbers satisfying either @a or @b */
> +struct tnum tnum_union(struct tnum a, struct tnum b);
> +
>  /* Return @a with all but the lowest @size bytes cleared */
>  struct tnum tnum_cast(struct tnum a, u8 size);
>
> diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> index fa353c5d550f..50e4d34d4774 100644
> --- a/kernel/bpf/tnum.c
> +++ b/kernel/bpf/tnum.c
> @@ -116,31 +116,39 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
>         return TNUM(v & ~mu, mu);
>  }
>
> -/* Generate partial products by multiplying each bit in the multiplier (tnum a)
> - * with the multiplicand (tnum b), and add the partial products after
> - * appropriately bit-shifting them. Instead of directly performing tnum addition
> - * on the generated partial products, equivalenty, decompose each partial
> - * product into two tnums, consisting of the value-sum (acc_v) and the
> - * mask-sum (acc_m) and then perform tnum addition on them. The following paper
> - * explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
> +/* Perform long multiplication, iterating through the trits in a.
> + * Inside `else if (a.mask & 1)`, instead of simply multiplying b with LSB(a)'s
> + * uncertainty and accumulating directly, we find two possible partial products
> + * (one for LSB(a) = 0 and another for LSB(a) = 1), and add their union to the
> + * accumulator. This addresses an issue pointed out in an open question ("How
> + * can we incorporate correlation in unknown bits across partial products?")
> + * left by Harishankar et al. (https://arxiv.org/abs/2105.05398), improving
> + * the general precision significantly.
>   */
>  struct tnum tnum_mul(struct tnum a, struct tnum b)
>  {
> -       u64 acc_v = a.value * b.value;
> -       struct tnum acc_m = TNUM(0, 0);
> +       struct tnum acc = TNUM(0, 0);
>
>         while (a.value || a.mask) {
>                 /* LSB of tnum a is a certain 1 */
>                 if (a.value & 1)
> -                       acc_m = tnum_add(acc_m, TNUM(0, b.mask));
> +                       acc = tnum_add(acc, b);
>                 /* LSB of tnum a is uncertain */
> -               else if (a.mask & 1)
> -                       acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
> +               else if (a.mask & 1) {
> +                       /* acc = tnum_union(acc_0, acc_1), where acc_0 and
> +                        * acc_1 are partial accumulators for cases
> +                        * LSB(a) = certain 0 and LSB(a) = certain 1.
> +                        * acc_0 = acc + 0 * b = acc.
> +                        * acc_1 = acc + 1 * b = tnum_add(acc, b).
> +                        */
> +
> +                       acc = tnum_union(acc, tnum_add(acc, b));
> +               }
>                 /* Note: no case for LSB is certain 0 */
>                 a = tnum_rshift(a, 1);
>                 b = tnum_lshift(b, 1);
>         }
> -       return tnum_add(TNUM(acc_v, 0), acc_m);
> +       return acc;
>  }
>
>  /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
> @@ -155,6 +163,19 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
>         return TNUM(v & ~mu, mu);
>  }
>
> +/* Return a tnum with the uncertainty from both a and b, and in addition, new
> + * uncertainty at any position where a and b disagree. The result is a
> + * superset of the union of the concrete sets of both a and b. Despite the
> + * overapproximation, it is optimal: no tighter tnum covers both inputs.
> + */
> +struct tnum tnum_union(struct tnum a, struct tnum b)
> +{
> +       u64 v = a.value & b.value;
> +       u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> +
> +       return TNUM(v & ~mu, mu);
> +}
> +
>  struct tnum tnum_cast(struct tnum a, u8 size)
>  {
>         a.value &= (1ULL << (size * 8)) - 1;
> --
> 2.39.5
>

I was able to confirm the soundness of the algorithm in z3 for 8-bit
tnums by unrolling the loop [1] (assuming my encoding of the algorithm
in logic is correct).

python3 tnum.py --bitwidth 8 --op tnum_new_mul
Verifying correctness of [tnum_new_mul] for tnums of width [8] ...
 SUCCESS.

[1] https://github.com/bpfverif/tnums-cgo22/blob/dafc49f929c1160d81c39808fac98c0f55e639f3/verification/tnum.py#L279

Having said that,
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 17:08 [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Nandakumar Edamana
                   ` (2 preceding siblings ...)
  2025-08-22 21:14 ` Harishankar Vishwanathan
@ 2025-08-22 21:50 ` Eduard Zingerman
  2025-08-22 23:48   ` Nandakumar Edamana
  2025-08-22 23:50   ` Harishankar Vishwanathan
  3 siblings, 2 replies; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-22 21:50 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Fri, 2025-08-22 at 22:38 +0530, Nandakumar Edamana wrote:
> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398
> 
> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> so we can compute the two possible partial products and take their
> union. Experiments show that applying this technique in long
> multiplication improves the precision in a significant number of
> cases (at the cost of losing precision in a comparatively smaller
> number of cases).
> 
> This commit also removes the value-mask decomposition technique
> employed by Harishankar et al., as its direct incorporation did not
> result in any improvements for the new algorithm.
> 
> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
> ---

I was looking a bit into why the new algorithm loses to the current
one in some cases, and came up with the following explanation of why
this algorithm does not produce a "perfect" answer.

To compute the most precise result, one needs to consider all
possible sums that constitute the final answer, e.g. for
0bxx1 * 0b111 the possible sums are:

  111 + 1110 + 11100
  111 + 0000 + 11100
  111 + 1110 + 00000
  111 + 0000 + 00000

tnum_union of these sums is 00xx0xx1, while the new algorithm
produces 00xxxxx1. This happens because 'x' bits in intermediate
results accumulate and "poison" the result, reducing overall
precision.

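Such an exhaustive exploration would amount to enumerating every
concretization of both operands and folding the exact products
through tnum_union. A hypothetical sketch (never proposed for the
kernel; exponential in the number of unknown bits, so useful only
for offline comparison):

  static struct tnum tnum_mul_exact(struct tnum a, struct tnum b)
  {
  	u64 bit;

  	if (a.mask) {
  		/* Split on the lowest unknown bit of a. */
  		bit = a.mask & -a.mask;
  		return tnum_union(
  			tnum_mul_exact(TNUM(a.value, a.mask & ~bit), b),
  			tnum_mul_exact(TNUM(a.value | bit, a.mask & ~bit), b));
  	}
  	if (b.mask) {
  		/* Split on the lowest unknown bit of b. */
  		bit = b.mask & -b.mask;
  		return tnum_union(
  			tnum_mul_exact(a, TNUM(b.value, b.mask & ~bit)),
  			tnum_mul_exact(a, TNUM(b.value | bit, b.mask & ~bit)));
  	}
  	/* Both operands fully known: a single concrete product. */
  	return TNUM(a.value * b.value, 0);
  }
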
It is not practical to do such sum tree exploration, of course,
but I stumbled upon the following simple optimization:

  @@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b)
          return TNUM(v & ~mu, mu);
   }
   
  -struct tnum tnum_mul_new(struct tnum a, struct tnum b)
  +struct tnum __tnum_mul_new(struct tnum a, struct tnum b)
   {
          struct tnum acc = TNUM(0, 0);
   
  @@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b)
          return acc;
   }
   
  +struct tnum tnum_mul_new(struct tnum a, struct tnum b)
  +{
  +       struct tnum ab = __tnum_mul_new(a, b);
  +       struct tnum ba = __tnum_mul_new(b, a);
  +
  +       return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
  +}
  +

For the 8-bit case I get the following stats (using the same [1] as
before):

  Patch as-is                 Patch with above modification
  -----------                 -----------------------------
  Tnums  : 6560
  New win: 30086328    70 %   31282549    73 %
  Old win: 1463809      3 %   907850       2 %
  Same   : 11483463    27 %   10843201    25 %


Looks a bit ugly, though.
Wdyt?

[1] https://github.com/eddyz87/tnum_mul_compare/blob/master/README.md

[...]


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 21:50 ` Eduard Zingerman
@ 2025-08-22 23:48   ` Nandakumar Edamana
  2025-08-22 23:56     ` Eduard Zingerman
  2025-08-22 23:50   ` Harishankar Vishwanathan
  1 sibling, 1 reply; 15+ messages in thread
From: Nandakumar Edamana @ 2025-08-22 23:48 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On 23/08/25 03:20, Eduard Zingerman wrote:
> It is not practical to do such sum tree exploration, of course,
> but I stumbled upon the following simple optimization:
>
>    @@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b)
>            return TNUM(v & ~mu, mu);
>     }
>     
>    -struct tnum tnum_mul_new(struct tnum a, struct tnum b)
>    +struct tnum __tnum_mul_new(struct tnum a, struct tnum b)
>     {
>            struct tnum acc = TNUM(0, 0);
>     
>    @@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b)
>            return acc;
>     }
>     
>    +struct tnum tnum_mul_new(struct tnum a, struct tnum b)
>    +{
>    +       struct tnum ab = __tnum_mul_new(a, b);
>    +       struct tnum ba = __tnum_mul_new(b, a);
>    +
>    +       return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
>    +}
>    +
>
> For the 8-bit case I get the following stats (using the same [1] as
> before):
>
>    Patch as-is                 Patch with above modification
>    -----------                 -----------------------------
>    Tnums  : 6560
>    New win: 30086328    70 %   31282549    73 %
>    Old win: 1463809      3 %   907850       2 %
>    Same   : 11483463    27 %   10843201    25 %
IIUC, this is the same as what my test program [1] does with the
`--commutative` option, and yes, it improves the precision slightly.
There is one more possibility: run both the new tnum_mul() and the old
tnum_mul(), and then pick the better result (which still isn't optimal,
based on experiments).
> Looks a bit ugly, though.
> Wdyt?

Well, I was afraid of the same, which is why it wasn't included in the
patch.

It is clear that picking the best result like this doesn't make it
unsound. If my understanding is correct, tnum_mul is not called very
often in usual cases, so it shouldn't affect performance either. Then
it boils down to the beauty of it. I personally don't think
`best(a*b, b*a)` is ugly. What about `best(oldprod, newprod)`, where
oldprod and newprod are each found like this, using the old tnum_mul
and the new tnum_mul respectively?
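
For concreteness, a sketch of the best() being discussed (hypothetical
names; tnum_mul_old stands for the implementation this patch removes,
and the selection metric follows the popcount comparison from Eduard's
snippet quoted above):

  /* Both candidates are sound, so returning either is sound;
   * prefer the one with fewer unknown bits.
   */
  static struct tnum tnum_best(struct tnum a, struct tnum b)
  {
  	return __builtin_popcountl(a.mask) <= __builtin_popcountl(b.mask) ? a : b;
  }

  static struct tnum tnum_mul_pick(struct tnum a, struct tnum b)
  {
  	struct tnum newprod = tnum_best(tnum_mul(a, b), tnum_mul(b, a));
  	struct tnum oldprod = tnum_best(tnum_mul_old(a, b), tnum_mul_old(b, a));

  	return tnum_best(newprod, oldprod);
  }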

[1] https://github.com/nandedamana/improved-tnum-mul

-- 
Nandakumar



* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 21:50 ` Eduard Zingerman
  2025-08-22 23:48   ` Nandakumar Edamana
@ 2025-08-22 23:50   ` Harishankar Vishwanathan
  1 sibling, 0 replies; 15+ messages in thread
From: Harishankar Vishwanathan @ 2025-08-22 23:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Nandakumar Edamana, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Sitnicki

On Fri, Aug 22, 2025 at 5:50 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-08-22 at 22:38 +0530, Nandakumar Edamana wrote:
[...]
>   +struct tnum tnum_mul_new(struct tnum a, struct tnum b)
>   +{
>   +       struct tnum ab = __tnum_mul_new(a, b);
>   +       struct tnum ba = __tnum_mul_new(b, a);
>   +
>   +       return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
>   +}
>   +

We had originally observed in our paper [1] that tnum_mul is not
commutative, and considered taking the best of the two possible
results (a*b) versus (b*a). Indeed, the new tnum_mul is also not
commutative, so we can do this.

We eventually decided against it because this approach involves
running two loops, each of which runs 64 times in the worst case.

[1] https://arxiv.org/pdf/2105.05398


> For the 8-bit case I get the following stats (using the same [1] as
> before):
>
>   Patch as-is                 Patch with above modification
>   -----------                 -----------------------------
>   Tnums  : 6560
>   New win: 30086328    70 %   31282549    73 %
>   Old win: 1463809      3 %   907850       2 %
>   Same   : 11483463    27 %   10843201    25 %
>
>
> Looks a bit ugly, though.
> Wdyt?
>
> [1] https://github.com/eddyz87/tnum_mul_compare/blob/master/README.md
>
> [...]


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 23:48   ` Nandakumar Edamana
@ 2025-08-22 23:56     ` Eduard Zingerman
  2025-08-25  4:16       ` Nandakumar Edamana
  2025-08-25 15:51       ` Eduard Zingerman
  0 siblings, 2 replies; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-22 23:56 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Sat, 2025-08-23 at 05:18 +0530, Nandakumar Edamana wrote:

[...]

> I personally don't think `best(a*b, b*a)` is ugly. What about
> `best(oldprod, newprod)`, where oldprod and newprod are each found
> like this, using the old tnum_mul and the new tnum_mul respectively?

Hm, given that both are correct, if we go for a hybrid approach we
can take the known bits from both.
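
Concretely, since both results are sound overapproximations of the
same set of products, their intersection is also sound and keeps the
known bits of each. A minimal sketch (tnum_mul_old again being a
hypothetical name for the removed implementation):

  static struct tnum tnum_mul_hybrid(struct tnum a, struct tnum b)
  {
  	struct tnum prod_new = tnum_mul(a, b);
  	struct tnum prod_old = tnum_mul_old(a, b);

  	/* Every concrete product is contained in both tnums, so the
  	 * intersection is sound and at least as precise as either.
  	 */
  	return tnum_intersect(prod_new, prod_old);
  }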


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 23:56     ` Eduard Zingerman
@ 2025-08-25  4:16       ` Nandakumar Edamana
  2025-08-25 16:24         ` Eduard Zingerman
  2025-08-25 15:51       ` Eduard Zingerman
  1 sibling, 1 reply; 15+ messages in thread
From: Nandakumar Edamana @ 2025-08-25  4:16 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

Status as of now:

DECIDED:

1. Replace the current outer comment for the new tnum_mul() with a
    cleaner explanation and the example from the README of the test
    program.

2. (Related to PATCH 2/2) Drop the trivial tests.

UNDECIDED:

Instead of just doing tnum_mul(a, b),

a) whether to do best(tnum_mul(a, b), tnum_mul(b, a))
b) whether to do best(best(tnum_mul(a, b), tnum_mul(b, a)),
                       best(tnum_mul_old(a, b), tnum_mul_old(b, a)))

-- 
Nandakumar



* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-22 23:56     ` Eduard Zingerman
  2025-08-25  4:16       ` Nandakumar Edamana
@ 2025-08-25 15:51       ` Eduard Zingerman
  1 sibling, 0 replies; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-25 15:51 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Fri, 2025-08-22 at 16:56 -0700, Eduard Zingerman wrote:
> On Sat, 2025-08-23 at 05:18 +0530, Nandakumar Edamana wrote:
> 
> [...]
> 
> > I personally don't think `best(a*b, b*a)` is ugly. What about
> > `best(oldprod, newprod)`, where oldprod and newprod are each found
> > like this, using the old tnum_mul and the new tnum_mul respectively?
> 
> Hm, given that both are correct if we go for a hybrid approach we can
> peek known bits from both.

Thinking it over the weekend, I tend to agree with Harishankar.
A few percent of improvement does not merit the complications of
best-result selection or of maintaining two multiplication
algorithms. I'd stick with the new algorithm as in the current patch.


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-25  4:16       ` Nandakumar Edamana
@ 2025-08-25 16:24         ` Eduard Zingerman
  2025-08-25 16:56           ` Nandakumar Edamana
  0 siblings, 1 reply; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-25 16:24 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Mon, 2025-08-25 at 09:46 +0530, Nandakumar Edamana wrote:
> Status as of now:
> 
> DECIDED:
> 
> 1. Replace the current outer comment for the new tnum_mul() with a
>     cleaner explanation and the example from the README of the test
>     program.
> 
> 2. (Related to PATCH 2/2) Drop the trivial tests.
> 
> UNDECIDED:
> 
> Instead of just doing tnum_mul(a, b),
> 
> a) whether to do best(tnum_mul(a, b), tnum_mul(b, a))
> b) whether to do best(best(tnum_mul(a, b), tnum_mul(b, a)),
>                        best(tnum_mul_old(a, b), tnum_mul_old(b, a)))

I'd drop both undecided points.


* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-25 16:24         ` Eduard Zingerman
@ 2025-08-25 16:56           ` Nandakumar Edamana
  2025-08-25 16:57             ` Eduard Zingerman
  0 siblings, 1 reply; 15+ messages in thread
From: Nandakumar Edamana @ 2025-08-25 16:56 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On 25/08/25 21:54, Eduard Zingerman wrote:
> On Mon, 2025-08-25 at 09:46 +0530, Nandakumar Edamana wrote:
>> Status as of now:
>>
>> DECIDED:
>>
>> 1. Replace the current outer comment for the new tnum_mul() with a
>>      cleaner explanation and the example from the README of the test
>>      program.
>>
>> 2. (Related to PATCH 2/2) Drop the trivial tests.
>>
>> UNDECIDED:
>>
>> Instead of just doing tnum_mul(a, b),
>>
>> a) whether to do best(tnum_mul(a, b), tnum_mul(b, a))
>> b) whether to do best(best(tnum_mul(a, b), tnum_mul(b, a)),
>>                         best(tnum_mul_old(a, b), tnum_mul_old(b, a)))
> I'd drop both undecided points.
Shall I send v5 with the decided changes then?

-- 
Nandakumar



* Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul
  2025-08-25 16:56           ` Nandakumar Edamana
@ 2025-08-25 16:57             ` Eduard Zingerman
  0 siblings, 0 replies; 15+ messages in thread
From: Eduard Zingerman @ 2025-08-25 16:57 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Sitnicki, Harishankar Vishwanathan

On Mon, 2025-08-25 at 22:26 +0530, Nandakumar Edamana wrote:
> On 25/08/25 21:54, Eduard Zingerman wrote:
> > On Mon, 2025-08-25 at 09:46 +0530, Nandakumar Edamana wrote:
> > > Status as of now:
> > > 
> > > DECIDED:
> > > 
> > > 1. Replace the current outer comment for the new tnum_mul() with a
> > >      cleaner explanation and the example from the README of the test
> > >      program.
> > > 
> > > 2. (Related to PATCH 2/2) Drop the trivial tests.
> > > 
> > > UNDECIDED:
> > > 
> > > Instead of just doing tnum_mul(a, b),
> > > 
> > > a) whether to do best(tnum_mul(a, b), tnum_mul(b, a))
> > > b) whether to do best(best(tnum_mul(a, b), tnum_mul(b, a)),
> > >                         best(tnum_mul_old(a, b), tnum_mul_old(b, a)))
> > I'd drop both undecided points.
> Shall I send v5 with the decided changes then?

Yes, please do.



Thread overview: 15+ messages:
2025-08-22 17:08 [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Nandakumar Edamana
2025-08-22 17:08 ` [PATCH v4 bpf-next 2/2] bpf: add selftest to check the verifier's abstract multiplication Nandakumar Edamana
2025-08-22 18:59   ` Eduard Zingerman
2025-08-22 18:37 ` [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul Eduard Zingerman
2025-08-22 18:58   ` Nandakumar Edamana
2025-08-22 21:14 ` Harishankar Vishwanathan
2025-08-22 21:50 ` Eduard Zingerman
2025-08-22 23:48   ` Nandakumar Edamana
2025-08-22 23:56     ` Eduard Zingerman
2025-08-25  4:16       ` Nandakumar Edamana
2025-08-25 16:24         ` Eduard Zingerman
2025-08-25 16:56           ` Nandakumar Edamana
2025-08-25 16:57             ` Eduard Zingerman
2025-08-25 15:51       ` Eduard Zingerman
2025-08-22 23:50   ` Harishankar Vishwanathan
