bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
@ 2025-08-15 14:05 Nandakumar Edamana
  2025-08-15 19:10 ` Eduard Zingerman
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Nandakumar Edamana @ 2025-08-15 14:05 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Nandakumar Edamana

This commit addresses a challenge explained in an open question ("How
can we incorporate correlation in unknown bits across partial
products?") left by Harishankar et al. in their paper:
https://arxiv.org/abs/2105.05398

When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
from which we could find two possible partial products and take a
union. Experiment shows that applying this technique in long
multiplication improves the precision in a significant number of cases
(at the cost of losing precision in a relatively lower number of
cases).

This commit also removes the value-mask decomposition technique
employed by Harishankar et al., as its direct incorporation did not
result in any improvements for the new algorithm.

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
---
 include/linux/tnum.h |  3 +++
 kernel/bpf/tnum.c    | 42 +++++++++++++++++++++++++++++-------------
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/linux/tnum.h b/include/linux/tnum.h
index 57ed3035cc30..68e9cdd0a2ab 100644
--- a/include/linux/tnum.h
+++ b/include/linux/tnum.h
@@ -54,6 +54,9 @@ struct tnum tnum_mul(struct tnum a, struct tnum b);
 /* Return a tnum representing numbers satisfying both @a and @b */
 struct tnum tnum_intersect(struct tnum a, struct tnum b);
 
+/* Returns a tnum representing numbers satisfying either @a or @b */
+struct tnum tnum_union(struct tnum t1, struct tnum t2);
+
 /* Return @a with all but the lowest @size bytes cleared */
 struct tnum tnum_cast(struct tnum a, u8 size);
 
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
index fa353c5d550f..a9894711786a 100644
--- a/kernel/bpf/tnum.c
+++ b/kernel/bpf/tnum.c
@@ -116,31 +116,39 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
 	return TNUM(v & ~mu, mu);
 }
 
-/* Generate partial products by multiplying each bit in the multiplier (tnum a)
- * with the multiplicand (tnum b), and add the partial products after
- * appropriately bit-shifting them. Instead of directly performing tnum addition
- * on the generated partial products, equivalenty, decompose each partial
- * product into two tnums, consisting of the value-sum (acc_v) and the
- * mask-sum (acc_m) and then perform tnum addition on them. The following paper
- * explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
+/* Perform long multiplication, iterating through the trits in a.
+ * Inside `else if (a.mask & 1)`, instead of simply multiplying b with LSB(a)'s
+ * uncertainty and accumulating directly, we find two possible partial products
+ * (one for LSB(a) = 0 and another for LSB(a) = 1), and add their union to the
+ * accumulator. This addresses an issue pointed out in an open question ("How
+ * can we incorporate correlation in unknown bits across partial products?")
+ * left by Harishankar et al. (https://arxiv.org/abs/2105.05398), improving
+ * the general precision significantly.
  */
 struct tnum tnum_mul(struct tnum a, struct tnum b)
 {
-	u64 acc_v = a.value * b.value;
-	struct tnum acc_m = TNUM(0, 0);
+	struct tnum acc = TNUM(0, 0);
 
 	while (a.value || a.mask) {
 		/* LSB of tnum a is a certain 1 */
 		if (a.value & 1)
-			acc_m = tnum_add(acc_m, TNUM(0, b.mask));
+			acc = tnum_add(acc, b);
 		/* LSB of tnum a is uncertain */
-		else if (a.mask & 1)
-			acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
+		else if (a.mask & 1) {
+			/* acc += tnum_union(acc_0, acc_1), where acc_0 and
+			 * acc_1 are partial accumulators for cases
+			 * LSB(a) = certain 0 and LSB(a) = certain 1.
+			 * acc_0 = acc + 0 * b = acc.
+			 * acc_1 = acc + 1 * b = tnum_add(acc, b).
+			 */
+
+			acc = tnum_union(acc, tnum_add(acc, b));
+		}
 		/* Note: no case for LSB is certain 0 */
 		a = tnum_rshift(a, 1);
 		b = tnum_lshift(b, 1);
 	}
-	return tnum_add(TNUM(acc_v, 0), acc_m);
+	return acc;
 }
 
 /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
@@ -155,6 +163,14 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
 	return TNUM(v & ~mu, mu);
 }
 
+struct tnum tnum_union(struct tnum a, struct tnum b)
+{
+	u64 v = a.value & b.value;
+	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
+
+	return TNUM(v & ~mu, mu);
+}
+
 struct tnum tnum_cast(struct tnum a, u8 size)
 {
 	a.value &= (1ULL << (size * 8)) - 1;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-15 14:05 [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul Nandakumar Edamana
@ 2025-08-15 19:10 ` Eduard Zingerman
  2025-08-16  4:58   ` Nandakumar Edamana
                     ` (2 more replies)
  2025-08-18 18:23 ` Jakub Sitnicki
  2025-08-18 23:14 ` Eduard Zingerman
  2 siblings, 3 replies; 16+ messages in thread
From: Eduard Zingerman @ 2025-08-15 19:10 UTC (permalink / raw)
  To: Nandakumar Edamana, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf

On Fri, 2025-08-15 at 19:35 +0530, Nandakumar Edamana wrote:
> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398
> 
> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> from which we could find two possible partial products and take a
> union. Experiment shows that applying this technique in long
> multiplication improves the precision in a significant number of cases
> (at the cost of losing precision in a relatively lower number of
> cases).
> 
> This commit also removes the value-mask decomposition technique
> employed by Harishankar et al., as its direct incorporation did not
> result in any improvements for the new algorithm.
> 
> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
> ---

Hi Nandakumar,

Could you please provide a selftest demonstrating a difference in behavior?
What technique did you employ to estimate the number of cases when
precision is improved vs worsened? If this is some kind of a program
doing randomized testing, could you please share a link to it?

Thanks,
Eduard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-15 19:10 ` Eduard Zingerman
@ 2025-08-16  4:58   ` Nandakumar Edamana
  2025-08-18 22:16     ` Eduard Zingerman
  2025-08-18 22:47     ` Eduard Zingerman
  2025-08-18 14:24   ` Daniel Borkmann
  2025-08-20  6:15   ` Harishankar Vishwanathan
  2 siblings, 2 replies; 16+ messages in thread
From: Nandakumar Edamana @ 2025-08-16  4:58 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

On 16/08/25 00:40, Eduard Zingerman wrote:

 > Could you please provide a selftest demonstrating a difference in 
behavior?
 > What technique did you employ to estimate the number of cases when
 > precision is improved vs worsened? If this is some kind of a program
 > doing randomized testing, could you please share a link to it?

Hi Eduard,

Thanks for the quick response! I've made the test program available here:
https://github.com/nandedamana/improved-tnum-mul

The repo contains the code for the test program, and a README file with some
explanation and a sample output. The program basically does a brute-force
comparison involving all combinations possible with N bits (which of course
doesn't scale very well).

I've done some random checks at 64 bits (which I'd like to incorporate 
to the
program in future), and manual checks at lower bits. Still, I could've 
made some
stupid mistakes, and I'm really thankful that you are interested in
verifying it.

For the record, let me paste the sample output here:

$ ./experi --bits 6
...
mine vs kernel (bits = 6):
   better = 285059
   same   = 229058
   worse  = 17324
   myprod optimal cases  = 432406 / 531441
   linprod optimal cases = 202444 / 531441
----------------------------------------------------------
is optimal? (bits = 6): 0

The above output shows the new algorithm was an improvement over the 
current one
in 285k cases, and worse in 17k, while preserving the precision in 229k
cases. It achieved optimality in in 81% cases compared to the 38% 
obtained by
the current algorithm. I had benchmarked an early version of the new 
algorithm
at 8 bits as well, with similar (slightly better, actually) results.

Regarding adding selftests, I'll look into that as well. It'd be great 
if you
have any specific strategy in mind, wrt to this particular change. BTW, any
set of BPF programs that are known to fail due to the imprecision in the 
current
algorithm?

Thank you,

-- 
Nandakumar Edamana


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-15 19:10 ` Eduard Zingerman
  2025-08-16  4:58   ` Nandakumar Edamana
@ 2025-08-18 14:24   ` Daniel Borkmann
  2025-08-20  6:15   ` Harishankar Vishwanathan
  2 siblings, 0 replies; 16+ messages in thread
From: Daniel Borkmann @ 2025-08-18 14:24 UTC (permalink / raw)
  To: Eduard Zingerman, Nandakumar Edamana, Alexei Starovoitov,
	Andrii Nakryiko
  Cc: bpf, Harishankar Vishwanathan, Matan Shachnai, Srinivas Narayana,
	Santosh Nagarakatte

[ also adding Harishankar et al. to Cc, for the patch see :
   https://patchwork.kernel.org/project/netdevbpf/patch/20250815140510.1287598-1-nandakumar@nandakumar.co.in/
  ]

On 8/15/25 9:10 PM, Eduard Zingerman wrote:
> On Fri, 2025-08-15 at 19:35 +0530, Nandakumar Edamana wrote:
>> This commit addresses a challenge explained in an open question ("How
>> can we incorporate correlation in unknown bits across partial
>> products?") left by Harishankar et al. in their paper:
>> https://arxiv.org/abs/2105.05398
>>
>> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
>> from which we could find two possible partial products and take a
>> union. Experiment shows that applying this technique in long
>> multiplication improves the precision in a significant number of cases
>> (at the cost of losing precision in a relatively lower number of
>> cases).
>>
>> This commit also removes the value-mask decomposition technique
>> employed by Harishankar et al., as its direct incorporation did not
>> result in any improvements for the new algorithm.
>>
>> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
>> ---
> 
> Hi Nandakumar,
> 
> Could you please provide a selftest demonstrating a difference in behavior?
> What technique did you employ to estimate the number of cases when
> precision is improved vs worsened? If this is some kind of a program
> doing randomized testing, could you please share a link to it?
> 
> Thanks,
> Eduard


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-15 14:05 [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul Nandakumar Edamana
  2025-08-15 19:10 ` Eduard Zingerman
@ 2025-08-18 18:23 ` Jakub Sitnicki
  2025-08-18 22:49   ` Eduard Zingerman
  2025-08-18 23:14 ` Eduard Zingerman
  2 siblings, 1 reply; 16+ messages in thread
From: Jakub Sitnicki @ 2025-08-18 18:23 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf

On Fri, Aug 15, 2025 at 07:35 PM +0530, Nandakumar Edamana wrote:

[...]

> @@ -155,6 +163,14 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
>  	return TNUM(v & ~mu, mu);
>  }
>  
> +struct tnum tnum_union(struct tnum a, struct tnum b)
> +{
> +	u64 v = a.value & b.value;
> +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> +
> +	return TNUM(v & ~mu, mu);
> +}
> +

Not sure I follow. So if I have two tnums that represent known contants,
say a=(v=0b1010, m=0) and b=(v=0b0101, m=0), then their union is an
unknown u=(v=0b0000, m=0b1111)?

Full disclosure - I didn't read through the paper. The routine doesn't
seem to appear there, though.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-16  4:58   ` Nandakumar Edamana
@ 2025-08-18 22:16     ` Eduard Zingerman
  2025-08-18 22:47     ` Eduard Zingerman
  1 sibling, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2025-08-18 22:16 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

On Sat, 2025-08-16 at 10:28 +0530, Nandakumar Edamana wrote:
> On 16/08/25 00:40, Eduard Zingerman wrote:
> 
>  > Could you please provide a selftest demonstrating a difference in 
> behavior?
>  > What technique did you employ to estimate the number of cases when
>  > precision is improved vs worsened? If this is some kind of a program
>  > doing randomized testing, could you please share a link to it?
> 
> Hi Eduard,
> 
> Thanks for the quick response! I've made the test program available here:
> https://github.com/nandedamana/improved-tnum-mul

Hi Nandakumar,

Thank you for the link and for the algorithm explanation in the readme.
What tool do you use for code generation (ngg)?

[...]

> $ ./experi --bits 6
> ...
> mine vs kernel (bits = 6):
>    better = 285059
>    same   = 229058
>    worse  = 17324
>    myprod optimal cases  = 432406 / 531441
>    linprod optimal cases = 202444 / 531441
> ----------------------------------------------------------
> is optimal? (bits = 6): 0

I did a more primitive jab at this in [1], comparing number of unknown
bits (`__builtin_popcountl(tnum.mask)`) for current vs proposed
tnum_mul() for all 8-bit tnums and got the following results:

  Tnums  : 6560
  New win: 30086328    70 %
  Old win: 1463809      3 %
  Same   : 11483463    27 %

[1] https://github.com/eddyz87/tnum_mul_compare/blob/master/README.md

[...]

> Regarding adding selftests, I'll look into that as well. It'd be great 
> if you
> have any specific strategy in mind, wrt to this particular change. BTW, any
> set of BPF programs that are known to fail due to the imprecision in the 
> current
> algorithm?

I don't know of any current examples failing because of lacking
multiplication precision. Given that your algorithm does not introduce
any new edge cases, I'd simply add a test case which is more precise
with the new algorithm. Just to have selftests that can detect the
change.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-16  4:58   ` Nandakumar Edamana
  2025-08-18 22:16     ` Eduard Zingerman
@ 2025-08-18 22:47     ` Eduard Zingerman
  1 sibling, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2025-08-18 22:47 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

On Sat, 2025-08-16 at 10:28 +0530, Nandakumar Edamana wrote:

[...]

> mine vs kernel (bits = 6):
>    better = 285059
>    same   = 229058
>    worse  = 17324

Did you found any patterns for cases where new algorithm fairs worse
compared to current?

[...]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-18 18:23 ` Jakub Sitnicki
@ 2025-08-18 22:49   ` Eduard Zingerman
  2025-08-19  9:20     ` Jakub Sitnicki
  0 siblings, 1 reply; 16+ messages in thread
From: Eduard Zingerman @ 2025-08-18 22:49 UTC (permalink / raw)
  To: Jakub Sitnicki, Nandakumar Edamana
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf

On Mon, 2025-08-18 at 20:23 +0200, Jakub Sitnicki wrote:
> On Fri, Aug 15, 2025 at 07:35 PM +0530, Nandakumar Edamana wrote:
> 
> [...]
> 
> > @@ -155,6 +163,14 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
> >  	return TNUM(v & ~mu, mu);
> >  }
> >  
> > +struct tnum tnum_union(struct tnum a, struct tnum b)
> > +{
> > +	u64 v = a.value & b.value;
> > +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> > +
> > +	return TNUM(v & ~mu, mu);
> > +}
> > +
> 
> Not sure I follow. So if I have two tnums that represent known contants,
> say a=(v=0b1010, m=0) and b=(v=0b0101, m=0), then their union is an
> unknown u=(v=0b0000, m=0b1111)?

Yes, because a and b have no bits in common.
As far as I understand, tnum_union() computes a tnum that is a
superset of both `a` and `b`. Maybe `union` is not the best name.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-15 14:05 [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul Nandakumar Edamana
  2025-08-15 19:10 ` Eduard Zingerman
  2025-08-18 18:23 ` Jakub Sitnicki
@ 2025-08-18 23:14 ` Eduard Zingerman
       [not found]   ` <3b7b2ed1-bce1-49da-b83c-2ca46850062c@nandakumar.co.in>
  2 siblings, 1 reply; 16+ messages in thread
From: Eduard Zingerman @ 2025-08-18 23:14 UTC (permalink / raw)
  To: Nandakumar Edamana, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf

On Fri, 2025-08-15 at 19:35 +0530, Nandakumar Edamana wrote:
> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398
> 
> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> from which we could find two possible partial products and take a
> union. Experiment shows that applying this technique in long
> multiplication improves the precision in a significant number of cases
> (at the cost of losing precision in a relatively lower number of
> cases).
> 
> This commit also removes the value-mask decomposition technique
> employed by Harishankar et al., as its direct incorporation did not
> result in any improvements for the new algorithm.
> 
> Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
> ---

The algorithm makes sense to me, and seems easier to understand
compared to current one.

[...]

> @@ -155,6 +163,14 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
>  	return TNUM(v & ~mu, mu);
>  }
>  
> +struct tnum tnum_union(struct tnum a, struct tnum b)
> +{
> +	u64 v = a.value & b.value;
> +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
                 ^^^^^^^^^^^^^^^^^^^

Could you please add a comment here, saying something like:
"mask also includes any bits that are different between 'a' and 'b'"?

> +
> +	return TNUM(v & ~mu, mu);
> +}
> +
>  struct tnum tnum_cast(struct tnum a, u8 size)
>  {
>  	a.value &= (1ULL << (size * 8)) - 1;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
       [not found]   ` <3b7b2ed1-bce1-49da-b83c-2ca46850062c@nandakumar.co.in>
@ 2025-08-18 23:31     ` Eduard Zingerman
  0 siblings, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2025-08-18 23:31 UTC (permalink / raw)
  To: Nandakumar Edamana
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

On Tue, 2025-08-19 at 04:58 +0530, Nandakumar Edamana wrote:
>
> On 19/08/25 04:44, Eduard Zingerman wrote:
>
> >
> > [...]
> >
> >
> > >
> > > @@ -155,6 +163,14 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
> > >  	return TNUM(v & ~mu, mu);
> > >  }
> > >
> > > +struct tnum tnum_union(struct tnum a, struct tnum b)
> > > +{
> > > +	u64 v = a.value & b.value;
> > > +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> > >
> >
> >                  ^^^^^^^^^^^^^^^^^^^
> >
> > Could you please add a comment here, saying something like:
> > "mask also includes any bits that are different between 'a' and 'b'"?
> >
>  Thanks for the suggestion. Shall I send it as PATCH v3?

Yes, alongside a simple selftest, please.
Maybe wait a bit more for others to comment.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-18 22:49   ` Eduard Zingerman
@ 2025-08-19  9:20     ` Jakub Sitnicki
  2025-08-29  4:04       ` Shung-Hsi Yu
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Sitnicki @ 2025-08-19  9:20 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Nandakumar Edamana, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf

On Mon, Aug 18, 2025 at 03:49 PM -07, Eduard Zingerman wrote:
> On Mon, 2025-08-18 at 20:23 +0200, Jakub Sitnicki wrote:
>> On Fri, Aug 15, 2025 at 07:35 PM +0530, Nandakumar Edamana wrote:
>> 
>> [...]
>> 
>> > @@ -155,6 +163,14 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b)
>> >  	return TNUM(v & ~mu, mu);
>> >  }
>> >  
>> > +struct tnum tnum_union(struct tnum a, struct tnum b)
>> > +{
>> > +	u64 v = a.value & b.value;
>> > +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
>> > +
>> > +	return TNUM(v & ~mu, mu);
>> > +}
>> > +
>> 
>> Not sure I follow. So if I have two tnums that represent known contants,
>> say a=(v=0b1010, m=0) and b=(v=0b0101, m=0), then their union is an
>> unknown u=(v=0b0000, m=0b1111)?
>
> Yes, because a and b have no bits in common.
> As far as I understand, tnum_union() computes a tnum that is a
> superset of both `a` and `b`. Maybe `union` is not the best name.

Makes sense if I think about it like that. Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-15 19:10 ` Eduard Zingerman
  2025-08-16  4:58   ` Nandakumar Edamana
  2025-08-18 14:24   ` Daniel Borkmann
@ 2025-08-20  6:15   ` Harishankar Vishwanathan
       [not found]     ` <0ba41cd7-adc0-4c65-b1e0-defd8ebc2d64@nandakumar.co.in>
  2 siblings, 1 reply; 16+ messages in thread
From: Harishankar Vishwanathan @ 2025-08-20  6:15 UTC (permalink / raw)
  To: eddyz87; +Cc: andrii, ast, bpf, daniel, nandakumar

On Fri, 2025-08-15 at 19:35 +0530, Nandakumar Edamana wrote:

[...]

> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398

This is nice work on improving the precision of the tnum_mul algorithm
Nandakumar. I had a few questions and comments (below).

> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> from which we could find two possible partial products and take a
> union. Experiment shows that applying this technique in long
> multiplication improves the precision in a significant number of cases
> (at the cost of losing precision in a relatively lower number of
> cases).

If I understand the idea correctly, when the multiplier's (i.e. tnum a) bit is
unknown, it can either be 0 or 1. If it is 0, then we add nothing to
accumulator, i.e. TNUM(0, 0). If it is 1, we can add b to the accumulator
(appropriately shifted). The main idea is to take the union of these two
possible partial products, and add that to the accumulator. If so, could we also
do the following?

acc = tnum_add(acc, tnum_union(TNUM(0, 0), b));

[...]

> +		else if (a.mask & 1) {
> +			/* acc += tnum_union(acc_0, acc_1), where acc_0 and
> +			 * acc_1 are partial accumulators for cases
> +			 * LSB(a) = certain 0 and LSB(a) = certain 1.
> +			 * acc_0 = acc + 0 * b = acc.
> +			 * acc_1 = acc + 1 * b = tnum_add(acc, b).
> +			 */
> +
> +			acc = tnum_union(acc, tnum_add(acc, b));

The comments in your algorithm seem different from what the code implements: I
believe the comment should read:
/* acc = tnum_union(acc_0, acc_1), where acc_0 and ... */

[...]

> +struct tnum tnum_union(struct tnum a, struct tnum b)
> +{
> +	u64 v = a.value & b.value;
> +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> +
> +	return TNUM(v & ~mu, mu);
> +}

The tnum_union() algorithm itself seems correct. I checked this in z3 [1].

[1] https://github.com/bpfverif/tnums-cgo22/blob/05a51a1af6cb72f9694e5034c263ca442b33b976/verification/tnum.py#L801

I also checked this algorithm for the precision gains myself [2]. While the
numbers were not exactly the same as the ones you shared, the algorithm does
seem to be more precise (echoed by Eduard's tests as well). In the below
"our_mul" is the existing algorithm, and "new_mul" is your new algorithm. There
are two outputs, one for the exhaustive enumeration at 8 bitwidth, and another
for 10M randomly sampled 64-bit tnums.

[2] https://github.com/bpfverif/tnums-cgo22/tree/main/precision-new-mul

$ ./precision 8 64 10000000
number of input tnum pairs where our_mul had better precision: 1420899
number of input tnum pairs where new_mul had better precision: 30063672
number of input tnum pairs where output was the same: 11383710
number of input tnum pairs where output was incomparable: 178440

sampled at bitwidth 64
number of input tnum pairs where our_mul had better precision: 66441
number of input tnum pairs where new_mul had better precision: 241588
number of input tnum pairs where output was the same: 9687187
number of input tnum pairs where output was incomparable: 4784

Finally, what is the guarantee that this algorithm is sound? It will be useful
to have a proof to ensure that we are maintaining correctness.

Best,
Harishankar Vishwanathan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Fwd: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
       [not found]     ` <0ba41cd7-adc0-4c65-b1e0-defd8ebc2d64@nandakumar.co.in>
@ 2025-08-20  7:48       ` Nandakumar Edamana
  2025-08-21  5:46         ` Harishankar Vishwanathan
  2025-08-20 10:31       ` Nandakumar Edamana
  1 sibling, 1 reply; 16+ messages in thread
From: Nandakumar Edamana @ 2025-08-20  7:48 UTC (permalink / raw)
  To: Harishankar Vishwanathan; +Cc: andrii, ast, bpf, daniel, eddyz87

On 20/08/25 11:45, Harishankar Vishwanathan wrote:
> This is nice work on improving the precision of the tnum_mul algorithm
> Nandakumar. I had a few questions and comments (below).
Thanks a lot for going through this seriously.
> If I understand the idea correctly, when the multiplier's (i.e. tnum a) bit is
> unknown, it can either be 0 or 1. If it is 0, then we add nothing to
> accumulator, i.e. TNUM(0, 0). If it is 1, we can add b to the accumulator
> (appropriately shifted). The main idea is to take the union of these two
> possible partial products, and add that to the accumulator. If so, could we also
> do the following?
>
> acc = tnum_add(acc, tnum_union(TNUM(0, 0), b));

But tnum_union(TNUM(0, 0), b) would introduce a concrete 0 to the set, 
right?

> The comments in your algorithm seem different from what the code implements: I
> believe the comment should read:
> /* acc = tnum_union(acc_0, acc_1), where acc_0 and ... */
Yes, thanks a lot for pointing out! Will fix in v3.
> Finally, what is the guarantee that this algorithm is sound? It will be useful
> to have a proof to ensure that we are maintaining correctness.

Been working on it.

Thank you,

-- 
Nandakumar Edamana


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
       [not found]     ` <0ba41cd7-adc0-4c65-b1e0-defd8ebc2d64@nandakumar.co.in>
  2025-08-20  7:48       ` Fwd: " Nandakumar Edamana
@ 2025-08-20 10:31       ` Nandakumar Edamana
  1 sibling, 0 replies; 16+ messages in thread
From: Nandakumar Edamana @ 2025-08-20 10:31 UTC (permalink / raw)
  To: bpf

Sorry, my reply to Harishankar got detached from this thread due to an 
email misconfiguration from my part. Copying the archive link here for 
future reference: 
https://lore.kernel.org/bpf/c1296815-67f5-4f31-99fe-b9a86bb7a117@nandakumar.co.in/T/#u

-- 
Nandakumar Edamana


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-20  7:48       ` Fwd: " Nandakumar Edamana
@ 2025-08-21  5:46         ` Harishankar Vishwanathan
  0 siblings, 0 replies; 16+ messages in thread
From: Harishankar Vishwanathan @ 2025-08-21  5:46 UTC (permalink / raw)
  To: Nandakumar Edamana; +Cc: andrii, ast, bpf, daniel, eddyz87

On Wed, Aug 20, 2025 at 3:48 AM Nandakumar Edamana
<nandakumar@nandakumar.co.in> wrote:
>
> On 20/08/25 11:45, Harishankar Vishwanathan wrote:

[...]

> > If I understand the idea correctly, when the multiplier's (i.e. tnum a) bit is
> > unknown, it can either be 0 or 1. If it is 0, then we add nothing to
> > accumulator, i.e. TNUM(0, 0). If it is 1, we can add b to the accumulator
> > (appropriately shifted). The main idea is to take the union of these two
> > possible partial products, and add that to the accumulator. If so, could we also
> > do the following?
> >
> > acc = tnum_add(acc, tnum_union(TNUM(0, 0), b));
>
> But tnum_union(TNUM(0, 0), b) would introduce a concrete 0 to the set,
> right?

That makes sense, the above would be imprecise.

[...]

Best,
Harishankar Vishwanathan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul
  2025-08-19  9:20     ` Jakub Sitnicki
@ 2025-08-29  4:04       ` Shung-Hsi Yu
  0 siblings, 0 replies; 16+ messages in thread
From: Shung-Hsi Yu @ 2025-08-29  4:04 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Jakub Sitnicki, Nandakumar Edamana, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, bpf

On Tue, Aug 19, 2025 at 11:20:04AM +0200, Jakub Sitnicki wrote:
> On Mon, Aug 18, 2025 at 03:49 PM -07, Eduard Zingerman wrote:
> > On Mon, 2025-08-18 at 20:23 +0200, Jakub Sitnicki wrote:
> >> On Fri, Aug 15, 2025 at 07:35 PM +0530, Nandakumar Edamana wrote:
[...]
> >> > +struct tnum tnum_union(struct tnum a, struct tnum b)
> >> > +{
> >> > +	u64 v = a.value & b.value;
> >> > +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> >> > +
> >> > +	return TNUM(v & ~mu, mu);
> >> > +}
> >> 
> >> Not sure I follow. So if I have two tnums that represent known contants,
> >> say a=(v=0b1010, m=0) and b=(v=0b0101, m=0), then their union is an
> >> unknown u=(v=0b0000, m=0b1111)?
> >
> > Yes, because a and b have no bits in common.
> > As far as I understand, tnum_union() computes a tnum that is a
> > superset of both `a` and `b`. Maybe `union` is not the best name.

Purely bike shedding, now that v6 is merged :)

I find `union` name well fit, because it mimics the union of two sets of
integers. For example, using Python-like syntax and assuming tnumify()
is a magical function that creates the best representation of tnum for
any set of integers.

  tnum(v=0b1010, m=0) = tnumify({ 10 })
  tnum(v=0b0101, m=0) = tnumify({  5 })

Whether you find union of { 5 } and { 10 } first, then come up with the
best representation of tnum

  tnumify(union({ 5 }, { 10 })) = tnumify({ 5, 10 })

Or converting { 5 } and { 10 } to tnum first, then use tnum_union() to
mimic union in the integer world

  tnum_union(tnumify({ 5 }), tnumify({ 10 }))

You will end up with the exact same thing

  tnum(v=0b0000, m=0b1111)

In other words the inability to represent { 5, 10 } is an inherent tnum
constrain, rather than of tnum_union's.

And while saying that the function computes a superset of both `a` and
`b` is correct, it is not as precise, because there could be many
supersets.

E.g. for a=tnum(v=0b0001, b=0) and b=tnum(v=0b0101, b=0) it gives
tnum(v=0b0001, m=0b0100). While that is indeed is a superset of `a`
and `b`, so is something like tnum(v=0b0001, m=0b1110).

> Makes sense if I think about it like that. Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-29  4:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-15 14:05 [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul Nandakumar Edamana
2025-08-15 19:10 ` Eduard Zingerman
2025-08-16  4:58   ` Nandakumar Edamana
2025-08-18 22:16     ` Eduard Zingerman
2025-08-18 22:47     ` Eduard Zingerman
2025-08-18 14:24   ` Daniel Borkmann
2025-08-20  6:15   ` Harishankar Vishwanathan
     [not found]     ` <0ba41cd7-adc0-4c65-b1e0-defd8ebc2d64@nandakumar.co.in>
2025-08-20  7:48       ` Fwd: " Nandakumar Edamana
2025-08-21  5:46         ` Harishankar Vishwanathan
2025-08-20 10:31       ` Nandakumar Edamana
2025-08-18 18:23 ` Jakub Sitnicki
2025-08-18 22:49   ` Eduard Zingerman
2025-08-19  9:20     ` Jakub Sitnicki
2025-08-29  4:04       ` Shung-Hsi Yu
2025-08-18 23:14 ` Eduard Zingerman
     [not found]   ` <3b7b2ed1-bce1-49da-b83c-2ca46850062c@nandakumar.co.in>
2025-08-18 23:31     ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).