Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
@ 2018-06-28 19:02 Jiong Wang
  2018-06-28 21:05 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Jiong Wang @ 2018-06-28 19:02 UTC (permalink / raw)
  To: Song Liu
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, oss-drivers,
	Networking

On Tue, Jun 26, 2018 at 7:21 AM, Song Liu <liu.song.a23@gmail.com> wrote:
> On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> From: Jiong Wang <jiong.wang@netronome.com>

<snip>

>> +
>> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
>> +{
>> +       struct reciprocal_value_adv R;
>> +       u32 l, post_shift;
>> +       u64 mhigh, mlow;
>> +
>> +       l = fls(d - 1);
>> +       post_shift = l;
>> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
>> +        * MSB set. This case needs to be handled before calling
>> +        * "reciprocal_value_adv", please see the comment at
>> +        * include/linux/reciprocal_div.h.
>> +        */
>
> Shall we handle l == 32 case better? I guess the concern here is extra
> handling may
> slow down the fast path?

The implementation of "reciprocal_value_adv" hasn't considered l  ==
32 which will make the code more complex.

As described at the pseudo code about how to call
"reciprocal_value_adv" in include/linux/reciprocal_div.h, l == 32
means the MSB of dividend is set, so the result of unsigned
divisor/dividend could only be 0 or 1, so the divide result could be
easily get by a comparison then conditional move 0 or 1 to the result.

> If that's the case, we should at least add a WARNING on the slow path.

OK, I will add a pr_warn inside "reciprocal_value_adv" when l == 32 is
triggered.

Thanks,
Jiong

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-28 19:02 [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jiong Wang
@ 2018-06-28 21:05 ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2018-06-28 21:05 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Song Liu, Alexei Starovoitov, Daniel Borkmann, oss-drivers,
	Networking

On Thu, 28 Jun 2018 20:02:43 +0100, Jiong Wang wrote:
> > If that's the case, we should at least add a WARNING on the slow path.  
> 
> OK, I will add a pr_warn inside "reciprocal_value_adv" when l == 32 is
> triggered.

WARN() seems useful, given seeing l == 32 means the code calling this
function is buggy, and we want to see the back trace to figure out how
it happened.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps
@ 2018-06-25  3:54 Jakub Kicinski
  2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This set enables memcpy optimization when the source is a map pointer.
The rest adds multiplication and devide support with Jiong describes
as follows:

NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per
step, therefore we need 2 steps for u16 and 4 steps for u32.

We also need one start instruction to initialize the sequence and one or
two instructions to fetch the result depending on either you need the high
halve of u32 multiplication.

For ALU64, if either operand is beyond u32's value range, we reject it. One
thing to note, if the source operand is BPF_K, then we need to check "imm"
field directly, and we'd reject it if it is negative.  Because for ALU64,
"imm" (with s32 type) is expected to be sign extended to s64 which NFP mul
doesn't support. For ALU32, it is fine for "imm" be negative though,
because the result is 32-bits and here is no difference on the low halve
of result for signed/unsigned mul, so we will get correct result.

NFP doesn't have integer divide instruction, this patch set uses reciprocal
algorithm (the basic one, reciprocal_div) to emulate it.

For each u32 divide, we would need 11 instructions to finish the operation.

  7 (for multiplication) + 4 (various ALUs) = 11

Given NFP only supports multiplication no bigger than u32, we'd require
divisor and dividend no bigger than that as well.

Also eBPF doesn't support signed divide and has enforced this on C language
level by failing compilation. However LLVM assembler hasn't enforced this,
so it is possible for negative constant to leak in as a BPF_K operand
through assembly code, we reject such cases as well.

Meanwhile reciprocal_div.h only implemented the basic version of:

  "Division by Invariant Integers Using Multiplication"
                        - Torbjörn Granlund and Peter L. Montgomery

This patch set further implements the optimized version (Figure 4.2 in the
paper) inside existing reciprocal_div.h. When the divider is even and the
calculated reciprocal magic number doesn't fit u32, we could reduce the
required ALU instructions from 4 to 2 or 1 for some cases.

The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations. It makes sense to use it for JIT divide
code generation (for example eBPF JIT backends) for which we are willing to
trade performance of JITed code with that of host.

Jiong Wang (7):
  nfp: bpf: allow source ptr type be map ptr in memcpy optimization
  lib: reciprocal_div: implement the improved algorithm on the paper
    mentioned
  nfp: bpf: rename umin/umax to umin_src/umax_src
  nfp: bpf: copy range info for all operands of all ALU operations
  nfp: bpf: support u16 and u32 multiplications
  nfp: bpf: support u32 divide using reciprocal_div.h
  nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h

 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 232 +++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  43 ++--
 .../net/ethernet/netronome/nfp/bpf/offload.c  |   6 +-
 .../net/ethernet/netronome/nfp/bpf/verifier.c |  95 ++++++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |  28 +++
 include/linux/reciprocal_div.h                |  65 +++++
 lib/reciprocal_div.c                          |  37 +++
 7 files changed, 467 insertions(+), 39 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
@ 2018-06-25  3:54 ` Jakub Kicinski
  2018-06-26  6:21   ` Song Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2018-06-25  3:54 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jiong Wang

From: Jiong Wang <jiong.wang@netronome.com>

The new added "reciprocal_value_adv" implements the advanced version of the
algorithm described in Figure 4.2 of the paper except when dividend has MSB
set which would require u128 divide on host and actually could be easily
handled before calling the new "reciprocal_value_adv".

The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations.

It makes no sense to use this advanced version for host divide emulation,
those extra complexities for calculating multiplier etc could completely
waive our saving on emulation operations.

However, it makes sense to use it for JIT divide code generation (for
example eBPF JIT backends) for which we are willing to trade performance of
JITed code with that of host. As shown by the following pseudo code, the
required emulation operations could go down from 6 (the basic version) to 3
or 4.

To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d, the C-style pseudo code will be the following, it could be easily
changed to real code generation for other JIT targets.

  struct reciprocal_value_adv rvalue;
  u8 pre_shift, exp;

  if (d >= (1u << 31)) {
    result = n >= d;
    return;
  }
  rvalue = reciprocal_value_adv(d, 32)
  exp = rvalue.exp;
  if (rvalue.is_wide_m && !(d & 1)) {
    pre_shift = fls(d & -d) - 1;
    rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
  } else {
    pre_shift = 0;
  }

  // code generation starts.
  if (imm == 1 << exp) {
    result = n >> exp;
  } else if (rvalue.is_wide_m) {
    // pre_shift must be zero when reached here.
    t = (n * rvalue.m) >> 32;
    result = n - t;
    result >>= 1;
    result += t;
    result >>= rvalue.sh - 1;
  } else {
    if (pre_shift)
      result = n >> pre_shift;
    result = ((u64)result * rvalue.m) >> 32;
    result >>= rvalue.sh;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/reciprocal_div.h | 65 ++++++++++++++++++++++++++++++++++
 lib/reciprocal_div.c           | 37 +++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/reciprocal_div.h b/include/linux/reciprocal_div.h
index e031e9f2f9d8..5a695e4697d3 100644
--- a/include/linux/reciprocal_div.h
+++ b/include/linux/reciprocal_div.h
@@ -25,6 +25,9 @@ struct reciprocal_value {
 	u8 sh1, sh2;
 };
 
+/* "reciprocal_value" and "reciprocal_divide" together implement the basic
+ * version of the algorithm described in Figure 4.1 of the paper.
+ */
 struct reciprocal_value reciprocal_value(u32 d);
 
 static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
@@ -33,4 +36,66 @@ static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
 	return (t + ((a - t) >> R.sh1)) >> R.sh2;
 }
 
+struct reciprocal_value_adv {
+	u32 m;
+	u8 sh, exp;
+	bool is_wide_m;
+};
+
+/* "reciprocal_value_adv" implements the advanced version of the algorithm
+ * described in Figure 4.2 of the paper except when dividend has MSB set which
+ * would require u128 divide on host and actually could be easily handled before
+ * calling "reciprocal_value_adv".
+ *
+ * The advanced version requires more complex calculation to get the reciprocal
+ * multiplier and other control variables, but then could reduce the required
+ * emulation operations.
+ *
+ * It makes no sense to use this advanced version for host divide emulation,
+ * those extra complexities for calculating multiplier etc could completely
+ * waive our saving on emulation operations.
+ *
+ * However, it makes sense to use it for JIT divide code generation for which
+ * we are willing to trade performance of JITed code with that of host. As shown
+ * by the following pseudo code, the required emulation operations could go down
+ * from 6 (the basic version) to 3 or 4.
+ *
+ * To use the result of "reciprocal_value_adv", suppose we want to calculate
+ * n/d:
+ *
+ *   struct reciprocal_value_adv rvalue;
+ *   u8 pre_shift, exp;
+ *
+ *   if (d >= (1u << 31)) {
+ *     result = n >= d;
+ *     return;
+ *   }
+ *   rvalue = reciprocal_value_adv(d, 32)
+ *   exp = rvalue.exp;
+ *   if (rvalue.is_wide_m && !(d & 1)) {
+ *     pre_shift = fls(d & -d) - 1;
+ *     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
+ *   } else {
+ *     pre_shift = 0;
+ *   }
+ *
+ *   // code generation starts.
+ *   if (imm == 1 << exp) {
+ *     result = n >> exp;
+ *   } else if (rvalue.is_wide_m) {
+ *     // pre_shift must be zero when reached here.
+ *     t = (n * rvalue.m) >> 32;
+ *     result = n - t;
+ *     result >>= 1;
+ *     result += t;
+ *     result >>= rvalue.sh - 1;
+ *   } else {
+ *     if (pre_shift)
+ *       result = n >> pre_shift;
+ *     result = ((u64)result * rvalue.m) >> 32;
+ *     result >>= rvalue.sh;
+ *   }
+ */
+struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec);
+
 #endif /* _LINUX_RECIPROCAL_DIV_H */
diff --git a/lib/reciprocal_div.c b/lib/reciprocal_div.c
index fcb4ce682c6f..a41501ebad7c 100644
--- a/lib/reciprocal_div.c
+++ b/lib/reciprocal_div.c
@@ -26,3 +26,40 @@ struct reciprocal_value reciprocal_value(u32 d)
 	return R;
 }
 EXPORT_SYMBOL(reciprocal_value);
+
+struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
+{
+	struct reciprocal_value_adv R;
+	u32 l, post_shift;
+	u64 mhigh, mlow;
+
+	l = fls(d - 1);
+	post_shift = l;
+	/* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
+	 * MSB set. This case needs to be handled before calling
+	 * "reciprocal_value_adv", please see the comment at
+	 * include/linux/reciprocal_div.h.
+	 */
+	mlow = 1ULL << (32 + l);
+	do_div(mlow, d);
+	mhigh = (1ULL << (32 + l)) + (1ULL << (32 + l - prec));
+	do_div(mhigh, d);
+
+	for (; post_shift > 0; post_shift--) {
+		u64 lo = mlow >> 1, hi = mhigh >> 1;
+
+		if (lo >= hi)
+			break;
+
+		mlow = lo;
+		mhigh = hi;
+	}
+
+	R.m = (u32)mhigh;
+	R.sh = post_shift;
+	R.exp = l;
+	R.is_wide_m = mhigh > U32_MAX;
+
+	return R;
+}
+EXPORT_SYMBOL(reciprocal_value_adv);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
@ 2018-06-26  6:21   ` Song Liu
  2018-06-26 20:52     ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Song Liu @ 2018-06-26  6:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking,
	Jiong Wang

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> The new added "reciprocal_value_adv" implements the advanced version of the
> algorithm described in Figure 4.2 of the paper except when dividend has MSB
> set which would require u128 divide on host and actually could be easily
> handled before calling the new "reciprocal_value_adv".
>
> The advanced version requires more complex calculation to get the
> reciprocal multiplier and other control variables, but then could reduce
> the required emulation operations.
>
> It makes no sense to use this advanced version for host divide emulation,
> those extra complexities for calculating multiplier etc could completely
> waive our saving on emulation operations.
>
> However, it makes sense to use it for JIT divide code generation (for
> example eBPF JIT backends) for which we are willing to trade performance of
> JITed code with that of host. As shown by the following pseudo code, the
> required emulation operations could go down from 6 (the basic version) to 3
> or 4.
>
> To use the result of "reciprocal_value_adv", suppose we want to calculate
> n/d, the C-style pseudo code will be the following, it could be easily
> changed to real code generation for other JIT targets.
>
>   struct reciprocal_value_adv rvalue;
>   u8 pre_shift, exp;
>
>   if (d >= (1u << 31)) {
>     result = n >= d;
>     return;
>   }
>   rvalue = reciprocal_value_adv(d, 32)
>   exp = rvalue.exp;
>   if (rvalue.is_wide_m && !(d & 1)) {
>     pre_shift = fls(d & -d) - 1;
>     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
>   } else {
>     pre_shift = 0;
>   }
>
>   // code generation starts.
>   if (imm == 1 << exp) {
>     result = n >> exp;
>   } else if (rvalue.is_wide_m) {
>     // pre_shift must be zero when reached here.
>     t = (n * rvalue.m) >> 32;
>     result = n - t;
>     result >>= 1;
>     result += t;
>     result >>= rvalue.sh - 1;
>   } else {
>     if (pre_shift)
>       result = n >> pre_shift;
>     result = ((u64)result * rvalue.m) >> 32;
>     result >>= rvalue.sh;
>   }
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  include/linux/reciprocal_div.h | 65 ++++++++++++++++++++++++++++++++++
>  lib/reciprocal_div.c           | 37 +++++++++++++++++++
>  2 files changed, 102 insertions(+)
>
> diff --git a/include/linux/reciprocal_div.h b/include/linux/reciprocal_div.h
> index e031e9f2f9d8..5a695e4697d3 100644
> --- a/include/linux/reciprocal_div.h
> +++ b/include/linux/reciprocal_div.h
> @@ -25,6 +25,9 @@ struct reciprocal_value {
>         u8 sh1, sh2;
>  };
>
> +/* "reciprocal_value" and "reciprocal_divide" together implement the basic
> + * version of the algorithm described in Figure 4.1 of the paper.
> + */
>  struct reciprocal_value reciprocal_value(u32 d);
>
>  static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
> @@ -33,4 +36,66 @@ static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
>         return (t + ((a - t) >> R.sh1)) >> R.sh2;
>  }
>
> +struct reciprocal_value_adv {
> +       u32 m;
> +       u8 sh, exp;
> +       bool is_wide_m;
> +};
> +
> +/* "reciprocal_value_adv" implements the advanced version of the algorithm
> + * described in Figure 4.2 of the paper except when dividend has MSB set which
> + * would require u128 divide on host and actually could be easily handled before
> + * calling "reciprocal_value_adv".
> + *
> + * The advanced version requires more complex calculation to get the reciprocal
> + * multiplier and other control variables, but then could reduce the required
> + * emulation operations.
> + *
> + * It makes no sense to use this advanced version for host divide emulation,
> + * those extra complexities for calculating multiplier etc could completely
> + * waive our saving on emulation operations.
> + *
> + * However, it makes sense to use it for JIT divide code generation for which
> + * we are willing to trade performance of JITed code with that of host. As shown
> + * by the following pseudo code, the required emulation operations could go down
> + * from 6 (the basic version) to 3 or 4.
> + *
> + * To use the result of "reciprocal_value_adv", suppose we want to calculate
> + * n/d:
> + *
> + *   struct reciprocal_value_adv rvalue;
> + *   u8 pre_shift, exp;
> + *
> + *   if (d >= (1u << 31)) {
> + *     result = n >= d;
> + *     return;
> + *   }
> + *   rvalue = reciprocal_value_adv(d, 32)
> + *   exp = rvalue.exp;
> + *   if (rvalue.is_wide_m && !(d & 1)) {
> + *     pre_shift = fls(d & -d) - 1;
> + *     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
> + *   } else {
> + *     pre_shift = 0;
> + *   }
> + *
> + *   // code generation starts.
> + *   if (imm == 1 << exp) {
> + *     result = n >> exp;
> + *   } else if (rvalue.is_wide_m) {
> + *     // pre_shift must be zero when reached here.
> + *     t = (n * rvalue.m) >> 32;
> + *     result = n - t;
> + *     result >>= 1;
> + *     result += t;
> + *     result >>= rvalue.sh - 1;
> + *   } else {
> + *     if (pre_shift)
> + *       result = n >> pre_shift;
> + *     result = ((u64)result * rvalue.m) >> 32;
> + *     result >>= rvalue.sh;
> + *   }
> + */
> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec);
> +
>  #endif /* _LINUX_RECIPROCAL_DIV_H */
> diff --git a/lib/reciprocal_div.c b/lib/reciprocal_div.c
> index fcb4ce682c6f..a41501ebad7c 100644
> --- a/lib/reciprocal_div.c
> +++ b/lib/reciprocal_div.c
> @@ -26,3 +26,40 @@ struct reciprocal_value reciprocal_value(u32 d)
>         return R;
>  }
>  EXPORT_SYMBOL(reciprocal_value);
> +
> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
> +{
> +       struct reciprocal_value_adv R;
> +       u32 l, post_shift;
> +       u64 mhigh, mlow;
> +
> +       l = fls(d - 1);
> +       post_shift = l;
> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
> +        * MSB set. This case needs to be handled before calling
> +        * "reciprocal_value_adv", please see the comment at
> +        * include/linux/reciprocal_div.h.
> +        */

Shall we handle l == 32 case better? I guess the concern here is extra
handling may
slow down the fast path? If that's the case, we should at least add a
WARNING on the
slow path.

Thanks,
Song


> +       mlow = 1ULL << (32 + l);
> +       do_div(mlow, d);
> +       mhigh = (1ULL << (32 + l)) + (1ULL << (32 + l - prec));
> +       do_div(mhigh, d);
> +
> +       for (; post_shift > 0; post_shift--) {
> +               u64 lo = mlow >> 1, hi = mhigh >> 1;
> +
> +               if (lo >= hi)
> +                       break;
> +
> +               mlow = lo;
> +               mhigh = hi;
> +       }
> +
> +       R.m = (u32)mhigh;
> +       R.sh = post_shift;
> +       R.exp = l;
> +       R.is_wide_m = mhigh > U32_MAX;
> +
> +       return R;
> +}
> +EXPORT_SYMBOL(reciprocal_value_adv);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-26  6:21   ` Song Liu
@ 2018-06-26 20:52     ` Jakub Kicinski
  2018-06-27  8:54       ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2018-06-26 20:52 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking,
	Jiong Wang

On Mon, 25 Jun 2018 23:21:10 -0700, Song Liu wrote:
> > +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
> > +{
> > +       struct reciprocal_value_adv R;
> > +       u32 l, post_shift;
> > +       u64 mhigh, mlow;
> > +
> > +       l = fls(d - 1);
> > +       post_shift = l;
> > +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
> > +        * MSB set. This case needs to be handled before calling
> > +        * "reciprocal_value_adv", please see the comment at
> > +        * include/linux/reciprocal_div.h.
> > +        */  
> 
> Shall we handle l == 32 case better? I guess the concern here is extra
> handling may slow down the fast path? If that's the case, we should
> at least add a WARNING on the slow path.

Agreed, I think Jiong is travelling, hence no response.  We'll respin.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
  2018-06-26 20:52     ` Jakub Kicinski
@ 2018-06-27  8:54       ` Daniel Borkmann
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Borkmann @ 2018-06-27  8:54 UTC (permalink / raw)
  To: Jakub Kicinski, Song Liu
  Cc: Alexei Starovoitov, oss-drivers, Networking, Jiong Wang

On 06/26/2018 10:52 PM, Jakub Kicinski wrote:
> On Mon, 25 Jun 2018 23:21:10 -0700, Song Liu wrote:
>>> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
>>> +{
>>> +       struct reciprocal_value_adv R;
>>> +       u32 l, post_shift;
>>> +       u64 mhigh, mlow;
>>> +
>>> +       l = fls(d - 1);
>>> +       post_shift = l;
>>> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
>>> +        * MSB set. This case needs to be handled before calling
>>> +        * "reciprocal_value_adv", please see the comment at
>>> +        * include/linux/reciprocal_div.h.
>>> +        */  
>>
>> Shall we handle l == 32 case better? I guess the concern here is extra
>> handling may slow down the fast path? If that's the case, we should
>> at least add a WARNING on the slow path.
> 
> Agreed, I think Jiong is travelling, hence no response.  We'll respin.

Ok, since there's going to be a respin, I've tossed the current series from
patchwork in that case.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-06-28 21:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-28 19:02 [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jiong Wang
2018-06-28 21:05 ` Jakub Kicinski
  -- strict thread matches above, loose matches on Subject: below --
2018-06-25  3:54 [PATCH bpf-next 0/7] nfp: bpf: add multiplication, divide and memcpy from maps Jakub Kicinski
2018-06-25  3:54 ` [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
2018-06-26  6:21   ` Song Liu
2018-06-26 20:52     ` Jakub Kicinski
2018-06-27  8:54       ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox