* [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() @ 2025-05-18 13:38 David Laight 2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight ` (3 more replies) 0 siblings, 4 replies; 21+ messages in thread From: David Laight @ 2025-05-18 13:38 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: David Laight, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das The pwm-stm32.c code wants a 'rounding up' version of mul_u64_u64_div_u64(). This can be done simply by adding 'divisor - 1' to the 128bit product. Implement mul_u64_add_u64_div_u64(a, b, c, d) = (a * b + c)/d based on the existing code. Define mul_u64_u64_div_u64(a, b, d) as mul_u64_add_u64_div_u64(a, b, 0, d) and mul_u64_u64_div_u64_roundup(a, b, d) as mul_u64_add_u64_div_u64(a, b, d-1, d). Only x86-64 has an optimised (asm) version of the function. That is optimised to avoid the 'add c' when c is known to be zero. In all other cases the extra code will be noise compared to the software divide code. I've updated the test module to test mul_u64_u64_div_u64_roundup() and also enhanced it to verify the C division code on x86-64. Changes for v2: - Rename the 'divisor' parameter from 'c' to 'd'. - Add an extra patch to use BUG_ON() to trap zero divisors. - Remove the last patch that ran the C code on x86-64 (I've a plan to do that differently). Note that this code is slow: in userspace on a zen-5 it takes 220-250 clocks in 64bit mode and 450-900 clocks in 32bit mode (ignoring the fast path cases). It is not helped by gcc making a 'pig's breakfast' of mixed 32/64 bit maths (clang is a lot better), but is helped by the x86 sh[rl]d and cmov (enabled for my 32bit builds). And I'm not at all sure the call in kernel/sched/cputime.c isn't in a relatively common path (rather than just hardware initialisation). I've a followup patch that reduces the clock counts to about 80 in 64bit mode and 130 in 32bit mode (pretty much data independent). David Laight (4): lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() lib: Add tests for mul_u64_u64_div_u64_roundup() arch/x86/include/asm/div64.h | 19 +++- include/linux/math64.h | 45 ++++++++++- lib/math/div64.c | 43 ++++++----- lib/math/test_mul_u64_u64_div_u64.c | 116 +++++++++++++++++----------- 4 files changed, 149 insertions(+), 74 deletions(-) -- 2.39.5 ^ permalink raw reply [flat|nested] 21+ messages in thread
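For illustration, a minimal usage sketch of the API being added (a hypothetical caller, not code from the series; NSEC_PER_SEC is the kernel's nanoseconds-per-second constant and the values are made up):

	u64 rate = 216000000;		/* example clock rate in Hz */
	u64 period_ns = 1000000;	/* example requested period */

	/* rounds down: the programmed period may be slightly short */
	u64 arr = mul_u64_u64_div_u64(rate, period_ns, NSEC_PER_SEC);

	/* rounds up: the period is never shorter than requested */
	u64 arr_up = mul_u64_u64_div_u64_roundup(rate, period_ns, NSEC_PER_SEC);

The intermediate rate * period_ns product can exceed 64 bits, which is why the 'divisor - 1' for rounding has to be added to the 128bit product inside the helper rather than by the caller.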
* [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' 2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight @ 2025-05-18 13:38 ` David Laight 2025-05-20 2:11 ` Nicolas Pitre 2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight ` (2 subsequent siblings) 3 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-18 13:38 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: David Laight, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das Change the prototype from mul_u64_u64_div_u64(u64 a, u64 b, u64 c) to mul_u64_u64_div_u64(u64 a, u64 b, u64 d). Using 'd' for 'divisor' makes more sense. An upcoming change adds a 'c' parameter to calculate (a * b + c)/d. Signed-off-by: David Laight <david.laight.linux@gmail.com> --- Note v1 of the patchset had mul_u64_add_u64_div_u64(a, b, add, c). lib/math/div64.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/lib/math/div64.c b/lib/math/div64.c index 5faa29208bdb..a5c966a36836 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -184,10 +184,10 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) EXPORT_SYMBOL(iter_div_u64_rem); #ifndef mul_u64_u64_div_u64 -u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) +u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) { if (ilog2(a) + ilog2(b) <= 62) - return div64_u64(a * b, c); + return div64_u64(a * b, d); #if defined(__SIZEOF_INT128__) @@ -212,36 +212,36 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) #endif - /* make sure c is not zero, trigger exception otherwise */ + /* make sure d is not zero, trigger exception otherwise */ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wdiv-by-zero" - if (unlikely(c == 0)) + if (unlikely(d == 0)) return 1/0; #pragma GCC diagnostic pop - int shift = __builtin_ctzll(c); + int shift = __builtin_ctzll(d); /* try reducing the fraction in case the dividend becomes <= 64 bits */ if ((n_hi >> shift) == 0) { u64 n = shift ? (n_lo >> shift) | (n_hi << (64 - shift)) : n_lo; - return div64_u64(n, c >> shift); + return div64_u64(n, d >> shift); /* * The remainder value if needed would be: - * res = div64_u64_rem(n, c >> shift, &rem); + * res = div64_u64_rem(n, d >> shift, &rem); * rem = (rem << shift) + (n_lo - (n << shift)); */ } - if (n_hi >= c) { + if (n_hi >= d) { /* overflow: result is unrepresentable in a u64 */ return -1; } /* Do the full 128 by 64 bits division */ - shift = __builtin_clzll(c); - c <<= shift; + shift = __builtin_clzll(d); + d <<= shift; int p = 64 + shift; u64 res = 0; @@ -256,8 +256,8 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) n_hi <<= shift; n_hi |= n_lo >> (64 - shift); n_lo <<= shift; - if (carry || (n_hi >= c)) { - n_hi -= c; + if (carry || (n_hi >= d)) { + n_hi -= d; res |= 1ULL << p; } } while (n_hi); -- 2.39.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' 2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight @ 2025-05-20 2:11 ` Nicolas Pitre 0 siblings, 0 replies; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 2:11 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Sun, 18 May 2025, David Laight wrote: > Change to prototype from mul_u64_u64_div_u64(u64 a, u64 b, u64 c) > to mul_u64_u64_div_u64(u64 a, u64 b, u64 d). > Using 'd' for 'divisor' makes more sense. > > Am upcoming change adds a 'c' parameter to calculate (a * b + c)/d. > > Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> > --- > > Note v1 of the patchset had mul_u64_add_u64_div_u64(a, b, add, c). > > lib/math/div64.c | 24 ++++++++++++------------ > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/lib/math/div64.c b/lib/math/div64.c > index 5faa29208bdb..a5c966a36836 100644 > --- a/lib/math/div64.c > +++ b/lib/math/div64.c > @@ -184,10 +184,10 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) > EXPORT_SYMBOL(iter_div_u64_rem); > > #ifndef mul_u64_u64_div_u64 > -u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) > +u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > { > if (ilog2(a) + ilog2(b) <= 62) > - return div64_u64(a * b, c); > + return div64_u64(a * b, d); > > #if defined(__SIZEOF_INT128__) > > @@ -212,36 +212,36 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) > > #endif > > - /* make sure c is not zero, trigger exception otherwise */ > + /* make sure d is not zero, trigger exception otherwise */ > #pragma GCC diagnostic push > #pragma GCC diagnostic ignored "-Wdiv-by-zero" > - if (unlikely(c == 0)) > + if (unlikely(d == 0)) > return 1/0; > #pragma GCC diagnostic pop > > - int shift = __builtin_ctzll(c); > + int shift = __builtin_ctzll(d); > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ > if ((n_hi >> shift) == 0) { > u64 n = shift ? (n_lo >> shift) | (n_hi << (64 - shift)) : n_lo; > > - return div64_u64(n, c >> shift); > + return div64_u64(n, d >> shift); > /* > * The remainder value if needed would be: > - * res = div64_u64_rem(n, c >> shift, &rem); > + * res = div64_u64_rem(n, d >> shift, &rem); > * rem = (rem << shift) + (n_lo - (n << shift)); > */ > } > > - if (n_hi >= c) { > + if (n_hi >= d) { > /* overflow: result is unrepresentable in a u64 */ > return -1; > } > > /* Do the full 128 by 64 bits division */ > > - shift = __builtin_clzll(c); > - c <<= shift; > + shift = __builtin_clzll(d); > + d <<= shift; > > int p = 64 + shift; > u64 res = 0; > @@ -256,8 +256,8 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) > n_hi <<= shift; > n_hi |= n_lo >> (64 - shift); > n_lo <<= shift; > - if (carry || (n_hi >= c)) { > - n_hi -= c; > + if (carry || (n_hi >= d)) { > + n_hi -= d; > res |= 1ULL << p; > } > } while (n_hi); > -- > 2.39.5 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight 2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight @ 2025-05-18 13:38 ` David Laight 2025-05-18 15:42 ` kernel test robot ` (2 more replies) 2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight 2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight 3 siblings, 3 replies; 21+ messages in thread From: David Laight @ 2025-05-18 13:38 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: David Laight, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined behaviour' the compiler generated for a compile-time 1/0 is in any way useful. It may be better to define the function to return ~(u64)0 for divide by zero. Signed-off-by: David Laight <david.laight.linux@gmail.com> --- A new change for v2 of the patchset. Whereas gcc inserts (IIRC) 'ud2', clang is likely to let the code continue and generate 'random' results for any 'undefined behaviour'. lib/math/div64.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/lib/math/div64.c b/lib/math/div64.c index a5c966a36836..c426fa0660bc 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -186,6 +186,9 @@ EXPORT_SYMBOL(iter_div_u64_rem); #ifndef mul_u64_u64_div_u64 u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) { + /* Trigger exception if divisor is zero */ + BUG_ON(!d); + if (ilog2(a) + ilog2(b) <= 62) return div64_u64(a * b, d); @@ -212,13 +215,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) #endif - /* make sure d is not zero, trigger exception otherwise */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wdiv-by-zero" - if (unlikely(d == 0)) - return 1/0; -#pragma GCC diagnostic pop - int shift = __builtin_ctzll(d); /* try reducing the fraction in case the dividend becomes <= 64 bits */ -- 2.39.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight @ 2025-05-18 15:42 ` kernel test robot 2025-05-18 21:50 ` David Laight 2025-05-19 6:10 ` Uwe Kleine-König 2025-05-20 2:21 ` Nicolas Pitre 2 siblings, 1 reply; 21+ messages in thread From: kernel test robot @ 2025-05-18 15:42 UTC (permalink / raw) To: David Laight, Andrew Morton, linux-kernel Cc: oe-kbuild-all, Linux Memory Management List, David Laight, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das Hi David, kernel test robot noticed the following build errors: [auto build test ERROR on next-20250516] url: https://github.com/intel-lab-lkp/linux/commits/David-Laight/lib-mul_u64_u64_div_u64-rename-parameter-c-to-d/20250518-214037 base: next-20250516 patch link: https://lore.kernel.org/r/20250518133848.5811-3-david.laight.linux%40gmail.com patch subject: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20250518/202505182351.bPFZE1vO-lkp@intel.com/config) compiler: or1k-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250518/202505182351.bPFZE1vO-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202505182351.bPFZE1vO-lkp@intel.com/ All errors (new ones prefixed by >>): lib/math/div64.c: In function 'mul_u64_u64_div_u64': >> lib/math/div64.c:190:9: error: implicit declaration of function 'BUG_ON' [-Wimplicit-function-declaration] 190 | BUG_ON(!d); | ^~~~~~ vim +/BUG_ON +190 lib/math/div64.c 185 186 #ifndef mul_u64_u64_div_u64 187 u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) 188 { 189 /* Trigger exception if divisor is zero */ > 190 BUG_ON(!d); 191 192 if (ilog2(a) + ilog2(b) <= 62) 193 return div64_u64(a * b, d); 194 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-18 15:42 ` kernel test robot @ 2025-05-18 21:50 ` David Laight 0 siblings, 0 replies; 21+ messages in thread From: David Laight @ 2025-05-18 21:50 UTC (permalink / raw) To: kernel test robot Cc: Andrew Morton, linux-kernel, oe-kbuild-all, Linux Memory Management List, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das On Sun, 18 May 2025 23:42:44 +0800 kernel test robot <lkp@intel.com> wrote: > Hi David, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on next-20250516] > > url: https://github.com/intel-lab-lkp/linux/commits/David-Laight/lib-mul_u64_u64_div_u64-rename-parameter-c-to-d/20250518-214037 > base: next-20250516 > patch link: https://lore.kernel.org/r/20250518133848.5811-3-david.laight.linux%40gmail.com > patch subject: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero > config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20250518/202505182351.bPFZE1vO-lkp@intel.com/config) > compiler: or1k-linux-gcc (GCC) 14.2.0 > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250518/202505182351.bPFZE1vO-lkp@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <lkp@intel.com> > | Closes: https://lore.kernel.org/oe-kbuild-all/202505182351.bPFZE1vO-lkp@intel.com/ > > All errors (new ones prefixed by >>): > > lib/math/div64.c: In function 'mul_u64_u64_div_u64': > >> lib/math/div64.c:190:9: error: implicit declaration of function 'BUG_ON' [-Wimplicit-function-declaration] > 190 | BUG_ON(!d); > | ^~~~~~ compiles fine on x86 :-( ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight 2025-05-18 15:42 ` kernel test robot @ 2025-05-19 6:10 ` Uwe Kleine-König 2025-05-19 11:59 ` David Laight 2025-05-20 2:21 ` Nicolas Pitre 2 siblings, 1 reply; 21+ messages in thread From: Uwe Kleine-König @ 2025-05-19 6:10 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das [-- Attachment #1: Type: text/plain, Size: 2016 bytes --] On Sun, May 18, 2025 at 02:38:46PM +0100, David Laight wrote: > Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined > behaviour' the compiler generated for a compile-time 1/0 is in any > way useful. > > It may be better to define the function to return ~(u64)0 for > divide by zero. > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > --- > > A new change for v2 of the patchset. > Whereas gcc inserts (IIRC) 'ud2' clang is likely to let the code > continue and generate 'random' results for any 'undefined bahaviour'. > > lib/math/div64.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/lib/math/div64.c b/lib/math/div64.c > index a5c966a36836..c426fa0660bc 100644 > --- a/lib/math/div64.c > +++ b/lib/math/div64.c > @@ -186,6 +186,9 @@ EXPORT_SYMBOL(iter_div_u64_rem); > #ifndef mul_u64_u64_div_u64 > u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > { > + /* Trigger exception if divisor is zero */ > + BUG_ON(!d); > + I'm unsure if I should like the BUG_ON better than return 1/0. My gut feeling is that mul_u64_u64_div_u64() should behave in the same way as e.g. div64_u64 (which is just `return dividend / divisor;` for 64bit archs and thus triggers the same exception as `return 1/0;`). If BUG_ON should be it, I'd prefer BUG_ON(unlikely(d == 0)); which keeps the unlikely() that is already in the check removed below and is more explicit than checking for !d. > if (ilog2(a) + ilog2(b) <= 62) > return div64_u64(a * b, d); > > @@ -212,13 +215,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > #endif > > - /* make sure d is not zero, trigger exception otherwise */ > -#pragma GCC diagnostic push > -#pragma GCC diagnostic ignored "-Wdiv-by-zero" > - if (unlikely(d == 0)) > - return 1/0; > -#pragma GCC diagnostic pop > - > int shift = __builtin_ctzll(d); > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ Best regards Uwe [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --]
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-19 6:10 ` Uwe Kleine-König @ 2025-05-19 11:59 ` David Laight 2025-05-20 1:54 ` Nicolas Pitre 0 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-19 11:59 UTC (permalink / raw) To: Uwe Kleine-König Cc: Andrew Morton, linux-kernel, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das On Mon, 19 May 2025 08:10:50 +0200 Uwe Kleine-König <u.kleine-koenig@baylibre.com> wrote: > On Sun, May 18, 2025 at 02:38:46PM +0100, David Laight wrote: > > Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined > > behaviour' the compiler generated for a compile-time 1/0 is in any > > way useful. > > > > It may be better to define the function to return ~(u64)0 for > > divide by zero. > > > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > > --- > > > > A new change for v2 of the patchset. > > Whereas gcc inserts (IIRC) 'ud2' clang is likely to let the code > > continue and generate 'random' results for any 'undefined bahaviour'. > > > > lib/math/div64.c | 10 +++------- > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > diff --git a/lib/math/div64.c b/lib/math/div64.c > > index a5c966a36836..c426fa0660bc 100644 > > --- a/lib/math/div64.c > > +++ b/lib/math/div64.c > > @@ -186,6 +186,9 @@ EXPORT_SYMBOL(iter_div_u64_rem); > > #ifndef mul_u64_u64_div_u64 > > u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > { > > + /* Trigger exception if divisor is zero */ > > + BUG_ON(!d); > > + > > I'm unsure if I should like the BUG_ON better than return 1/0. My gut > feeling is that mul_u64_u64_div_u64() should behave in the same way as > e.g. div64_u64 (which is just `return dividend / divisor;` for 64bit > archs and thus triggers the same exception as `return 1/0;`. You need to execute a run-time 1/0 not a compile-time one. clang is likely to decide it is 'undefined behaviour' and just not generate any code at all - including removing the 'if (!d)' condition. For x86 gcc does (sometimes at least) generate 'if (!d) asm("ud2")' but BUG_ON() adds a table entry for the fault site. > If BUG_ON should be it, I'd prefer > > BUG_ON(unlikely(d == 0)); > > which keeps the unlikely() that is already in the check removed below > and is more explicit that checking for !d. IIRC there is an 'unlikely' inside BUG_ON() - so the call site doesn't need one. David > > if (ilog2(a) + ilog2(b) <= 62) > > return div64_u64(a * b, d); > > > > @@ -212,13 +215,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > > > #endif > > > > - /* make sure d is not zero, trigger exception otherwise */ > > -#pragma GCC diagnostic push > > -#pragma GCC diagnostic ignored "-Wdiv-by-zero" > > - if (unlikely(d == 0)) > > - return 1/0; > > -#pragma GCC diagnostic pop > > - > > int shift = __builtin_ctzll(d); > > > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ > > Best regards > Uwe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-19 11:59 ` David Laight @ 2025-05-20 1:54 ` Nicolas Pitre 0 siblings, 0 replies; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 1:54 UTC (permalink / raw) To: David Laight Cc: Uwe Kleine-König, Andrew Morton, linux-kernel, Oleg Nesterov, Peter Zijlstra, Biju Das [-- Attachment #1: Type: text/plain, Size: 2222 bytes --] On Mon, 19 May 2025, David Laight wrote: > On Mon, 19 May 2025 08:10:50 +0200 > Uwe Kleine-König <u.kleine-koenig@baylibre.com> wrote: > > > On Sun, May 18, 2025 at 02:38:46PM +0100, David Laight wrote: > > > Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined > > > behaviour' the compiler generated for a compile-time 1/0 is in any > > > way useful. > > > > > > It may be better to define the function to return ~(u64)0 for > > > divide by zero. > > > > > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > > > --- > > > > > > A new change for v2 of the patchset. > > > Whereas gcc inserts (IIRC) 'ud2' clang is likely to let the code > > > continue and generate 'random' results for any 'undefined bahaviour'. > > > > > > lib/math/div64.c | 10 +++------- > > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > > > diff --git a/lib/math/div64.c b/lib/math/div64.c > > > index a5c966a36836..c426fa0660bc 100644 > > > --- a/lib/math/div64.c > > > +++ b/lib/math/div64.c > > > @@ -186,6 +186,9 @@ EXPORT_SYMBOL(iter_div_u64_rem); > > > #ifndef mul_u64_u64_div_u64 > > > u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > > { > > > + /* Trigger exception if divisor is zero */ > > > + BUG_ON(!d); > > > + > > > > I'm unsure if I should like the BUG_ON better than return 1/0. My gut > > feeling is that mul_u64_u64_div_u64() should behave in the same way as > > e.g. div64_u64 (which is just `return dividend / divisor;` for 64bit > > archs and thus triggers the same exception as `return 1/0;`. > > You need to execute a run-time 1/0 not a compile-time one. > clang is likely to decide it is 'undefined behaviour' and just not > generate any code at all - including removing the 'if (!d)' condition. The code as it is works perfectly with both gcc and clang: it triggers the proper trap or calls the corresponding exception handler. > For x86 gcc does (sometimes at least) generate 'if (!d) asm("ud2")' And that's perfect. Did you find any occurrence where it is not the case? > but BUG_ON() adds a table entry for the fault site. You should really be as close as possible to the behavior you get with a runtime x/y where y = 0. When that happens there are no BUG_ON(). Nicolas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight 2025-05-18 15:42 ` kernel test robot 2025-05-19 6:10 ` Uwe Kleine-König @ 2025-05-20 2:21 ` Nicolas Pitre 2025-05-20 21:43 ` David Laight 2 siblings, 1 reply; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 2:21 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Sun, 18 May 2025, David Laight wrote: > Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined > behaviour' the compiler generated for a compile-time 1/0 is in any > way useful. > > It may be better to define the function to return ~(u64)0 for > divide by zero. > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > --- > > A new change for v2 of the patchset. > Whereas gcc inserts (IIRC) 'ud2' clang is likely to let the code > continue and generate 'random' results for any 'undefined bahaviour'. clang does exactly the same as gcc. As mentioned earlier, this is a soft NAK from me. The above explanation is more speculation than fact. And here we really want to duplicate the behavior a regular runtime 32-bit by 32-bit x/0 would produce, or in other words behave just like div64_u64() for that matter. > lib/math/div64.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/lib/math/div64.c b/lib/math/div64.c > index a5c966a36836..c426fa0660bc 100644 > --- a/lib/math/div64.c > +++ b/lib/math/div64.c > @@ -186,6 +186,9 @@ EXPORT_SYMBOL(iter_div_u64_rem); > #ifndef mul_u64_u64_div_u64 > u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > { > + /* Trigger exception if divisor is zero */ > + BUG_ON(!d); > + > if (ilog2(a) + ilog2(b) <= 62) > return div64_u64(a * b, d); > > @@ -212,13 +215,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > #endif > > - /* make sure d is not zero, trigger exception otherwise */ > -#pragma GCC diagnostic push > -#pragma GCC diagnostic ignored "-Wdiv-by-zero" > - if (unlikely(d == 0)) > - return 1/0; > -#pragma GCC diagnostic pop > - > int shift = __builtin_ctzll(d); > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ > -- > 2.39.5 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-20 2:21 ` Nicolas Pitre @ 2025-05-20 21:43 ` David Laight 2025-05-20 22:28 ` Nicolas Pitre 0 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-20 21:43 UTC (permalink / raw) To: Nicolas Pitre Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Mon, 19 May 2025 22:21:15 -0400 (EDT) Nicolas Pitre <npitre@baylibre.com> wrote: > On Sun, 18 May 2025, David Laight wrote: > > > Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined > > behaviour' the compiler generated for a compile-time 1/0 is in any > > way useful. > > > > It may be better to define the function to return ~(u64)0 for > > divide by zero. > > > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > > --- > > > > A new change for v2 of the patchset. > > Whereas gcc inserts (IIRC) 'ud2' clang is likely to let the code > > continue and generate 'random' results for any 'undefined bahaviour'. > > clang does exactly the same as gcc. Did you see the recent 'rant' from Linus about the way clang handles UB? I'm pretty sure 'divide by zero' is UB; valid options include:
- Jump to a random location in the current function (what Linus was ranting about).
- Jump to any random location with any register values.
- Enable user access to all kernel memory.
- Permanently damage your cpu.
- Make your computer catch fire.
- Send an ICBM to some unspecified location.
If you want a 'divide by zero' error you'd better generate one, e.g.:
int n = 0;
OPTIMIZER_HIDE_VAR(n);
i = 1 / n;
David > > As mentioned earlier, this is a soft NAK from me. > > The above explanation is more speculation than fact. And here we really > want to duplicate the behavior a regular runtime 32-bit by 32-bit x/0 > would produce, or in other words behave just like div64_u64() for that > matter. > > > lib/math/div64.c | 10 +++------- > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > diff --git a/lib/math/div64.c b/lib/math/div64.c > > index a5c966a36836..c426fa0660bc 100644 > > --- a/lib/math/div64.c > > +++ b/lib/math/div64.c > > @@ -186,6 +186,9 @@ EXPORT_SYMBOL(iter_div_u64_rem); > > #ifndef mul_u64_u64_div_u64 > > u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > { > > + /* Trigger exception if divisor is zero */ > > + BUG_ON(!d); > > + > > if (ilog2(a) + ilog2(b) <= 62) > > return div64_u64(a * b, d); > > > > @@ -212,13 +215,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > > > #endif > > > > - /* make sure d is not zero, trigger exception otherwise */ > > -#pragma GCC diagnostic push > > -#pragma GCC diagnostic ignored "-Wdiv-by-zero" > > - if (unlikely(d == 0)) > > - return 1/0; > > -#pragma GCC diagnostic pop > > - > > int shift = __builtin_ctzll(d); > > > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ > > -- > > 2.39.5 > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
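A sketch of that suggestion in context (assuming the kernel's OPTIMIZER_HIDE_VAR() from linux/compiler.h; the wrapper name is made up for illustration):

	static u64 force_divide_fault(void)
	{
		u64 d = 0;

		/* hide the constant so the compiler cannot prove d == 0
		 * and treat the division as undefined behaviour */
		OPTIMIZER_HIDE_VAR(d);

		/* a genuine runtime divide: traps however the arch traps */
		return div64_u64(1, d);
	}

Routing it through div64_u64() rather than a literal 1/0 would also keep the failure mode identical to the other division helpers on the same architecture.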
* Re: [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero 2025-05-20 21:43 ` David Laight @ 2025-05-20 22:28 ` Nicolas Pitre 0 siblings, 0 replies; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 22:28 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Tue, 20 May 2025, David Laight wrote: > On Mon, 19 May 2025 22:21:15 -0400 (EDT) > Nicolas Pitre <npitre@baylibre.com> wrote: > > > On Sun, 18 May 2025, David Laight wrote: > > > > > Do an explicit BUG_ON(!divisor) instead of hoping the 'undefined > > > behaviour' the compiler generated for a compile-time 1/0 is in any > > > way useful. > > > > > > It may be better to define the function to return ~(u64)0 for > > > divide by zero. > > > > > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > > > --- > > > > > > A new change for v2 of the patchset. > > > Whereas gcc inserts (IIRC) 'ud2' clang is likely to let the code > > > continue and generate 'random' results for any 'undefined bahaviour'. > > > > clang does exactly the same as gcc. > > Did you see the recent 'rant' from Linus about the way clang handles UB. > I'm pretty sure 'divide by zero' is UB, valid options include. > - Jump to random location in the current function (what Linus was ranting). > - Jump to any random location with any register values. > - Enable user access to all kernel memory. > - Permanently damage your cpu. > - Make your computer catch fire. > - Send an ICBM to some unspecified location. LOL > If you want a 'divide by zero' error you better generate one. eg: > int n = 0; > OPTIMSER_HIDE_VAR(n); > i = 1 / n; As you say then. As long as the resulting behavior is coherent for all division flavors on a given architecture. Nicolas ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight 2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight 2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight @ 2025-05-18 13:38 ` David Laight 2025-05-20 3:03 ` Nicolas Pitre 2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight 3 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-18 13:38 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: David Laight, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das The existing mul_u64_u64_div_u64() rounds down; a 'rounding up' variant needs 'divisor - 1' added between the multiply and the divide, so it cannot easily be done by a caller. Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d and implement the 'round down' and 'round up' using it. Update the x86-64 asm to optimise for 'c' being a constant zero. For architectures that support u128 check for a 64bit product after the multiply (will be cheap). Leave in the early check for other architectures (mostly 32bit) when 'c' is zero to avoid the multi-part multiply. Note that the cost of the 128bit divide will dwarf the rest of the code. This function is very slow on everything except x86-64 (very very slow on 32bit). Add kernel-doc definitions for all three functions. Signed-off-by: David Laight <david.laight.linux@gmail.com> --- Changes for v2 (formerly patch 1/3): - Reinstate the early call to div64_u64() on 32bit when 'c' is zero. Although I'm not convinced the path is common enough to be worth the two ilog2() calls. arch/x86/include/asm/div64.h | 19 ++++++++++----- include/linux/math64.h | 45 +++++++++++++++++++++++++++++++++++- lib/math/div64.c | 21 ++++++++++------- 3 files changed, 70 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h index 9931e4c7d73f..7a0a916a2d7d 100644 --- a/arch/x86/include/asm/div64.h +++ b/arch/x86/include/asm/div64.h @@ -84,21 +84,28 @@ static inline u64 mul_u32_u32(u32 a, u32 b) * Will generate an #DE when the result doesn't fit u64, could fix with an * __ex_table[] entry when it becomes an issue.
*/ -static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div) +static inline u64 mul_u64_add_u64_div_u64(u64 a, u64 mul, u64 add, u64 div) { u64 q; - asm ("mulq %2; divq %3" : "=a" (q) - : "a" (a), "rm" (mul), "rm" (div) - : "rdx"); + if (statically_true(!add)) { + asm ("mulq %2; divq %3" : "=a" (q) + : "a" (a), "rm" (mul), "rm" (div) + : "rdx"); + } else { + asm ("mulq %2; addq %3, %%rax; adcq $0, %%rdx; divq %4" + : "=a" (q) + : "a" (a), "rm" (mul), "rm" (add), "rm" (div) + : "rdx"); + } return q; } -#define mul_u64_u64_div_u64 mul_u64_u64_div_u64 +#define mul_u64_add_u64_div_u64 mul_u64_add_u64_div_u64 static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 div) { - return mul_u64_u64_div_u64(a, mul, div); + return mul_u64_add_u64_div_u64(a, mul, 0, div); } #define mul_u64_u32_div mul_u64_u32_div diff --git a/include/linux/math64.h b/include/linux/math64.h index 6aaccc1626ab..e1c2e3642cec 100644 --- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -282,7 +282,50 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor) } #endif /* mul_u64_u32_div */ -u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div); +/** + * mul_u64_add_u64_div_u64 - unsigned 64bit multiply, add, and divide + * @a: first unsigned 64bit multiplicand + * @b: second unsigned 64bit multiplicand + * @c: unsigned 64bit addend + * @d: unsigned 64bit divisor + * + * Multiply two 64bit values together to generate a 128bit product + * add a third value and then divide by a fourth. + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. + * + * Return: (@a * @b + @c) / @d + */ +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d); + +/** + * mul_u64_u64_div_u64 - unsigned 64bit multiply and divide + * @a: first unsigned 64bit multiplicand + * @b: second unsigned 64bit multiplicand + * @d: unsigned 64bit divisor + * + * Multiply two 64bit values together to generate a 128bit product + * and then divide by a third value. + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. + * + * Return: @a * @b / @d + */ +#define mul_u64_u64_div_u64(a, b, d) mul_u64_add_u64_div_u64(a, b, 0, d) + +/** + * mul_u64_u64_div_u64_roundup - unsigned 64bit multiply and divide rounded up + * @a: first unsigned 64bit multiplicand + * @b: second unsigned 64bit multiplicand + * @d: unsigned 64bit divisor + * + * Multiply two 64bit values together to generate a 128bit product + * and then divide and round up. + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. 
+ * + * Return: (@a * @b + @d - 1) / @d + */ +#define mul_u64_u64_div_u64_roundup(a, b, d) \ + ({ u64 _tmp = (d); mul_u64_add_u64_div_u64(a, b, _tmp - 1, _tmp); }) + /** * DIV64_U64_ROUND_UP - unsigned 64bit divide with 64bit divisor rounded up diff --git a/lib/math/div64.c b/lib/math/div64.c index c426fa0660bc..66bfb6159f02 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -183,29 +183,31 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) } EXPORT_SYMBOL(iter_div_u64_rem); -#ifndef mul_u64_u64_div_u64 -u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) +#ifndef mul_u64_add_u64_div_u64 +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) { /* Trigger exception if divisor is zero */ BUG_ON(!d); - if (ilog2(a) + ilog2(b) <= 62) - return div64_u64(a * b, d); - #if defined(__SIZEOF_INT128__) /* native 64x64=128 bits multiplication */ - u128 prod = (u128)a * b; + u128 prod = (u128)a * b + c; u64 n_lo = prod, n_hi = prod >> 64; #else + if (!c && ilog2(a) + ilog2(b) <= 62) + return div64_u64(a * b, d); + /* perform a 64x64=128 bits multiplication manually */ u32 a_lo = a, a_hi = a >> 32, b_lo = b, b_hi = b >> 32; u64 x, y, z; - x = (u64)a_lo * b_lo; + /* Since (x-1)(x-1) + 2(x-1) == x.x - 1 two u32 can be added to a u64 */ + x = (u64)a_lo * b_lo + (u32)c; y = (u64)a_lo * b_hi + (u32)(x >> 32); + y += (u32)(c >> 32); z = (u64)a_hi * b_hi + (u32)(y >> 32); y = (u64)a_hi * b_lo + (u32)y; z += (u32)(y >> 32); @@ -215,6 +217,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) #endif + if (!n_hi) + return div64_u64(n_lo, d); + int shift = __builtin_ctzll(d); /* try reducing the fraction in case the dividend becomes <= 64 bits */ @@ -261,5 +266,5 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) return res; } -EXPORT_SYMBOL(mul_u64_u64_div_u64); +EXPORT_SYMBOL(mul_u64_add_u64_div_u64); #endif -- 2.39.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
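A worked example of what the round-up variant buys (numbers chosen for illustration): with a = b = 2^32 the product is exactly 2^64, which already wraps a u64, so a caller could not form a * b + d - 1 itself; and since 2^64 = 3 * 6148914691236517205 + 1, the two variants differ by one:

	u64 lo = mul_u64_u64_div_u64(1ULL << 32, 1ULL << 32, 3);
	/* lo == 6148914691236517205 */
	u64 hi = mul_u64_u64_div_u64_roundup(1ULL << 32, 1ULL << 32, 3);
	/* hi == 6148914691236517206 */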
* Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight @ 2025-05-20 3:03 ` Nicolas Pitre 2025-05-20 21:37 ` David Laight 0 siblings, 1 reply; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 3:03 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Sun, 18 May 2025, David Laight wrote: > The existing mul_u64_u64_div_u64() rounds down, a 'rounding up' > variant needs 'divisor - 1' adding in between the multiply and > divide so cannot easily be done by a caller. > > Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d > and implement the 'round down' and 'round up' using it. > > Update the x86-64 asm to optimise for 'c' being a constant zero. > > For architectures that support u128 check for a 64bit product after > the multiply (will be cheap). > Leave in the early check for other architectures (mostly 32bit) when > 'c' is zero to avoid the multi-part multiply. I agree with this, except for the "'c' is zero" part. More below. > Note that the cost of the 128bit divide will dwarf the rest of the code. > This function is very slow on everything except x86-64 (very very slow > on 32bit). > > Add kerndoc definitions for all three functions. > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > --- > Changes for v2 (formally patch 1/3): > - Reinstate the early call to div64_u64() on 32bit when 'c' is zero. > Although I'm not convinced the path is common enough to be worth > the two ilog2() calls. > > arch/x86/include/asm/div64.h | 19 ++++++++++----- > include/linux/math64.h | 45 +++++++++++++++++++++++++++++++++++- > lib/math/div64.c | 21 ++++++++++------- > 3 files changed, 70 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h > index 9931e4c7d73f..7a0a916a2d7d 100644 > --- a/arch/x86/include/asm/div64.h > +++ b/arch/x86/include/asm/div64.h > @@ -84,21 +84,28 @@ static inline u64 mul_u32_u32(u32 a, u32 b) > * Will generate an #DE when the result doesn't fit u64, could fix with an > * __ex_table[] entry when it becomes an issue. 
> */ > -static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div) > +static inline u64 mul_u64_add_u64_div_u64(u64 a, u64 mul, u64 add, u64 div) > { > u64 q; > > - asm ("mulq %2; divq %3" : "=a" (q) > - : "a" (a), "rm" (mul), "rm" (div) > - : "rdx"); > + if (statically_true(!add)) { > + asm ("mulq %2; divq %3" : "=a" (q) > + : "a" (a), "rm" (mul), "rm" (div) > + : "rdx"); > + } else { > + asm ("mulq %2; addq %3, %%rax; adcq $0, %%rdx; divq %4" > + : "=a" (q) > + : "a" (a), "rm" (mul), "rm" (add), "rm" (div) > + : "rdx"); > + } > > return q; > } > -#define mul_u64_u64_div_u64 mul_u64_u64_div_u64 > +#define mul_u64_add_u64_div_u64 mul_u64_add_u64_div_u64 > > static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 div) > { > - return mul_u64_u64_div_u64(a, mul, div); > + return mul_u64_add_u64_div_u64(a, mul, 0, div); > } > #define mul_u64_u32_div mul_u64_u32_div > > diff --git a/include/linux/math64.h b/include/linux/math64.h > index 6aaccc1626ab..e1c2e3642cec 100644 > --- a/include/linux/math64.h > +++ b/include/linux/math64.h > @@ -282,7 +282,50 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor) > } > #endif /* mul_u64_u32_div */ > > -u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div); > +/** > + * mul_u64_add_u64_div_u64 - unsigned 64bit multiply, add, and divide > + * @a: first unsigned 64bit multiplicand > + * @b: second unsigned 64bit multiplicand > + * @c: unsigned 64bit addend > + * @d: unsigned 64bit divisor > + * > + * Multiply two 64bit values together to generate a 128bit product > + * add a third value and then divide by a fourth. > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. > + * > + * Return: (@a * @b + @c) / @d > + */ > +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d); > + > +/** > + * mul_u64_u64_div_u64 - unsigned 64bit multiply and divide > + * @a: first unsigned 64bit multiplicand > + * @b: second unsigned 64bit multiplicand > + * @d: unsigned 64bit divisor > + * > + * Multiply two 64bit values together to generate a 128bit product > + * and then divide by a third value. > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. If the quotient exceeds 64 bits, the optimized x86 version truncates the value to the low 64 bits. The C version returns a saturated value i.e. UINT64_MAX (implemented with a -1). Nothing actually traps in that case. > + * > + * Return: @a * @b / @d > + */ > +#define mul_u64_u64_div_u64(a, b, d) mul_u64_add_u64_div_u64(a, b, 0, d) > + > +/** > + * mul_u64_u64_div_u64_roundup - unsigned 64bit multiply and divide rounded up > + * @a: first unsigned 64bit multiplicand > + * @b: second unsigned 64bit multiplicand > + * @d: unsigned 64bit divisor > + * > + * Multiply two 64bit values together to generate a 128bit product > + * and then divide and round up. > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. 
> + * > + * Return: (@a * @b + @d - 1) / @d > + */ > +#define mul_u64_u64_div_u64_roundup(a, b, d) \ > + ({ u64 _tmp = (d); mul_u64_add_u64_div_u64(a, b, _tmp - 1, _tmp); }) > + > > /** > * DIV64_U64_ROUND_UP - unsigned 64bit divide with 64bit divisor rounded up > diff --git a/lib/math/div64.c b/lib/math/div64.c > index c426fa0660bc..66bfb6159f02 100644 > --- a/lib/math/div64.c > +++ b/lib/math/div64.c > @@ -183,29 +183,31 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) > } > EXPORT_SYMBOL(iter_div_u64_rem); > > -#ifndef mul_u64_u64_div_u64 > -u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > +#ifndef mul_u64_add_u64_div_u64 > +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) > { > /* Trigger exception if divisor is zero */ > BUG_ON(!d); > > - if (ilog2(a) + ilog2(b) <= 62) > - return div64_u64(a * b, d); > - > #if defined(__SIZEOF_INT128__) > > /* native 64x64=128 bits multiplication */ > - u128 prod = (u128)a * b; > + u128 prod = (u128)a * b + c; > u64 n_lo = prod, n_hi = prod >> 64; > > #else > > + if (!c && ilog2(a) + ilog2(b) <= 62) > + return div64_u64(a * b, d); > + Here you should do: if (ilog2(a) + ilog2(b) <= 62) { u64 ab = a * b; u64 abc = ab + c; if (ab <= abc) return div64_u64(abc, d); } This is cheap and won't unconditionally discard the faster path when c != 0; > /* perform a 64x64=128 bits multiplication manually */ > u32 a_lo = a, a_hi = a >> 32, b_lo = b, b_hi = b >> 32; > u64 x, y, z; > > - x = (u64)a_lo * b_lo; > + /* Since (x-1)(x-1) + 2(x-1) == x.x - 1 two u32 can be added to a u64 */ > + x = (u64)a_lo * b_lo + (u32)c; > y = (u64)a_lo * b_hi + (u32)(x >> 32); > + y += (u32)(c >> 32); > z = (u64)a_hi * b_hi + (u32)(y >> 32); > y = (u64)a_hi * b_lo + (u32)y; > z += (u32)(y >> 32); > @@ -215,6 +217,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > #endif > > + if (!n_hi) > + return div64_u64(n_lo, d); > + > int shift = __builtin_ctzll(d); > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ > @@ -261,5 +266,5 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > return res; > } > -EXPORT_SYMBOL(mul_u64_u64_div_u64); > +EXPORT_SYMBOL(mul_u64_add_u64_div_u64); > #endif > -- > 2.39.5 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-20 3:03 ` Nicolas Pitre @ 2025-05-20 21:37 ` David Laight 2025-05-20 22:24 ` Nicolas Pitre 0 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-20 21:37 UTC (permalink / raw) To: Nicolas Pitre Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Mon, 19 May 2025 23:03:21 -0400 (EDT) Nicolas Pitre <npitre@baylibre.com> wrote: > On Sun, 18 May 2025, David Laight wrote: > > > The existing mul_u64_u64_div_u64() rounds down, a 'rounding up' > > variant needs 'divisor - 1' adding in between the multiply and > > divide so cannot easily be done by a caller. > > > > Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d > > and implement the 'round down' and 'round up' using it. > > > > Update the x86-64 asm to optimise for 'c' being a constant zero. > > > > For architectures that support u128 check for a 64bit product after > > the multiply (will be cheap). > > Leave in the early check for other architectures (mostly 32bit) when > > 'c' is zero to avoid the multi-part multiply. > > I agree with this, except for the "'c' is zero" part. More below. > > > Note that the cost of the 128bit divide will dwarf the rest of the code. > > This function is very slow on everything except x86-64 (very very slow > > on 32bit). > > > > Add kerndoc definitions for all three functions. > > > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > > --- > > Changes for v2 (formally patch 1/3): > > - Reinstate the early call to div64_u64() on 32bit when 'c' is zero. > > Although I'm not convinced the path is common enough to be worth > > the two ilog2() calls. > > > > arch/x86/include/asm/div64.h | 19 ++++++++++----- > > include/linux/math64.h | 45 +++++++++++++++++++++++++++++++++++- > > lib/math/div64.c | 21 ++++++++++------- > > 3 files changed, 70 insertions(+), 15 deletions(-) > > > > diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h > > index 9931e4c7d73f..7a0a916a2d7d 100644 > > --- a/arch/x86/include/asm/div64.h > > +++ b/arch/x86/include/asm/div64.h > > @@ -84,21 +84,28 @@ static inline u64 mul_u32_u32(u32 a, u32 b) > > * Will generate an #DE when the result doesn't fit u64, could fix with an > > * __ex_table[] entry when it becomes an issue. 
> > */ > > -static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div) > > +static inline u64 mul_u64_add_u64_div_u64(u64 a, u64 mul, u64 add, u64 div) > > { > > u64 q; > > > > - asm ("mulq %2; divq %3" : "=a" (q) > > - : "a" (a), "rm" (mul), "rm" (div) > > - : "rdx"); > > + if (statically_true(!add)) { > > + asm ("mulq %2; divq %3" : "=a" (q) > > + : "a" (a), "rm" (mul), "rm" (div) > > + : "rdx"); > > + } else { > > + asm ("mulq %2; addq %3, %%rax; adcq $0, %%rdx; divq %4" > > + : "=a" (q) > > + : "a" (a), "rm" (mul), "rm" (add), "rm" (div) > > + : "rdx"); > > + } > > > > return q; > > } > > -#define mul_u64_u64_div_u64 mul_u64_u64_div_u64 > > +#define mul_u64_add_u64_div_u64 mul_u64_add_u64_div_u64 > > > > static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 div) > > { > > - return mul_u64_u64_div_u64(a, mul, div); > > + return mul_u64_add_u64_div_u64(a, mul, 0, div); > > } > > #define mul_u64_u32_div mul_u64_u32_div > > > > diff --git a/include/linux/math64.h b/include/linux/math64.h > > index 6aaccc1626ab..e1c2e3642cec 100644 > > --- a/include/linux/math64.h > > +++ b/include/linux/math64.h > > @@ -282,7 +282,50 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor) > > } > > #endif /* mul_u64_u32_div */ > > > > -u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div); > > +/** > > + * mul_u64_add_u64_div_u64 - unsigned 64bit multiply, add, and divide > > + * @a: first unsigned 64bit multiplicand > > + * @b: second unsigned 64bit multiplicand > > + * @c: unsigned 64bit addend > > + * @d: unsigned 64bit divisor > > + * > > + * Multiply two 64bit values together to generate a 128bit product > > + * add a third value and then divide by a fourth. > > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. > > + * > > + * Return: (@a * @b + @c) / @d > > + */ > > +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d); > > + > > +/** > > + * mul_u64_u64_div_u64 - unsigned 64bit multiply and divide > > + * @a: first unsigned 64bit multiplicand > > + * @b: second unsigned 64bit multiplicand > > + * @d: unsigned 64bit divisor > > + * > > + * Multiply two 64bit values together to generate a 128bit product > > + * and then divide by a third value. > > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. > > If the quotient exceeds 64 bits, the optimized x86 version truncates the > value to the low 64 bits. The C version returns a saturated value i.e. > UINT64_MAX (implemented with a -1). Nothing actually traps in that case. Nope. I've only got the iAPX 286 and 80386 reference manuals to hand. Both say that 'interrupt 0' happens on overflow. I don't expect the later documentation is any different. If the kernel code is going to have an explicit instruction to trap (rather then the code 'just trapping') it really is best to use BUG(). If nothing else it guarantees a trap regardless of the architecture and compiler. > > > + * > > + * Return: @a * @b / @d > > + */ > > +#define mul_u64_u64_div_u64(a, b, d) mul_u64_add_u64_div_u64(a, b, 0, d) > > + > > +/** > > + * mul_u64_u64_div_u64_roundup - unsigned 64bit multiply and divide rounded up > > + * @a: first unsigned 64bit multiplicand > > + * @b: second unsigned 64bit multiplicand > > + * @d: unsigned 64bit divisor > > + * > > + * Multiply two 64bit values together to generate a 128bit product > > + * and then divide and round up. > > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. 
> > + * > > + * Return: (@a * @b + @d - 1) / @d > > + */ > > +#define mul_u64_u64_div_u64_roundup(a, b, d) \ > > + ({ u64 _tmp = (d); mul_u64_add_u64_div_u64(a, b, _tmp - 1, _tmp); }) > > + > > > > /** > > * DIV64_U64_ROUND_UP - unsigned 64bit divide with 64bit divisor rounded up > > diff --git a/lib/math/div64.c b/lib/math/div64.c > > index c426fa0660bc..66bfb6159f02 100644 > > --- a/lib/math/div64.c > > +++ b/lib/math/div64.c > > @@ -183,29 +183,31 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) > > } > > EXPORT_SYMBOL(iter_div_u64_rem); > > > > -#ifndef mul_u64_u64_div_u64 > > -u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > +#ifndef mul_u64_add_u64_div_u64 > > +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) > > { > > /* Trigger exception if divisor is zero */ > > BUG_ON(!d); > > > > - if (ilog2(a) + ilog2(b) <= 62) > > - return div64_u64(a * b, d); > > - > > #if defined(__SIZEOF_INT128__) > > > > /* native 64x64=128 bits multiplication */ > > - u128 prod = (u128)a * b; > > + u128 prod = (u128)a * b + c; > > u64 n_lo = prod, n_hi = prod >> 64; > > > > #else > > > > + if (!c && ilog2(a) + ilog2(b) <= 62) > > + return div64_u64(a * b, d); > > + > > Here you should do: > > if (ilog2(a) + ilog2(b) <= 62) { > u64 ab = a * b; > u64 abc = ab + c; > if (ab <= abc) > return div64_u64(abc, d); > } > > This is cheap and won't unconditionally discard the faster path when c != 0; That isn't really cheap. ilog2() is likely to be a similar cost to a multiply (my brain remembers them both as 'latency 3' on x86). My actual preference is to completely delete that test and rely on the post-multiply check. The 64 by 64 multiply code is actually fairly cheap. On x86-64 it is only a few clocks slower than the u128 version (and that is (much) the same code that should be generated for 32bit). > > > /* perform a 64x64=128 bits multiplication manually */ > > u32 a_lo = a, a_hi = a >> 32, b_lo = b, b_hi = b >> 32; > > u64 x, y, z; > > > > - x = (u64)a_lo * b_lo; > > + /* Since (x-1)(x-1) + 2(x-1) == x.x - 1 two u32 can be added to a u64 */ > > + x = (u64)a_lo * b_lo + (u32)c; > > y = (u64)a_lo * b_hi + (u32)(x >> 32); > > + y += (u32)(c >> 32); Those two adds to y should be swapped - I need to do a v3 and will swap them. It might save one clock - my timing code is accurate, but not THAT accurate. > > z = (u64)a_hi * b_hi + (u32)(y >> 32); > > y = (u64)a_hi * b_lo + (u32)y; > > z += (u32)(y >> 32); > > @@ -215,6 +217,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) If we assume the compiler is sane (gcc isn't), a/b_hi/lo are in registers and mul has a latency of 3 (and add 1) the code above can execute as: clock 0: x_h/x_lo = a_lo * b_lo clock 1: y_h/y_lo = a_lo * b_hi clock 2: y1_ho/y1_lo = a_hi * b_lo clock 3: z_hi/z_lo = a_hi + b_hi; x_lo += c_lo clock 4: x_hi += carry; y_lo += c_hi clock 5: y_hi += carry; y_lo += x_hi clock 6: y_hi += carry; y1_lo += y_lo clock 7: y1_hi += carry; z_lo += y_hi clock 8: z_hi += carry; z_lo += y1_hi clock 9: z_hi += carry I don't think any more instructions can run in parallel. But it really isn't that long an all. Your 'fast path' test will be nearly that long even ignoring mis-predicted branches. For my updated version I've managed to stop gcc spilling zero words to stack! 
David > > > > #endif > > > > + if (!n_hi) > > + return div64_u64(n_lo, d); > > + > > int shift = __builtin_ctzll(d); > > > > /* try reducing the fraction in case the dividend becomes <= 64 bits */ > > @@ -261,5 +266,5 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d) > > > > return res; > > } > > -EXPORT_SYMBOL(mul_u64_u64_div_u64); > > +EXPORT_SYMBOL(mul_u64_add_u64_div_u64); > > #endif > > -- > > 2.39.5 > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
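For reference, the multiply-in-parts that those clock counts walk through, in the shape used by lib/math/div64.c (comments added here; the c addend wiring from the patch is omitted for clarity):

	u32 a_lo = a, a_hi = a >> 32, b_lo = b, b_hi = b >> 32;
	u64 x, y, z;

	x = (u64)a_lo * b_lo;			/* low partial product */
	y = (u64)a_lo * b_hi + (u32)(x >> 32);	/* middle term 1 + carry out of x */
	z = (u64)a_hi * b_hi + (u32)(y >> 32);	/* high partial + accumulated carry */
	y = (u64)a_hi * b_lo + (u32)y;		/* middle term 2 + low half of middle 1 */
	z += (u32)(y >> 32);			/* final carry into the top word */

	u64 n_lo = (y << 32) | (u32)x;		/* bits 0..63 of a * b */
	u64 n_hi = z;				/* bits 64..127 of a * b */

The three 32x32=>64 multiplies are independent, so they can issue on consecutive clocks while the adds and carries of the earlier ones retire - which is where the overlap in the clock sequence above comes from.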
* Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-20 21:37 ` David Laight @ 2025-05-20 22:24 ` Nicolas Pitre 2025-05-21 12:52 ` David Laight 0 siblings, 1 reply; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 22:24 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Tue, 20 May 2025, David Laight wrote: > On Mon, 19 May 2025 23:03:21 -0400 (EDT) > Nicolas Pitre <npitre@baylibre.com> wrote: > > > On Sun, 18 May 2025, David Laight wrote: > > > > > The existing mul_u64_u64_div_u64() rounds down, a 'rounding up' > > > variant needs 'divisor - 1' adding in between the multiply and > > > divide so cannot easily be done by a caller. > > > > > > Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d > > > and implement the 'round down' and 'round up' using it. > > > > > > Update the x86-64 asm to optimise for 'c' being a constant zero. > > > > > > For architectures that support u128 check for a 64bit product after > > > the multiply (will be cheap). > > > Leave in the early check for other architectures (mostly 32bit) when > > > 'c' is zero to avoid the multi-part multiply. > > > > I agree with this, except for the "'c' is zero" part. More below. > > > > > Note that the cost of the 128bit divide will dwarf the rest of the code. > > > This function is very slow on everything except x86-64 (very very slow > > > on 32bit). > > > > > > Add kerndoc definitions for all three functions. > > > > > > Signed-off-by: David Laight <david.laight.linux@gmail.com> > > > --- > > > Changes for v2 (formally patch 1/3): > > > - Reinstate the early call to div64_u64() on 32bit when 'c' is zero. > > > Although I'm not convinced the path is common enough to be worth > > > the two ilog2() calls. > > > > > > arch/x86/include/asm/div64.h | 19 ++++++++++----- > > > include/linux/math64.h | 45 +++++++++++++++++++++++++++++++++++- > > > lib/math/div64.c | 21 ++++++++++------- > > > 3 files changed, 70 insertions(+), 15 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h > > > index 9931e4c7d73f..7a0a916a2d7d 100644 > > > --- a/arch/x86/include/asm/div64.h > > > +++ b/arch/x86/include/asm/div64.h > > > @@ -84,21 +84,28 @@ static inline u64 mul_u32_u32(u32 a, u32 b) > > > * Will generate an #DE when the result doesn't fit u64, could fix with an > > > * __ex_table[] entry when it becomes an issue. 
> > > */ > > > -static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div) > > > +static inline u64 mul_u64_add_u64_div_u64(u64 a, u64 mul, u64 add, u64 div) > > > { > > > u64 q; > > > > > > - asm ("mulq %2; divq %3" : "=a" (q) > > > - : "a" (a), "rm" (mul), "rm" (div) > > > - : "rdx"); > > > + if (statically_true(!add)) { > > > + asm ("mulq %2; divq %3" : "=a" (q) > > > + : "a" (a), "rm" (mul), "rm" (div) > > > + : "rdx"); > > > + } else { > > > + asm ("mulq %2; addq %3, %%rax; adcq $0, %%rdx; divq %4" > > > + : "=a" (q) > > > + : "a" (a), "rm" (mul), "rm" (add), "rm" (div) > > > + : "rdx"); > > > + } > > > > > > return q; > > > } > > > -#define mul_u64_u64_div_u64 mul_u64_u64_div_u64 > > > +#define mul_u64_add_u64_div_u64 mul_u64_add_u64_div_u64 > > > > > > static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 div) > > > { > > > - return mul_u64_u64_div_u64(a, mul, div); > > > + return mul_u64_add_u64_div_u64(a, mul, 0, div); > > > } > > > #define mul_u64_u32_div mul_u64_u32_div > > > > > > diff --git a/include/linux/math64.h b/include/linux/math64.h > > > index 6aaccc1626ab..e1c2e3642cec 100644 > > > --- a/include/linux/math64.h > > > +++ b/include/linux/math64.h > > > @@ -282,7 +282,50 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor) > > > } > > > #endif /* mul_u64_u32_div */ > > > > > > -u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div); > > > +/** > > > + * mul_u64_add_u64_div_u64 - unsigned 64bit multiply, add, and divide > > > + * @a: first unsigned 64bit multiplicand > > > + * @b: second unsigned 64bit multiplicand > > > + * @c: unsigned 64bit addend > > > + * @d: unsigned 64bit divisor > > > + * > > > + * Multiply two 64bit values together to generate a 128bit product > > > + * add a third value and then divide by a fourth. > > > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. > > > + * > > > + * Return: (@a * @b + @c) / @d > > > + */ > > > +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d); > > > + > > > +/** > > > + * mul_u64_u64_div_u64 - unsigned 64bit multiply and divide > > > + * @a: first unsigned 64bit multiplicand > > > + * @b: second unsigned 64bit multiplicand > > > + * @d: unsigned 64bit divisor > > > + * > > > + * Multiply two 64bit values together to generate a 128bit product > > > + * and then divide by a third value. > > > + * May BUG()/trap if @d is zero or the quotient exceeds 64 bits. > > > > If the quotient exceeds 64 bits, the optimized x86 version truncates the > > value to the low 64 bits. The C version returns a saturated value i.e. > > UINT64_MAX (implemented with a -1). Nothing actually traps in that case. > > Nope. I've only got the iAPX 286 and 80386 reference manuals to hand. > Both say that 'interrupt 0' happens on overflow. > I don't expect the later documentation is any different. Hmmm... OK, you're right. I must have botched my test code initially. > If the kernel code is going to have an explicit instruction to trap > (rather then the code 'just trapping') it really is best to use BUG(). > If nothing else it guarantees a trap regardless of the architecture > and compiler. OK in the overflow case. However in the div-by_0 case it is best if, for a given architecture, the behavior is coherent across all division operations. 
> > > + if (!c && ilog2(a) + ilog2(b) <= 62) > > > + return div64_u64(a * b, d); > > > + > > > > Here you should do: > > > > if (ilog2(a) + ilog2(b) <= 62) { > > u64 ab = a * b; > > u64 abc = ab + c; > > if (ab <= abc) > > return div64_u64(abc, d); > > } > > > > This is cheap and won't unconditionally discard the faster path when c != 0; > > That isn't really cheap. > ilog2() is likely to be a similar cost to a multiply > (my brain remembers them both as 'latency 3' on x86). I'm not discussing the ilog2() usage though. I'm just against limiting the test to !c. My suggestion is about supporting all values of c. > My actual preference is to completely delete that test and rely > on the post-multiply check. > > The 64 by 64 multiply code is actually fairly cheap. > On x86-64 it is only a few clocks slower than the u128 version > (and that is (much) the same code that should be generated for 32bit). Of course x86-64 is not the primary target here as it has its own optimized version. Nicolas ^ permalink raw reply [flat|nested] 21+ messages in thread
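To restate the disputed fast path outside the quoting: the trick is that unsigned addition wraps, so comparing the sum against one operand detects overflow without needing a wider type. A kernel-flavoured sketch (ilog2() and div64_u64() are the existing helpers from linux/log2.h and linux/math64.h; the slow-path function name is invented here for illustration):

	static u64 mul_add_div_fast_first(u64 a, u64 b, u64 c, u64 d)
	{
		if (ilog2(a) + ilog2(b) <= 62) {
			/* the ilog2() bound guarantees a * b fits in a u64 */
			u64 ab = a * b;
			u64 abc = ab + c;	/* unsigned add: wraps on overflow */

			/* abc >= ab exactly when the addition did not wrap */
			if (ab <= abc)
				return div64_u64(abc, d);
		}
		/* hypothetical name for the full 128-bit path */
		return mul_u64_add_u64_div_u64_slow(a, b, c, d);
	}

The disagreement above is only about whether the two ilog2() calls are cheap enough to justify this, not about the wraparound test itself, which costs a single compare.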
* Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-20 22:24 ` Nicolas Pitre @ 2025-05-21 12:52 ` David Laight 2025-05-21 13:50 ` Nicolas Pitre 0 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-21 12:52 UTC (permalink / raw) To: Nicolas Pitre Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Tue, 20 May 2025 18:24:58 -0400 (EDT) Nicolas Pitre <npitre@baylibre.com> wrote: > On Tue, 20 May 2025, David Laight wrote: > > > On Mon, 19 May 2025 23:03:21 -0400 (EDT) > > Nicolas Pitre <npitre@baylibre.com> wrote: > > ... > > > Here you should do: > > > > > > if (ilog2(a) + ilog2(b) <= 62) { > > > u64 ab = a * b; > > > u64 abc = ab + c; > > > if (ab <= abc) > > > return div64_u64(abc, d); > > > } > > > > > > This is cheap and won't unconditionally discard the faster path when c != 0; > > > > That isn't really cheap. > > ilog2() is likely to be a similar cost to a multiply > > (my brain remembers them both as 'latency 3' on x86). > > I'm not discussing the ilog2() usage though. I'm just against limiting > the test to !c. My suggestion is about supporting all values of c. I've had further thoughts on that test. Most (but not all - and I've forgotten which) 64bit cpus have a 64x64=>128 multiply and support u128. So the 'multiply in parts' code is mostly for 32bit. That means that the 'a * b' for the call to div64_u64() has to be three 32x32=>64 multiplies with all the extra 'add' and 'adc $0' to generate a correct 64bit result. This is (well should be) much the same as the multiply coded in the function - except it is generated by the compiler itself. The only parts it can ignore are those that set 'z' and 'y_hi'. If my clock sequence (in the other email) is correct it saves 3 of 10 clocks - so a test to avoid the multiply has to be better than that. That probably means the only worthwhile check is for a and b being 32bit so a single multiply can be used. The generated code for 32bit x86 isn't as good as one might hope, partially due to only having 7 (6 if %bp is a stack frame) registers. clang makes a reasonable job of it, gcc doesn't. There is already a mul_u32_u32() wrapper in arch/x86/include/asm/div64.h. There needs to be a similar add_u64_u32() (contains add %s,%d_lo, adc $0,%d_hi). Without them gcc spills a lot of values to stack - including constant zeros. I might add those and use them in v3 (which I need to send anyway). They'll match what my 'pending' faster code does. David ^ permalink raw reply [flat|nested] 21+ messages in thread
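For readers without the 32-bit code in front of them, the 'multiply in parts' being costed here is the classic schoolbook decomposition into 32-bit halves. A self-contained sketch of the full 64x64 => 128 bit version (illustrative only; the actual code in lib/math/div64.c is structured differently):

	typedef unsigned int u32;
	typedef unsigned long long u64;

	static void mul_u64_u64_parts(u64 a, u64 b, u64 *hi, u64 *lo)
	{
		u32 a_lo = a, a_hi = a >> 32;
		u32 b_lo = b, b_hi = b >> 32;

		u64 ll = (u64)a_lo * b_lo;	/* bits   0..63 of the product */
		u64 lh = (u64)a_lo * b_hi;	/* bits  32..95 */
		u64 hl = (u64)a_hi * b_lo;	/* bits  32..95 */
		u64 hh = (u64)a_hi * b_hi;	/* bits 64..127 */

		/* sum the middle column; three values < 2^32 cannot wrap a u64 */
		u64 mid = (ll >> 32) + (u32)lh + (u32)hl;

		*lo = (mid << 32) | (u32)ll;
		*hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
	}

When only the low 64 bits are wanted (the div64_u64() fast-path case discussed above), hh and the high halves of lh and hl drop out, which is the three-multiply version; the full product needs all four, as the reply below notes.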
* Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-21 12:52 ` David Laight @ 2025-05-21 13:50 ` Nicolas Pitre 2025-05-25 11:38 ` David Laight 0 siblings, 1 reply; 21+ messages in thread From: Nicolas Pitre @ 2025-05-21 13:50 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Wed, 21 May 2025, David Laight wrote: > On Tue, 20 May 2025 18:24:58 -0400 (EDT) > Nicolas Pitre <npitre@baylibre.com> wrote: > > > On Tue, 20 May 2025, David Laight wrote: > > > > > On Mon, 19 May 2025 23:03:21 -0400 (EDT) > > > Nicolas Pitre <npitre@baylibre.com> wrote: > > > > ... > > > > Here you should do: > > > > > > > > if (ilog2(a) + ilog2(b) <= 62) { > > > > u64 ab = a * b; > > > > u64 abc = ab + c; > > > > if (ab <= abc) > > > > return div64_u64(abc, d); > > > > } > > > > > > > > This is cheap and won't unconditionally discard the faster path when c != 0; > > > > > > That isn't really cheap. > > > ilog2() is likely to be a similar cost to a multiply > > > (my brain remembers them both as 'latency 3' on x86). > > > > I'm not discussing the ilog2() usage though. I'm just against limiting > > the test to !c. My suggestion is about supporting all values of c. > > I've had further thoughts on that test. > Most (but not all - and I've forgotten which) 64bit cpu have a 64x64=>128 > multiple and support u128. Looks like X86-64, ARM64 and RV64 have it. That's probably 99% of the market. > So the 'multiply in parts' code is mostly for 32bit. Exact. > That means that the 'a * b' for the call to div64_u64() has to be three > 32x32=>64 multiplies with all the extra 'add' and 'adc $0' to generate > a correct 64bit result. 4 multiplies to be precise. > This is (well should be) much the same as the multiply coded in the > function - except it is generated by the compiler itself. I don't follow you here. What is the same as what? > The only parts it can ignore are the those that set 'z' and 'y_hi'. > If my clock sequence (in the other email) is correct it saves 3 of 10 > clocks - so test to avoid the multiply has to be better than that. > That probably means the only worthwhile check is for a and b being 32bit > so a single multiply can be used. Depends how costly the ilog2 is. On ARM the clz instruction is about 1 cycle. If you need to figure out the MSB manually then it might be best to skip those ilog2's. > The generated code for 32bit x86 isn't as good as one might hope. > partially due to only having 7 (6 if %bp is a stack frame) registers. > clang makes a reasonable job of it, gcc doesn't. > There is already a mul_u32_u32() wrapper in arch/x86/include/asm/div64.h. > There needs to be a similar add_u64_u32() (contains add %s,%d_lo, adc $0,%d_hi). > Without them gcc spills a lot of values to stack - including constant zeros. I mainly looked at ARM32 and both gcc and clang do a good job here. ARM registers are plentiful of course. > I might add those and use them in v3 (which I need to send anyway). > They'll match what my 'pending' faster code does. > > David > > ^ permalink raw reply [flat|nested] 21+ messages in thread
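On the ilog2() cost point: with a count-leading-zeros instruction the whole operation is that instruction plus a subtract, which is why it is near-free on ARM. A sketch of the usual lowering (the kernel's real ilog2() is a macro with constant folding layered over fls()/fls64(), and ilog2(0) is undefined):

	static inline int ilog2_u64(unsigned long long v)
	{
		/* undefined for v == 0, matching the kernel macro */
		return 63 - __builtin_clzll(v);
	}

The x86 wrinkle, visible in the dumps in the next message, is that the legacy bsr instruction leaves its output undefined for a zero input, so compilers emit an extra branch or cmov around it unless a newer instruction such as lzcnt can be assumed.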
* Re: [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() 2025-05-21 13:50 ` Nicolas Pitre @ 2025-05-25 11:38 ` David Laight 0 siblings, 0 replies; 21+ messages in thread From: David Laight @ 2025-05-25 11:38 UTC (permalink / raw) To: Nicolas Pitre Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Wed, 21 May 2025 09:50:28 -0400 (EDT) Nicolas Pitre <npitre@baylibre.com> wrote: > On Wed, 21 May 2025, David Laight wrote: >... > > Depends how costly the ilog2 is. On ARM the clz instruction is about 1 > cycle. If you need to figure out the MSB manually then it might be best > to skip those ilog2's. I was going to measure it. But I've pulled chunks from the kernel headers into a userspace build. This is the x86-32 code for the 'if (ilog2(a) + ilog2(b) < 62)' test: 1b2b: 0f bd c6 bsr %esi,%eax 1b2e: 75 05 jne 1b35 <mul_u64_add_u64_div_u64_new+0x75> 1b30: b8 ff ff ff ff mov $0xffffffff,%eax 1b35: 85 c9 test %ecx,%ecx 1b37: 74 0d je 1b46 <mul_u64_add_u64_div_u64_new+0x86> 1b39: 0f bd c1 bsr %ecx,%eax 1b3c: 75 05 jne 1b43 <mul_u64_add_u64_div_u64_new+0x83> 1b3e: b8 ff ff ff ff mov $0xffffffff,%eax 1b43: 83 c0 20 add $0x20,%eax 1b46: 0f bd d5 bsr %ebp,%edx 1b49: 75 05 jne 1b50 <mul_u64_add_u64_div_u64_new+0x90> 1b4b: ba ff ff ff ff mov $0xffffffff,%edx 1b50: 85 ff test %edi,%edi 1b52: 74 0d je 1b61 <mul_u64_add_u64_div_u64_new+0xa1> 1b54: 0f bd d7 bsr %edi,%edx 1b57: 75 05 jne 1b5e <mul_u64_add_u64_div_u64_new+0x9e> 1b59: ba ff ff ff ff mov $0xffffffff,%edx 1b5e: 83 c2 20 add $0x20,%edx 1b61: 8d 1c 02 lea (%edx,%eax,1),%ebx 1b64: 83 fb 3e cmp $0x3e,%ebx 1b67: 0f 8e 0b 03 00 00 jle 1e78 <mul_u64_add_u64_div_u64_new+0x3b8> If 'cmov' is enabled (not by default even after the current plan to remove 486 support) it is: 1b2b: ba ff ff ff ff mov $0xffffffff,%edx 1b30: 85 c9 test %ecx,%ecx 1b32: 0f 85 98 03 00 00 jne 1ed0 <mul_u64_add_u64_div_u64_new+0x410> 1b38: 0f bd c6 bsr %esi,%eax 1b3b: 0f 44 c2 cmove %edx,%eax 1b3e: bb ff ff ff ff mov $0xffffffff,%ebx 1b43: 85 ff test %edi,%edi 1b45: 0f 85 75 03 00 00 jne 1ec0 <mul_u64_add_u64_div_u64_new+0x400> 1b4b: 0f bd d5 bsr %ebp,%edx 1b4e: 0f 44 d3 cmove %ebx,%edx 1b51: 8d 1c 02 lea (%edx,%eax,1),%ebx 1b54: 83 fb 3e cmp $0x3e,%ebx 1b57: 0f 8e 0b 03 00 00 jle 1e68 <mul_u64_add_u64_div_u64_new+0x3a8> with: 1ec0: 0f bd d7 bsr %edi,%edx 1ec3: 0f 44 d3 cmove %ebx,%edx 1ec6: 83 c2 20 add $0x20,%edx 1ec9: e9 83 fc ff ff jmp 1b51 <mul_u64_add_u64_div_u64_new+0x91> 1ed0: 0f bd c1 bsr %ecx,%eax 1ed3: 0f 44 c2 cmove %edx,%eax 1ed6: 83 c0 20 add $0x20,%eax 1ed9: e9 60 fc ff ff jmp 1b3e <mul_u64_add_u64_div_u64_new+0x7e> Neither is pretty... Some of the 'damage' is because the x86 'bsr' (bit scan reverse) sets 'z' for zero and leaves the output undefined (unchanged on later cpu). 
For reference I can get the multiply down to: 1b5d: 89 f0 mov %esi,%eax 1b5f: f7 e5 mul %ebp 1b61: 03 44 24 38 add 0x38(%esp),%eax 1b65: 83 d2 00 adc $0x0,%edx 1b68: 89 d3 mov %edx,%ebx 1b6a: 89 44 24 08 mov %eax,0x8(%esp) 1b6e: 89 f0 mov %esi,%eax 1b70: f7 e7 mul %edi 1b72: 03 44 24 3c add 0x3c(%esp),%eax 1b76: 83 d2 00 adc $0x0,%edx 1b79: 01 d8 add %ebx,%eax 1b7b: 83 d2 00 adc $0x0,%edx 1b7e: 89 d6 mov %edx,%esi 1b80: 89 c3 mov %eax,%ebx 1b82: 89 c8 mov %ecx,%eax 1b84: f7 e7 mul %edi 1b86: 89 c7 mov %eax,%edi 1b88: 89 c8 mov %ecx,%eax 1b8a: 01 f7 add %esi,%edi 1b8c: 83 d2 00 adc $0x0,%edx 1b8f: 89 d6 mov %edx,%esi 1b91: f7 e5 mul %ebp 1b93: 89 c1 mov %eax,%ecx 1b95: 8b 44 24 08 mov 0x8(%esp),%eax 1b99: 89 d5 mov %edx,%ebp 1b9b: 01 d9 add %ebx,%ecx 1b9d: 83 d5 00 adc $0x0,%ebp 1ba0: 89 44 24 28 mov %eax,0x28(%esp) 1ba4: 01 ef add %ebp,%edi 1ba6: 83 d6 00 adc $0x0,%esi 1ba9: 89 74 24 1c mov %esi,0x1c(%esp) 1bad: 8b 5c 24 1c mov 0x1c(%esp),%ebx 1bb1: 89 7c 24 18 mov %edi,0x18(%esp) 1bb5: 8b 44 24 18 mov 0x18(%esp),%eax 1bb9: 89 4c 24 2c mov %ecx,0x2c(%esp) 1bbd: 09 c3 or %eax,%ebx 1bbf: 0f 84 1b 03 00 00 je 1ee0 <mul_u64_add_u64_div_u64_new+0x420> If you follow the register dependency chain it won't be as long as it looks. (Although the last few instructions are terrible! - I've tried a few things and they won't go away.) David ^ permalink raw reply [flat|nested] 21+ messages in thread
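Since the proposed add_u64_u32() helper is only described in words above, here is one possible shape for it, next to the existing wrapper it would sit beside. The first function is modeled on arch/x86/include/asm/div64.h; the second is purely hypothetical, an illustration of the 'add %s,%d_lo, adc $0,%d_hi' pairing, with the name and constraints still to be settled on review:

	/* modeled on the existing wrapper in arch/x86/include/asm/div64.h */
	static inline u64 mul_u32_u32(u32 a, u32 b)
	{
		u32 high, low;

		asm ("mull %[b]" : "=a" (low), "=d" (high)
				 : [a] "a" (a), [b] "rm" (b) );

		return low | ((u64)high) << 32;
	}

	/* hypothetical helper: 64bit += 32bit kept in registers on 32bit x86 */
	static inline u64 add_u64_u32(u64 a, u32 b)
	{
		u32 lo = a, hi = a >> 32;

		asm ("addl %[b], %[lo]; adcl $0, %[hi]"
		     : [lo] "+r" (lo), [hi] "+r" (hi)
		     : [b] "rm" (b));

		return lo | (u64)hi << 32;
	}

The point of such wrappers is the one made above: hand-picking the two-instruction add/adc sequence stops gcc from spilling the 64-bit halves (and constant zeros) to the stack.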
* [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() 2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight ` (2 preceding siblings ...) 2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight @ 2025-05-18 13:38 ` David Laight 2025-05-20 3:07 ` Nicolas Pitre 3 siblings, 1 reply; 21+ messages in thread From: David Laight @ 2025-05-18 13:38 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: David Laight, u.kleine-koenig, Nicolas Pitre, Oleg Nesterov, Peter Zijlstra, Biju Das Replicate the existing mul_u64_u64_div_u64() test cases with round up. Update the shell script that verifies the table, remove the comment markers so that it can be directly pasted into a shell. If any tests fail then fail the module load with -EINVAL. Signed-off-by: David Laight <david.laight.linux@gmail.com> --- Changes for v2 (was patch 1/3): - Include the total number of tests in the final pr_info(). lib/math/test_mul_u64_u64_div_u64.c | 116 +++++++++++++++++----------- 1 file changed, 70 insertions(+), 46 deletions(-) diff --git a/lib/math/test_mul_u64_u64_div_u64.c b/lib/math/test_mul_u64_u64_div_u64.c index 58d058de4e73..383b0dfc8023 100644 --- a/lib/math/test_mul_u64_u64_div_u64.c +++ b/lib/math/test_mul_u64_u64_div_u64.c @@ -10,61 +10,73 @@ #include <linux/printk.h> #include <linux/math64.h> -typedef struct { u64 a; u64 b; u64 c; u64 result; } test_params; +typedef struct { u64 a; u64 b; u64 c; u64 result; uint round_up;} test_params; static test_params test_values[] = { /* this contains many edge values followed by a couple random values */ -{ 0xb, 0x7, 0x3, 0x19 }, -{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00000000 }, -{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00000001 }, -{ 0xffffffff, 0xffffffff, 0x2, 0x7fffffff00000000 }, -{ 0x1ffffffff, 0xffffffff, 0x2, 0xfffffffe80000000 }, -{ 0x1ffffffff, 0xffffffff, 0x3, 0xaaaaaaa9aaaaaaab }, -{ 0x1ffffffff, 0x1ffffffff, 0x4, 0xffffffff00000000 }, -{ 0xffff000000000000, 0xffff000000000000, 0xffff000000000001, 0xfffeffffffffffff }, -{ 0x3333333333333333, 0x3333333333333333, 0x5555555555555555, 0x1eb851eb851eb851 }, -{ 0x7fffffffffffffff, 0x2, 0x3, 0x5555555555555554 }, -{ 0xffffffffffffffff, 0x2, 0x8000000000000000, 0x3 }, -{ 0xffffffffffffffff, 0x2, 0xc000000000000000, 0x2 }, -{ 0xffffffffffffffff, 0x4000000000000004, 0x8000000000000000, 0x8000000000000007 }, -{ 0xffffffffffffffff, 0x4000000000000001, 0x8000000000000000, 0x8000000000000001 }, -{ 0xffffffffffffffff, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000001 }, -{ 0xfffffffffffffffe, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000000 }, -{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffe, 0x8000000000000001 }, -{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffd, 0x8000000000000002 }, -{ 0x7fffffffffffffff, 0xffffffffffffffff, 0xc000000000000000, 0xaaaaaaaaaaaaaaa8 }, -{ 0xffffffffffffffff, 0x7fffffffffffffff, 0xa000000000000000, 0xccccccccccccccca }, -{ 0xffffffffffffffff, 0x7fffffffffffffff, 0x9000000000000000, 0xe38e38e38e38e38b }, -{ 0x7fffffffffffffff, 0x7fffffffffffffff, 0x5000000000000000, 0xccccccccccccccc9 }, -{ 0xffffffffffffffff, 0xfffffffffffffffe, 0xffffffffffffffff, 0xfffffffffffffffe }, -{ 0xe6102d256d7ea3ae, 0x70a77d0be4c31201, 0xd63ec35ab3220357, 0x78f8bf8cc86c6e18 }, -{ 0xf53bae05cb86c6e1, 0x3847b32d2f8d32e0, 0xcfd4f55a647f403c, 0x42687f79d8998d35 }, -{ 0x9951c5498f941092, 0x1f8c8bfdf287a251, 0xa3c8dc5f81ea3fe2, 0x1d887cb25900091f }, -{ 
0x374fee9daa1bb2bb, 0x0d0bfbff7b8ae3ef, 0xc169337bd42d5179, 0x03bb2dbaffcbb961 }, -{ 0xeac0d03ac10eeaf0, 0x89be05dfa162ed9b, 0x92bb1679a41f0e4b, 0xdc5f5cc9e270d216 }, +{ 0xb, 0x7, 0x3, 0x19, 1 }, +{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00000000, 0 }, +{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00000001, 0 }, +{ 0xffffffff, 0xffffffff, 0x2, 0x7fffffff00000000, 1 }, +{ 0x1ffffffff, 0xffffffff, 0x2, 0xfffffffe80000000, 1 }, +{ 0x1ffffffff, 0xffffffff, 0x3, 0xaaaaaaa9aaaaaaab, 0 }, +{ 0x1ffffffff, 0x1ffffffff, 0x4, 0xffffffff00000000, 1 }, +{ 0xffff000000000000, 0xffff000000000000, 0xffff000000000001, 0xfffeffffffffffff, 1 }, +{ 0x3333333333333333, 0x3333333333333333, 0x5555555555555555, 0x1eb851eb851eb851, 1 }, +{ 0x7fffffffffffffff, 0x2, 0x3, 0x5555555555555554, 1 }, +{ 0xffffffffffffffff, 0x2, 0x8000000000000000, 0x3, 1 }, +{ 0xffffffffffffffff, 0x2, 0xc000000000000000, 0x2, 1 }, +{ 0xffffffffffffffff, 0x4000000000000004, 0x8000000000000000, 0x8000000000000007, 1 }, +{ 0xffffffffffffffff, 0x4000000000000001, 0x8000000000000000, 0x8000000000000001, 1 }, +{ 0xffffffffffffffff, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000001, 0 }, +{ 0xfffffffffffffffe, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000000, 1 }, +{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffe, 0x8000000000000001, 1 }, +{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffd, 0x8000000000000002, 1 }, +{ 0x7fffffffffffffff, 0xffffffffffffffff, 0xc000000000000000, 0xaaaaaaaaaaaaaaa8, 1 }, +{ 0xffffffffffffffff, 0x7fffffffffffffff, 0xa000000000000000, 0xccccccccccccccca, 1 }, +{ 0xffffffffffffffff, 0x7fffffffffffffff, 0x9000000000000000, 0xe38e38e38e38e38b, 1 }, +{ 0x7fffffffffffffff, 0x7fffffffffffffff, 0x5000000000000000, 0xccccccccccccccc9, 1 }, +{ 0xffffffffffffffff, 0xfffffffffffffffe, 0xffffffffffffffff, 0xfffffffffffffffe, 0 }, +{ 0xe6102d256d7ea3ae, 0x70a77d0be4c31201, 0xd63ec35ab3220357, 0x78f8bf8cc86c6e18, 1 }, +{ 0xf53bae05cb86c6e1, 0x3847b32d2f8d32e0, 0xcfd4f55a647f403c, 0x42687f79d8998d35, 1 }, +{ 0x9951c5498f941092, 0x1f8c8bfdf287a251, 0xa3c8dc5f81ea3fe2, 0x1d887cb25900091f, 1 }, +{ 0x374fee9daa1bb2bb, 0x0d0bfbff7b8ae3ef, 0xc169337bd42d5179, 0x03bb2dbaffcbb961, 1 }, +{ 0xeac0d03ac10eeaf0, 0x89be05dfa162ed9b, 0x92bb1679a41f0e4b, 0xdc5f5cc9e270d216, 1 }, }; /* * The above table can be verified with the following shell script: - * - * #!/bin/sh - * sed -ne 's/^{ \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\) },$/\1 \2 \3 \4/p' \ - * lib/math/test_mul_u64_u64_div_u64.c | - * while read a b c r; do - * expected=$( printf "obase=16; ibase=16; %X * %X / %X\n" $a $b $c | bc ) - * given=$( printf "%X\n" $r ) - * if [ "$expected" = "$given" ]; then - * echo "$a * $b / $c = $r OK" - * else - * echo "$a * $b / $c = $r is wrong" >&2 - * echo "should be equivalent to 0x$expected" >&2 - * exit 1 - * fi - * done + +#!/bin/sh +sed -ne 's/^{ \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\) },$/\1 \2 \3 \4 \5/p' \ + lib/math/test_mul_u64_u64_div_u64.c | +while read a b c r d; do + expected=$( printf "obase=16; ibase=16; %X * %X / %X\n" $a $b $c | bc ) + given=$( printf "%X\n" $r ) + if [ "$expected" = "$given" ]; then + echo "$a * $b / $c = $r OK" + else + echo "$a * $b / $c = $r is wrong" >&2 + echo "should be equivalent to 0x$expected" >&2 + exit 1 + fi + expected=$( printf "obase=16; ibase=16; (%X * %X + %X) / %X\n" $a $b $((c-1)) $c | bc ) + given=$( printf "%X\n" $((r + d)) ) + if [ "$expected" = "$given" ]; then + echo "$a * $b +/ $c = $(printf '%#x' $((r + d))) OK" + else + echo "$a * $b +/ $c = 
$(printf '%#x' $((r + d))) is wrong" >&2 + echo "should be equivalent to 0x$expected" >&2 + exit 1 + fi +done + */ static int __init test_init(void) { + int errors = 0; + int tests = 0; int i; pr_info("Starting mul_u64_u64_div_u64() test\n"); @@ -75,16 +87,28 @@ static int __init test_init(void) u64 c = test_values[i].c; u64 expected_result = test_values[i].result; u64 result = mul_u64_u64_div_u64(a, b, c); + u64 result_up = mul_u64_u64_div_u64_roundup(a, b, c); + + tests += 2; if (result != expected_result) { pr_err("ERROR: 0x%016llx * 0x%016llx / 0x%016llx\n", a, b, c); pr_err("ERROR: expected result: %016llx\n", expected_result); pr_err("ERROR: obtained result: %016llx\n", result); + errors++; + } + expected_result += test_values[i].round_up; + if (result_up != expected_result) { + pr_err("ERROR: 0x%016llx * 0x%016llx +/ 0x%016llx\n", a, b, c); + pr_err("ERROR: expected result: %016llx\n", expected_result); + pr_err("ERROR: obtained result: %016llx\n", result_up); + errors++; } } - pr_info("Completed mul_u64_u64_div_u64() test\n"); - return 0; + pr_info("Completed mul_u64_u64_div_u64() test, %d tests, %d errors\n", + tests, errors); + return errors ? -EINVAL : 0; } static void __exit test_exit(void) -- 2.39.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
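One property of the new round_up column worth spelling out: for each row it is exactly "does a * b leave a remainder when divided by c", since that is when the round-up result exceeds the round-down result by one. A userspace cross-check of the flag, assuming a compiler with __int128 support (hypothetical helper, not part of the patch):

	#include <stdio.h>

	typedef unsigned long long u64;

	static unsigned int round_up_flag(u64 a, u64 b, u64 c)
	{
		/* a non-zero remainder means the quotient rounds up */
		return (unsigned __int128)a * b % c != 0;
	}

	int main(void)
	{
		/* first table row: 0xb * 0x7 = 77 = 25 * 3 + 2, so expect 1 */
		printf("%u\n", round_up_flag(0xb, 0x7, 0x3));
		return 0;
	}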
* Re: [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() 2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight @ 2025-05-20 3:07 ` Nicolas Pitre 0 siblings, 0 replies; 21+ messages in thread From: Nicolas Pitre @ 2025-05-20 3:07 UTC (permalink / raw) To: David Laight Cc: Andrew Morton, linux-kernel, u.kleine-koenig, Oleg Nesterov, Peter Zijlstra, Biju Das On Sun, 18 May 2025, David Laight wrote: > Replicate the existing mul_u64_u64_div_u64() test cases with round up. > Update the shell script that verifies the table, remove the comment > markers so that it can be directly pasted into a shell. > > It any tests fail then fail the module load with -EINVAL. > > Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> > --- > > Changes for v2 (was patch 1/3): > - Include the total number of tests in the final pr_info(). > > lib/math/test_mul_u64_u64_div_u64.c | 116 +++++++++++++++++----------- > 1 file changed, 70 insertions(+), 46 deletions(-) > > diff --git a/lib/math/test_mul_u64_u64_div_u64.c b/lib/math/test_mul_u64_u64_div_u64.c > index 58d058de4e73..383b0dfc8023 100644 > --- a/lib/math/test_mul_u64_u64_div_u64.c > +++ b/lib/math/test_mul_u64_u64_div_u64.c > @@ -10,61 +10,73 @@ > #include <linux/printk.h> > #include <linux/math64.h> > > -typedef struct { u64 a; u64 b; u64 c; u64 result; } test_params; > +typedef struct { u64 a; u64 b; u64 c; u64 result; uint round_up;} test_params; > > static test_params test_values[] = { > /* this contains many edge values followed by a couple random values */ > -{ 0xb, 0x7, 0x3, 0x19 }, > -{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00000000 }, > -{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00000001 }, > -{ 0xffffffff, 0xffffffff, 0x2, 0x7fffffff00000000 }, > -{ 0x1ffffffff, 0xffffffff, 0x2, 0xfffffffe80000000 }, > -{ 0x1ffffffff, 0xffffffff, 0x3, 0xaaaaaaa9aaaaaaab }, > -{ 0x1ffffffff, 0x1ffffffff, 0x4, 0xffffffff00000000 }, > -{ 0xffff000000000000, 0xffff000000000000, 0xffff000000000001, 0xfffeffffffffffff }, > -{ 0x3333333333333333, 0x3333333333333333, 0x5555555555555555, 0x1eb851eb851eb851 }, > -{ 0x7fffffffffffffff, 0x2, 0x3, 0x5555555555555554 }, > -{ 0xffffffffffffffff, 0x2, 0x8000000000000000, 0x3 }, > -{ 0xffffffffffffffff, 0x2, 0xc000000000000000, 0x2 }, > -{ 0xffffffffffffffff, 0x4000000000000004, 0x8000000000000000, 0x8000000000000007 }, > -{ 0xffffffffffffffff, 0x4000000000000001, 0x8000000000000000, 0x8000000000000001 }, > -{ 0xffffffffffffffff, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000001 }, > -{ 0xfffffffffffffffe, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000000 }, > -{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffe, 0x8000000000000001 }, > -{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffd, 0x8000000000000002 }, > -{ 0x7fffffffffffffff, 0xffffffffffffffff, 0xc000000000000000, 0xaaaaaaaaaaaaaaa8 }, > -{ 0xffffffffffffffff, 0x7fffffffffffffff, 0xa000000000000000, 0xccccccccccccccca }, > -{ 0xffffffffffffffff, 0x7fffffffffffffff, 0x9000000000000000, 0xe38e38e38e38e38b }, > -{ 0x7fffffffffffffff, 0x7fffffffffffffff, 0x5000000000000000, 0xccccccccccccccc9 }, > -{ 0xffffffffffffffff, 0xfffffffffffffffe, 0xffffffffffffffff, 0xfffffffffffffffe }, > -{ 0xe6102d256d7ea3ae, 0x70a77d0be4c31201, 0xd63ec35ab3220357, 0x78f8bf8cc86c6e18 }, > -{ 0xf53bae05cb86c6e1, 0x3847b32d2f8d32e0, 0xcfd4f55a647f403c, 0x42687f79d8998d35 }, > -{ 0x9951c5498f941092, 0x1f8c8bfdf287a251, 0xa3c8dc5f81ea3fe2, 
0x1d887cb25900091f }, > -{ 0x374fee9daa1bb2bb, 0x0d0bfbff7b8ae3ef, 0xc169337bd42d5179, 0x03bb2dbaffcbb961 }, > -{ 0xeac0d03ac10eeaf0, 0x89be05dfa162ed9b, 0x92bb1679a41f0e4b, 0xdc5f5cc9e270d216 }, > +{ 0xb, 0x7, 0x3, 0x19, 1 }, > +{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00000000, 0 }, > +{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00000001, 0 }, > +{ 0xffffffff, 0xffffffff, 0x2, 0x7fffffff00000000, 1 }, > +{ 0x1ffffffff, 0xffffffff, 0x2, 0xfffffffe80000000, 1 }, > +{ 0x1ffffffff, 0xffffffff, 0x3, 0xaaaaaaa9aaaaaaab, 0 }, > +{ 0x1ffffffff, 0x1ffffffff, 0x4, 0xffffffff00000000, 1 }, > +{ 0xffff000000000000, 0xffff000000000000, 0xffff000000000001, 0xfffeffffffffffff, 1 }, > +{ 0x3333333333333333, 0x3333333333333333, 0x5555555555555555, 0x1eb851eb851eb851, 1 }, > +{ 0x7fffffffffffffff, 0x2, 0x3, 0x5555555555555554, 1 }, > +{ 0xffffffffffffffff, 0x2, 0x8000000000000000, 0x3, 1 }, > +{ 0xffffffffffffffff, 0x2, 0xc000000000000000, 0x2, 1 }, > +{ 0xffffffffffffffff, 0x4000000000000004, 0x8000000000000000, 0x8000000000000007, 1 }, > +{ 0xffffffffffffffff, 0x4000000000000001, 0x8000000000000000, 0x8000000000000001, 1 }, > +{ 0xffffffffffffffff, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000001, 0 }, > +{ 0xfffffffffffffffe, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000000000, 1 }, > +{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffe, 0x8000000000000001, 1 }, > +{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffd, 0x8000000000000002, 1 }, > +{ 0x7fffffffffffffff, 0xffffffffffffffff, 0xc000000000000000, 0xaaaaaaaaaaaaaaa8, 1 }, > +{ 0xffffffffffffffff, 0x7fffffffffffffff, 0xa000000000000000, 0xccccccccccccccca, 1 }, > +{ 0xffffffffffffffff, 0x7fffffffffffffff, 0x9000000000000000, 0xe38e38e38e38e38b, 1 }, > +{ 0x7fffffffffffffff, 0x7fffffffffffffff, 0x5000000000000000, 0xccccccccccccccc9, 1 }, > +{ 0xffffffffffffffff, 0xfffffffffffffffe, 0xffffffffffffffff, 0xfffffffffffffffe, 0 }, > +{ 0xe6102d256d7ea3ae, 0x70a77d0be4c31201, 0xd63ec35ab3220357, 0x78f8bf8cc86c6e18, 1 }, > +{ 0xf53bae05cb86c6e1, 0x3847b32d2f8d32e0, 0xcfd4f55a647f403c, 0x42687f79d8998d35, 1 }, > +{ 0x9951c5498f941092, 0x1f8c8bfdf287a251, 0xa3c8dc5f81ea3fe2, 0x1d887cb25900091f, 1 }, > +{ 0x374fee9daa1bb2bb, 0x0d0bfbff7b8ae3ef, 0xc169337bd42d5179, 0x03bb2dbaffcbb961, 1 }, > +{ 0xeac0d03ac10eeaf0, 0x89be05dfa162ed9b, 0x92bb1679a41f0e4b, 0xdc5f5cc9e270d216, 1 }, > }; > > /* > * The above table can be verified with the following shell script: > - * > - * #!/bin/sh > - * sed -ne 's/^{ \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\) },$/\1 \2 \3 \4/p' \ > - * lib/math/test_mul_u64_u64_div_u64.c | > - * while read a b c r; do > - * expected=$( printf "obase=16; ibase=16; %X * %X / %X\n" $a $b $c | bc ) > - * given=$( printf "%X\n" $r ) > - * if [ "$expected" = "$given" ]; then > - * echo "$a * $b / $c = $r OK" > - * else > - * echo "$a * $b / $c = $r is wrong" >&2 > - * echo "should be equivalent to 0x$expected" >&2 > - * exit 1 > - * fi > - * done > + > +#!/bin/sh > +sed -ne 's/^{ \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\) },$/\1 \2 \3 \4 \5/p' \ > + lib/math/test_mul_u64_u64_div_u64.c | > +while read a b c r d; do > + expected=$( printf "obase=16; ibase=16; %X * %X / %X\n" $a $b $c | bc ) > + given=$( printf "%X\n" $r ) > + if [ "$expected" = "$given" ]; then > + echo "$a * $b / $c = $r OK" > + else > + echo "$a * $b / $c = $r is wrong" >&2 > + echo "should be equivalent to 0x$expected" >&2 > + exit 1 > + fi > + expected=$( printf "obase=16; ibase=16; (%X * %X + %X) / %X\n" $a $b $((c-1)) $c | bc ) > + given=$( 
printf "%X\n" $((r + d)) ) > + if [ "$expected" = "$given" ]; then > + echo "$a * $b +/ $c = $(printf '%#x' $((r + d))) OK" > + else > + echo "$a * $b +/ $c = $(printf '%#x' $((r + d))) is wrong" >&2 > + echo "should be equivalent to 0x$expected" >&2 > + exit 1 > + fi > +done > + > */ > > static int __init test_init(void) > { > + int errors = 0; > + int tests = 0; > int i; > > pr_info("Starting mul_u64_u64_div_u64() test\n"); > @@ -75,16 +87,28 @@ static int __init test_init(void) > u64 c = test_values[i].c; > u64 expected_result = test_values[i].result; > u64 result = mul_u64_u64_div_u64(a, b, c); > + u64 result_up = mul_u64_u64_div_u64_roundup(a, b, c); > + > + tests += 2; > > if (result != expected_result) { > pr_err("ERROR: 0x%016llx * 0x%016llx / 0x%016llx\n", a, b, c); > pr_err("ERROR: expected result: %016llx\n", expected_result); > pr_err("ERROR: obtained result: %016llx\n", result); > + errors++; > + } > + expected_result += test_values[i].round_up; > + if (result_up != expected_result) { > + pr_err("ERROR: 0x%016llx * 0x%016llx +/ 0x%016llx\n", a, b, c); > + pr_err("ERROR: expected result: %016llx\n", expected_result); > + pr_err("ERROR: obtained result: %016llx\n", result_up); > + errors++; > } > } > > - pr_info("Completed mul_u64_u64_div_u64() test\n"); > - return 0; > + pr_info("Completed mul_u64_u64_div_u64() test, %d tests, %d errors\n", > + tests, errors); > + return errors ? -EINVAL : 0; > } > > static void __exit test_exit(void) > -- > 2.39.5 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-05-25 11:38 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-18 13:38 [PATCH v2 next 0/4] Implement mul_u64_u64_div_u64_roundup() David Laight 2025-05-18 13:38 ` [PATCH v2 next 1/4] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd' David Laight 2025-05-20 2:11 ` Nicolas Pitre 2025-05-18 13:38 ` [PATCH v2 next 2/4] lib: mul_u64_u64_div_u64() Use BUG_ON() for divide by zero David Laight 2025-05-18 15:42 ` kernel test robot 2025-05-18 21:50 ` David Laight 2025-05-19 6:10 ` Uwe Kleine-König 2025-05-19 11:59 ` David Laight 2025-05-20 1:54 ` Nicolas Pitre 2025-05-20 2:21 ` Nicolas Pitre 2025-05-20 21:43 ` David Laight 2025-05-20 22:28 ` Nicolas Pitre 2025-05-18 13:38 ` [PATCH v2 next 3/4] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup() David Laight 2025-05-20 3:03 ` Nicolas Pitre 2025-05-20 21:37 ` David Laight 2025-05-20 22:24 ` Nicolas Pitre 2025-05-21 12:52 ` David Laight 2025-05-21 13:50 ` Nicolas Pitre 2025-05-25 11:38 ` David Laight 2025-05-18 13:38 ` [PATCH v2 next 4/4] lib: Add tests for mul_u64_u64_div_u64_roundup() David Laight 2025-05-20 3:07 ` Nicolas Pitre