* [RFC PATCH] lib: test_bitops: add compile-time optimization/evaluations assertions
@ 2022-11-11 8:13 Vincent Mailhol
2023-11-30 10:27 ` [PATCH v2] " Vincent Mailhol
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Vincent Mailhol @ 2022-11-11 8:13 UTC (permalink / raw)
To: Yury Norov, linux-kernel
Cc: Vincent Mailhol, Andrew Morton, Nathan Chancellor,
Peter Zijlstra (Intel), David Gow, Kees Cook, Josh Poimboeuf,
Dan Williams, Miguel Ojeda, Isabella Basso, Vlastimil Babka,
Rasmus Villemoes, Nick Desaulniers
Add a function to the bitops test suite to assert that the bitops
helper correctly fold constant expressions (or trigger a build bug
otherwise). This should work on all the optimization levels supported
by Kbuild.
The function doesn't perform any runtime tests and gets optimized out to
nothing after passing the build assertions.
At the time of writing, we have not yet asserted that this will work on all
architectures. Architectures which fail that test should adjust their
arch/*/include/asm/bitops.h by either:
- using __builtin_constant_p() to select between the architecture
specific asm and the generic __builitn implementation if the
architecture specific asm produces better code (similar to [1]).
- always calling the generic __builtin implementation if the architecture
specific asm does not produce better code (similar to [2]).
[1] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate
constant expressions")
[2] patch "x86/asm/bitops: Replace __fls() by its generic builtin
implementation"
Link: https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
This is a follow up of this thread:
https://lore.kernel.org/all/Yw8hJS9f6SofG4%2F6@yury-laptop/
in which I promised to eventually implement a check.
This patch requires below series in order to work on x86:
https://lore.kernel.org/lkml/20221106095106.849154-1-mailhol.vincent@wanadoo.fr/
Because that series is not yet merge, I am sending it as RFC for
now.
Concerning architectures other than x86, I am fine to write patches
but I will not test it (I do not have the build environment).
One idea would be to add this patch to any kind of CI which runs on
all architecture and see which architectures need to be fixed before
making this mainstream. Tell me what you think.
---
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3fc7abffc7aa..233a82cd3b6e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2300,6 +2300,10 @@ config TEST_BITOPS
compilations. It has no dependencies and doesn't run or load unless
explicitly requested by name. for example: modprobe test_bitops.
+ In addition, check that the compiler is able to fold the bitops
+ function into a compile-time constant (given that the argument is also
+ a compile-time constant) and trigger a build bug otherwise.
+
If unsure, say N.
config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..e6e3d22ce52a 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,50 @@ static unsigned long order_comb_long[][2] = {
};
#endif
+static void __init test_bitops_const_eval(void)
+{
+ int res;
+ u64 res64;
+
+ /*
+ * On any supported optimization level (-O2, -Os) and if
+ * invoked with a compile-time constant argument, the compiler
+ * must be able to fold into a constant expression all the bit
+ * find functions. Namely: __ffs(), ffs(), ffz(), __fls(),
+ * fls() and fls64(). Otherwise, trigger a build bug.
+ */
+
+ /* __ffs(BIT(10)) == 10 */
+ res = __ffs(BIT(10)) == 10;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* ffs(BIT(10)) == 11 */
+ res = ffs(BIT(10)) == 11;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* ffz(~BIT(10)) == 10 */
+ res = ffz(~BIT(10)) == 10;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* __fls(BIT(10)) == 10 */
+ res = __fls(BIT(10)) == 10;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* fls(BIT(10)) == 11 */
+ res = fls(BIT(10)) == 11;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* fls64(BIT_ULL(10)) == 11 */
+ res64 = fls64(BIT_ULL(10)) == 11;
+ BUILD_BUG_ON(!__builtin_constant_p(res64));
+ BUILD_BUG_ON(!res64);
+}
+
static int __init test_bitops_startup(void)
{
int i, bit_set;
@@ -94,6 +138,8 @@ static int __init test_bitops_startup(void)
if (bit_set != BITOPS_LAST)
pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
+ test_bitops_const_eval();
+
pr_info("Completed bitops test\n");
return 0;
--
2.35.1
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions 2022-11-11 8:13 [RFC PATCH] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol @ 2023-11-30 10:27 ` Vincent Mailhol 2023-11-30 16:25 ` kernel test robot 2023-12-06 13:27 ` kernel test robot 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol 2 siblings, 2 replies; 32+ messages in thread From: Vincent Mailhol @ 2023-11-30 10:27 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Vincent Mailhol, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver Add a function in the bitops test suite to assert that the bitops helper correctly fold constant expressions (or trigger a build bug otherwise). This should work on all the optimization levels supported by Kbuild. The function doesn't perform any runtime tests and gets optimized out to nothing after passing the build assertions. Architectures which fail that test should adjust their arch/*/include/asm/bitops.h in order to use the compiler's generic __builitn implementation if the argument is a constant expression (similar to [1]). [1] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions") Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- This is tested on x86, x86_64, arm and arm64. Other architectures were not tested. My idea would be to add this patch to any kind of CI which runs on all architecture (linux-next?) and see if anything breaks. Or maybe I should send a message to the maintainers of all of the arch/*/include/asm/bitops.h files? Tell me what you think. ** Changelog ** v1 -> v2: - Drop the RFC patch. v1 was not ready to be applied on x86 because of pending changes in arch/x86/include/asm/bitops.h. This was finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions"). Thanks Nick! - Update the commit description. - Introduce the test_const_eval() macro to factorize code. - No functional changes. Link: https://lore.kernel.org/all/20221111081316.30373-1-mailhol.vincent@wanadoo.fr/ --- lib/Kconfig.debug | 4 ++++ lib/test_bitops.c | 31 +++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index cc7d53d9dc01..c97d818dbc30 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2454,6 +2454,10 @@ config TEST_BITOPS compilations. It has no dependencies and doesn't run or load unless explicitly requested by name. for example: modprobe test_bitops. + In addition, check that the compiler is able to fold the bitops + function into a compile-time constant (given that the argument is also + a compile-time constant) and trigger a build bug otherwise. + If unsure, say N. config TEST_VMALLOC diff --git a/lib/test_bitops.c b/lib/test_bitops.c index 3b7bcbee84db..49e2f76575e4 100644 --- a/lib/test_bitops.c +++ b/lib/test_bitops.c @@ -50,6 +50,35 @@ static unsigned long order_comb_long[][2] = { }; #endif +/* Assert that a boolean expression can be folded in a constant and is true. */ +#define test_const_eval(test_expr) \ +({ \ + /* Evaluate once so that compiler can fold it. */ \ + bool __test_expr = test_expr; \ + \ + BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \ + BUILD_BUG_ON(!__test_expr); \ +}) + +static void test_bitops_const_eval(void) +{ + /* + * On any supported optimization level (-O2, -Os) and if + * invoked with a compile-time constant argument, the compiler + * must be able to fold into a constant expression all the bit + * find functions. Namely: __ffs(), ffs(), ffz(), __fls(), + * fls() and fls64(). Otherwise, trigger a build bug. + */ + const int n = 10; + + test_const_eval(__ffs(BIT(n)) == n); + test_const_eval(ffs(BIT(n)) == n + 1); + test_const_eval(ffz(~BIT(n)) == n); + test_const_eval(__fls(BIT(n)) == n); + test_const_eval(fls(BIT(n)) == n + 1); + test_const_eval(fls64(BIT_ULL(n)) == n + 1); +} + static int __init test_bitops_startup(void) { int i, bit_set; @@ -94,6 +123,8 @@ static int __init test_bitops_startup(void) if (bit_set != BITOPS_LAST) pr_err("ERROR: FOUND SET BIT %d\n", bit_set); + test_bitops_const_eval(); + pr_info("Completed bitops test\n"); return 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions 2023-11-30 10:27 ` [PATCH v2] " Vincent Mailhol @ 2023-11-30 16:25 ` kernel test robot 2023-12-06 13:27 ` kernel test robot 1 sibling, 0 replies; 32+ messages in thread From: kernel test robot @ 2023-11-30 16:25 UTC (permalink / raw) To: Vincent Mailhol, Andrew Morton, linux-kernel, Yury Norov Cc: oe-kbuild-all, Linux Memory Management List, Vincent Mailhol, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver Hi Vincent, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.7-rc3 next-20231130] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Mailhol/lib-test_bitops-add-compile-time-optimization-evaluations-assertions/20231130-182837 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20231130102717.1297492-1-mailhol.vincent%40wanadoo.fr patch subject: [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20231201/202312010058.JJKeeqvE-lkp@intel.com/config) compiler: m68k-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231201/202312010058.JJKeeqvE-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202312010058.JJKeeqvE-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from <command-line>: In function 'test_bitops_const_eval', inlined from 'test_bitops_startup' at lib/test_bitops.c:126:2: >> include/linux/compiler_types.h:435:45: error: call to '__compiletime_assert_183' declared with attribute error: BUILD_BUG_ON failed: !__builtin_constant_p(__test_expr) 435 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ include/linux/compiler_types.h:416:25: note: in definition of macro '__compiletime_assert' 416 | prefix ## suffix(); \ | ^~~~~~ include/linux/compiler_types.h:435:9: note: in expansion of macro '_compiletime_assert' 435 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) | ^~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG' 50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) | ^~~~~~~~~~~~~~~~ lib/test_bitops.c:59:9: note: in expansion of macro 'BUILD_BUG_ON' 59 | BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \ | ^~~~~~~~~~~~ lib/test_bitops.c:74:9: note: in expansion of macro 'test_const_eval' 74 | test_const_eval(__ffs(BIT(n)) == n); | ^~~~~~~~~~~~~~~ vim +/__compiletime_assert_183 +435 include/linux/compiler_types.h eb5c2d4b45e3d2 Will Deacon 2020-07-21 421 eb5c2d4b45e3d2 Will Deacon 2020-07-21 422 #define _compiletime_assert(condition, msg, prefix, suffix) \ eb5c2d4b45e3d2 Will Deacon 2020-07-21 423 __compiletime_assert(condition, msg, prefix, suffix) eb5c2d4b45e3d2 Will Deacon 2020-07-21 424 eb5c2d4b45e3d2 Will Deacon 2020-07-21 425 /** eb5c2d4b45e3d2 Will Deacon 2020-07-21 426 * compiletime_assert - break build and emit msg if condition is false eb5c2d4b45e3d2 Will Deacon 2020-07-21 427 * @condition: a compile-time constant condition to check eb5c2d4b45e3d2 Will Deacon 2020-07-21 428 * @msg: a message to emit if condition is false eb5c2d4b45e3d2 Will Deacon 2020-07-21 429 * eb5c2d4b45e3d2 Will Deacon 2020-07-21 430 * In tradition of POSIX assert, this macro will break the build if the eb5c2d4b45e3d2 Will Deacon 2020-07-21 431 * supplied condition is *false*, emitting the supplied error message if the eb5c2d4b45e3d2 Will Deacon 2020-07-21 432 * compiler has support to do so. eb5c2d4b45e3d2 Will Deacon 2020-07-21 433 */ eb5c2d4b45e3d2 Will Deacon 2020-07-21 434 #define compiletime_assert(condition, msg) \ eb5c2d4b45e3d2 Will Deacon 2020-07-21 @435 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) eb5c2d4b45e3d2 Will Deacon 2020-07-21 436 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions 2023-11-30 10:27 ` [PATCH v2] " Vincent Mailhol 2023-11-30 16:25 ` kernel test robot @ 2023-12-06 13:27 ` kernel test robot 1 sibling, 0 replies; 32+ messages in thread From: kernel test robot @ 2023-12-06 13:27 UTC (permalink / raw) To: Vincent Mailhol, Andrew Morton, linux-kernel, Yury Norov Cc: llvm, oe-kbuild-all, Linux Memory Management List, Vincent Mailhol, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver Hi Vincent, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.7-rc4 next-20231206] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Mailhol/lib-test_bitops-add-compile-time-optimization-evaluations-assertions/20231130-182837 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20231130102717.1297492-1-mailhol.vincent%40wanadoo.fr patch subject: [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20231206/202312062158.60c1e6yt-lkp@intel.com/config) compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231206/202312062158.60c1e6yt-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202312062158.60c1e6yt-lkp@intel.com/ All errors (new ones prefixed by >>): >> lib/test_bitops.c:74:2: error: call to '__compiletime_assert_178' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(__test_expr) 74 | test_const_eval(__ffs(BIT(n)) == n); | ^ lib/test_bitops.c:59:2: note: expanded from macro 'test_const_eval' 59 | BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \ | ^ include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON' 50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) | ^ include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG' 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) | ^ note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) include/linux/compiler_types.h:423:2: note: expanded from macro '_compiletime_assert' 423 | __compiletime_assert(condition, msg, prefix, suffix) | ^ include/linux/compiler_types.h:416:4: note: expanded from macro '__compiletime_assert' 416 | prefix ## suffix(); \ | ^ <scratch space>:18:1: note: expanded from here 18 | __compiletime_assert_178 | ^ 1 error generated. vim +74 lib/test_bitops.c 62 63 static void test_bitops_const_eval(void) 64 { 65 /* 66 * On any supported optimization level (-O2, -Os) and if 67 * invoked with a compile-time constant argument, the compiler 68 * must be able to fold into a constant expression all the bit 69 * find functions. Namely: __ffs(), ffs(), ffz(), __fls(), 70 * fls() and fls64(). Otherwise, trigger a build bug. 71 */ 72 const int n = 10; 73 > 74 test_const_eval(__ffs(BIT(n)) == n); 75 test_const_eval(ffs(BIT(n)) == n + 1); 76 test_const_eval(ffz(~BIT(n)) == n); 77 test_const_eval(__fls(BIT(n)) == n); 78 test_const_eval(fls(BIT(n)) == n + 1); 79 test_const_eval(fls64(BIT_ULL(n)) == n + 1); 80 } 81 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 0/5] bitops: optimize code and add tests 2022-11-11 8:13 [RFC PATCH] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol 2023-11-30 10:27 ` [PATCH v2] " Vincent Mailhol @ 2023-12-17 7:12 ` Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol ` (4 more replies) 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol 2 siblings, 5 replies; 32+ messages in thread From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol This series make sure that all the bitops operations (namely __ffs(), ffs(), ffz(), __fls(), fls(), fls64()) correctly fold constant expressions given that their argument is also a constant expression. The first two patches optimize m68k architecture, the third and fourth optimize the hexagon architecture bitops function, the fifth and final patch adds test to assert that the constant folding occurs and that the result is accurate. This is tested on arm, arm64, hexagon, m68k, x86 and x86_64. For other architectures, I am putting my trust into the kernel test robot to send a report if ever one of the other architectures still lacks bitops optimizations. --- ** Changelog ** v2 -> v3: - Add patches 1/5 and 2/5 to optimize m68k architecture bitops. Thanks to the kernel test robot for reporting! - Add patches 3/5 and 4/5 to optimize hexagon architecture bitops. Thanks to the kernel test robot for reporting! - Patch 5/5: mark test_bitops_const_eval() as __always_inline, this done, pass n (the test number) as a parameter. Previously, only BITS(10) was tested. Add tests for BITS(0) and BITS(31). Link: https://lore.kernel.org/all/20231130102717.1297492-1-mailhol.vincent@wanadoo.fr/ v1 -> v2: - Drop the RFC patch. v1 was not ready to be applied on x86 because of pending changes in arch/x86/include/asm/bitops.h. This was finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions"). Thanks Nick! - Update the commit description. - Introduce the test_const_eval() macro to factorize code. - No functional change. Link: https://lore.kernel.org/all/20221111081316.30373-1-mailhol.vincent@wanadoo.fr/ Vincent Mailhol (5): m68k/bitops: force inlining of all bitops functions m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions hexagon/bitops: force inlining of all bitops functions hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions lib: test_bitops: add compile-time optimization/evaluations assertions arch/hexagon/include/asm/bitops.h | 37 ++++++++---- arch/m68k/include/asm/bitops.h | 99 +++++++++++++++++-------------- lib/Kconfig.debug | 4 ++ lib/test_bitops.c | 32 ++++++++++ 4 files changed, 118 insertions(+), 54 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol @ 2023-12-17 7:12 ` Vincent Mailhol 2024-01-02 10:28 ` Geert Uytterhoeven 2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol ` (3 subsequent siblings) 4 siblings, 1 reply; 32+ messages in thread From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol The inline keyword actually does not guarantee that the compiler will inline a functions. Whenever the goal is to actually inline a function, __always_inline should always be preferred instead. On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB. $ size --format=GNU vmlinux.before vmlinux.after text data bss total filename 60449738 70975612 2288988 133714338 vmlinux.before 60446534 70972412 2289596 133708542 vmlinux.after Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends") Link: https://git.kernel.org/torvalds/c/8dd5032d9c54 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/m68k/include/asm/bitops.h | 87 +++++++++++++++++----------------- 1 file changed, 44 insertions(+), 43 deletions(-) diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h index 14c64a6f1217..ae0457d582b8 100644 --- a/arch/m68k/include/asm/bitops.h +++ b/arch/m68k/include/asm/bitops.h @@ -28,7 +28,7 @@ * So we use the best form possible on a given platform. */ -static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; @@ -38,7 +38,7 @@ static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr) : "memory"); } -static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; @@ -47,7 +47,7 @@ static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr) : "di" (nr & 7)); } -static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr) { __asm__ __volatile__ ("bfset %1{%0:#1}" : @@ -71,7 +71,7 @@ arch___set_bit(unsigned long nr, volatile unsigned long *addr) set_bit(nr, addr); } -static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; @@ -81,7 +81,7 @@ static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr) : "memory"); } -static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; @@ -90,7 +90,7 @@ static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr) : "di" (nr & 7)); } -static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr) { __asm__ __volatile__ ("bfclr %1{%0:#1}" : @@ -114,7 +114,7 @@ arch___clear_bit(unsigned long nr, volatile unsigned long *addr) clear_bit(nr, addr); } -static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; @@ -124,7 +124,7 @@ static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr) : "memory"); } -static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; @@ -133,7 +133,7 @@ static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr) : "di" (nr & 7)); } -static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr) +static __always_inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr) { __asm__ __volatile__ ("bfchg %1{%0:#1}" : @@ -160,8 +160,8 @@ arch___change_bit(unsigned long nr, volatile unsigned long *addr) #define arch_test_bit generic_test_bit #define arch_test_bit_acquire generic_test_bit_acquire -static inline int bset_reg_test_and_set_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bset_reg_test_and_set_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; char retval; @@ -173,8 +173,8 @@ static inline int bset_reg_test_and_set_bit(int nr, return retval; } -static inline int bset_mem_test_and_set_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; char retval; @@ -185,8 +185,8 @@ static inline int bset_mem_test_and_set_bit(int nr, return retval; } -static inline int bfset_mem_test_and_set_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bfset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr) { char retval; @@ -213,8 +213,8 @@ arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr) return test_and_set_bit(nr, addr); } -static inline int bclr_reg_test_and_clear_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bclr_reg_test_and_clear_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; char retval; @@ -226,8 +226,8 @@ static inline int bclr_reg_test_and_clear_bit(int nr, return retval; } -static inline int bclr_mem_test_and_clear_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; char retval; @@ -238,8 +238,8 @@ static inline int bclr_mem_test_and_clear_bit(int nr, return retval; } -static inline int bfclr_mem_test_and_clear_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bfclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr) { char retval; @@ -266,8 +266,8 @@ arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr) return test_and_clear_bit(nr, addr); } -static inline int bchg_reg_test_and_change_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bchg_reg_test_and_change_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; char retval; @@ -279,8 +279,8 @@ static inline int bchg_reg_test_and_change_bit(int nr, return retval; } -static inline int bchg_mem_test_and_change_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr) { char *p = (char *)vaddr + (nr ^ 31) / 8; char retval; @@ -291,8 +291,8 @@ static inline int bchg_mem_test_and_change_bit(int nr, return retval; } -static inline int bfchg_mem_test_and_change_bit(int nr, - volatile unsigned long *vaddr) +static __always_inline int +bfchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr) { char retval; @@ -319,8 +319,8 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr) return test_and_change_bit(nr, addr); } -static inline bool xor_unlock_is_negative_byte(unsigned long mask, - volatile unsigned long *p) +static __always_inline bool +xor_unlock_is_negative_byte(unsigned long mask, volatile unsigned long *p) { #ifdef CONFIG_COLDFIRE __asm__ __volatile__ ("eorl %1, %0" @@ -350,8 +350,8 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask, #include <asm-generic/bitops/ffz.h> #else -static inline int find_first_zero_bit(const unsigned long *vaddr, - unsigned size) +static __always_inline int +find_first_zero_bit(const unsigned long *vaddr, unsigned size) { const unsigned long *p = vaddr; int res = 32; @@ -376,8 +376,8 @@ static inline int find_first_zero_bit(const unsigned long *vaddr, } #define find_first_zero_bit find_first_zero_bit -static inline int find_next_zero_bit(const unsigned long *vaddr, int size, - int offset) +static __always_inline int +find_next_zero_bit(const unsigned long *vaddr, int size, int offset) { const unsigned long *p = vaddr + (offset >> 5); int bit = offset & 31UL, res; @@ -406,7 +406,8 @@ static inline int find_next_zero_bit(const unsigned long *vaddr, int size, } #define find_next_zero_bit find_next_zero_bit -static inline int find_first_bit(const unsigned long *vaddr, unsigned size) +static __always_inline int +find_first_bit(const unsigned long *vaddr, unsigned size) { const unsigned long *p = vaddr; int res = 32; @@ -431,8 +432,8 @@ static inline int find_first_bit(const unsigned long *vaddr, unsigned size) } #define find_first_bit find_first_bit -static inline int find_next_bit(const unsigned long *vaddr, int size, - int offset) +static __always_inline int +find_next_bit(const unsigned long *vaddr, int size, int offset) { const unsigned long *p = vaddr + (offset >> 5); int bit = offset & 31UL, res; @@ -465,7 +466,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size, * ffz = Find First Zero in word. Undefined if no zero exists, * so code should check against ~0UL first.. */ -static inline unsigned long ffz(unsigned long word) +static __always_inline unsigned long ffz(unsigned long word) { int res; @@ -488,7 +489,7 @@ static inline unsigned long ffz(unsigned long word) */ #if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \ !defined(CONFIG_M68000) -static inline unsigned long __ffs(unsigned long x) +static __always_inline unsigned long __ffs(unsigned long x) { __asm__ __volatile__ ("bitrev %0; ff1 %0" : "=d" (x) @@ -496,7 +497,7 @@ static inline unsigned long __ffs(unsigned long x) return x; } -static inline int ffs(int x) +static __always_inline int ffs(int x) { if (!x) return 0; @@ -518,7 +519,7 @@ static inline int ffs(int x) * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). */ -static inline int ffs(int x) +static __always_inline int ffs(int x) { int cnt; @@ -528,7 +529,7 @@ static inline int ffs(int x) return 32 - cnt; } -static inline unsigned long __ffs(unsigned long x) +static __always_inline unsigned long __ffs(unsigned long x) { return ffs(x) - 1; } @@ -536,7 +537,7 @@ static inline unsigned long __ffs(unsigned long x) /* * fls: find last bit set. */ -static inline int fls(unsigned int x) +static __always_inline int fls(unsigned int x) { int cnt; @@ -546,7 +547,7 @@ static inline int fls(unsigned int x) return 32 - cnt; } -static inline unsigned long __fls(unsigned long x) +static __always_inline unsigned long __fls(unsigned long x) { return fls(x) - 1; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions 2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol @ 2024-01-02 10:28 ` Geert Uytterhoeven 2024-01-07 12:01 ` Vincent MAILHOL 0 siblings, 1 reply; 32+ messages in thread From: Geert Uytterhoeven @ 2024-01-02 10:28 UTC (permalink / raw) To: Vincent Mailhol Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k Hi Vincent, Thanks for your patch! On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol <mailhol.vincent@wanadoo.fr> wrote: > The inline keyword actually does not guarantee that the compiler will > inline a functions. Whenever the goal is to actually inline a > function, __always_inline should always be preferred instead. > > On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB. > > $ size --format=GNU vmlinux.before vmlinux.after > text data bss total filename > 60449738 70975612 2288988 133714338 vmlinux.before > 60446534 70972412 2289596 133708542 vmlinux.after With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different (i.e. a size increase): allyesconfig: text data bss total filename 58878600 72415994 2283652 133578246 vmlinux.before 58882250 72419706 2284004 133585960 vmlinux.after atari_defconfig: text data bss total filename 4112060 1579862 151680 5843602 vmlinux-v6.7-rc8 4117008 1579350 151680 5848038 vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining The next patch offsets that for allyesconfig, but not for atari_defconfig. > Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of > test_and_set_bit and friends") Please don't split lines containing tags. > Link: https://git.kernel.org/torvalds/c/8dd5032d9c54 > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions 2024-01-02 10:28 ` Geert Uytterhoeven @ 2024-01-07 12:01 ` Vincent MAILHOL 0 siblings, 0 replies; 32+ messages in thread From: Vincent MAILHOL @ 2024-01-07 12:01 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k On Tue. 2 janv. 2024 at 19:28, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Vincent, > > Thanks for your patch! Thanks for the review and for running the benchmark. > On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol > <mailhol.vincent@wanadoo.fr> wrote: > > The inline keyword actually does not guarantee that the compiler will > > inline a functions. Whenever the goal is to actually inline a > > function, __always_inline should always be preferred instead. > > > > On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB. > > > > $ size --format=GNU vmlinux.before vmlinux.after > > text data bss total filename > > 60449738 70975612 2288988 133714338 vmlinux.before > > 60446534 70972412 2289596 133708542 vmlinux.after > > With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different > (i.e. a size increase): Those results are not normal, there should not be such a big discrepancy between two versions of the same compiler. I double checked everything and found out that I made a mistake when computing the figures: not sure what exactly, but at some point, the ASLR seeds (or other similar randomization feature) got reset and so, the decrease I witnessed was just a "lucky roll". After rerunning the benchmark (making sure to keep every seeds), I got similar results as you: text data bss total filename 60449738 70975356 2288988 133714082 vmlinux_allyesconfig.before_this_series 60446534 70979068 2289596 133715198 vmlinux_allyesconfig.after_first_patch 60429746 70979132 2291676 133700554 vmlinux_allyesconfig.final_second_patch Note that there are still some kind of randomness on the data segment as shown in those other benchmarks I run: text data bss total filename 60449738 70976124 2288988 133714850 vmlinux_allyesconfig.before_this_series 60446534 70980092 2289596 133716222 vmlinux_allyesconfig.after_first_patch 60429746 70979388 2291676 133700810 vmlinux_allyesconfig.after_second_patch text data bss total filename 60449738 70975612 2288988 133714338 vmlinux_allyesconfig.before_this_series 60446534 70980348 2289596 133716478 vmlinux_allyesconfig.after_first_patch 60429746 70979900 2291676 133701322 vmlinux_allyesconfig.after_second_patch But the error margin is within 1K. So, in short, I inlined some functions which I shouldn't have. I am preparing a v4 in which I will only inline the bit-find functions (namely: __ffs(), ffs(), ffz(), __fls(), fls() and fls64()). Here are the new figures: text data bss total filename 60453552 70955485 2288620 133697657 vmlinux_allyesconfig.before_this_series 60450304 70953085 2289260 133692649 vmlinux_allyesconfig.after_first_patch 60433536 70952637 2291340 133677513 vmlinux_allyesconfig.after_second_patch N.B. The new figures were after a rebase, so do not try to compare with the previous benchmarks. I will send the v4 soon, after I finish to update the patch comments and double check things. Concerning the other functions in bitops.h, there may be some other ones worth a __always_inline. But I will narrow the scope of this series only to the bit-find function. If a good samaritan wants to investigate the other functions, go ahead! Yours sincerely, Vincent Mailhol > allyesconfig: > > text data bss total filename > 58878600 72415994 2283652 133578246 vmlinux.before > 58882250 72419706 2284004 133585960 vmlinux.after > > atari_defconfig: > > text data bss total filename > 4112060 1579862 151680 5843602 vmlinux-v6.7-rc8 > 4117008 1579350 151680 5848038 > vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining > > The next patch offsets that for allyesconfig, but not for atari_defconfig. > > > Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of > > test_and_set_bit and friends") > > Please don't split lines containing tags. > > > Link: https://git.kernel.org/torvalds/c/8dd5032d9c54 > > > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol @ 2023-12-17 7:12 ` Vincent Mailhol 2024-01-02 10:49 ` Geert Uytterhoeven 2023-12-17 7:12 ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol ` (2 subsequent siblings) 4 siblings, 1 reply; 32+ messages in thread From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol The compiler is not able to do constant folding on "asm volatile" code. Evaluate whether or not the function argument is a constant expression and if this is the case, return an equivalent builtin expression. On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB. $ size --format=GNU vmlinux.before vmlinux.after text data bss total filename 60446534 70972412 2289596 133708542 vmlinux.before 60429746 70978876 2291676 133700298 vmlinux.after Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions") Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/m68k/include/asm/bitops.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h index ae0457d582b8..3f89b9dccc33 100644 --- a/arch/m68k/include/asm/bitops.h +++ b/arch/m68k/include/asm/bitops.h @@ -470,6 +470,9 @@ static __always_inline unsigned long ffz(unsigned long word) { int res; + if (__builtin_constant_p(word)) + return __builtin_ctzl(~word); + __asm__ __volatile__ ("bfffo %1{#0,#0},%0" : "=d" (res) : "d" (~word & -~word)); return res ^ 31; @@ -491,6 +494,9 @@ static __always_inline unsigned long ffz(unsigned long word) !defined(CONFIG_M68000) static __always_inline unsigned long __ffs(unsigned long x) { + if (__builtin_constant_p(x)) + return __builtin_ctzl(x); + __asm__ __volatile__ ("bitrev %0; ff1 %0" : "=d" (x) : "0" (x)); @@ -523,6 +529,9 @@ static __always_inline int ffs(int x) { int cnt; + if (__builtin_constant_p(x)) + return __builtin_ffs(x); + __asm__ ("bfffo %1{#0:#0},%0" : "=d" (cnt) : "dm" (x & -x)); @@ -541,6 +550,9 @@ static __always_inline int fls(unsigned int x) { int cnt; + if (__builtin_constant_p(x)) + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0; + __asm__ ("bfffo %1{#0,#0},%0" : "=d" (cnt) : "dm" (x)); -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol @ 2024-01-02 10:49 ` Geert Uytterhoeven 0 siblings, 0 replies; 32+ messages in thread From: Geert Uytterhoeven @ 2024-01-02 10:49 UTC (permalink / raw) To: Vincent Mailhol Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol <mailhol.vincent@wanadoo.fr> wrote: > > The compiler is not able to do constant folding on "asm volatile" code. > > Evaluate whether or not the function argument is a constant expression > and if this is the case, return an equivalent builtin expression. > > On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB. > > $ size --format=GNU vmlinux.before vmlinux.after > text data bss total filename > 60446534 70972412 2289596 133708542 vmlinux.before > 60429746 70978876 2291676 133700298 vmlinux.after Still a win with gcc 9.5.0-1ubuntu1~22.04: allyesconfig: text data bss total filename 58882250 72419706 2284004 133585960 vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining 58863570 72419546 2285860 133568976 vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs So an even bigger win... atari_defconfig: text data bss total filename 4117008 1579350 151680 5848038 vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining 4116940 1579534 151712 5848186 vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs ... but a small size increase here. > > Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() > to evaluate constant expressions") Please don't split lines containing tags. > Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol @ 2023-12-17 7:12 ` Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol The inline keyword actually does not guarantee that the compiler will inline a functions. Whenever the goal is to actually inline a function, __always_inline should always be preferred instead. Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends") Link: https://git.kernel.org/torvalds/c/8dd5032d9c54 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/hexagon/include/asm/bitops.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h index 160d8f37fa1a..950d4acc2edc 100644 --- a/arch/hexagon/include/asm/bitops.h +++ b/arch/hexagon/include/asm/bitops.h @@ -28,7 +28,7 @@ * @nr: bit number to clear * @addr: pointer to memory */ -static inline int test_and_clear_bit(int nr, volatile void *addr) +static __always_inline int test_and_clear_bit(int nr, volatile void *addr) { int oldval; @@ -52,7 +52,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr) * @nr: bit number to set * @addr: pointer to memory */ -static inline int test_and_set_bit(int nr, volatile void *addr) +static __always_inline int test_and_set_bit(int nr, volatile void *addr) { int oldval; @@ -78,7 +78,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr) * @nr: bit number to set * @addr: pointer to memory */ -static inline int test_and_change_bit(int nr, volatile void *addr) +static __always_inline int test_and_change_bit(int nr, volatile void *addr) { int oldval; @@ -103,17 +103,17 @@ static inline int test_and_change_bit(int nr, volatile void *addr) * Rewrite later to save a cycle or two. */ -static inline void clear_bit(int nr, volatile void *addr) +static __always_inline void clear_bit(int nr, volatile void *addr) { test_and_clear_bit(nr, addr); } -static inline void set_bit(int nr, volatile void *addr) +static __always_inline void set_bit(int nr, volatile void *addr) { test_and_set_bit(nr, addr); } -static inline void change_bit(int nr, volatile void *addr) +static __always_inline void change_bit(int nr, volatile void *addr) { test_and_change_bit(nr, addr); } @@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) * * Undefined if no zero exists, so code should check against ~0UL first. */ -static inline long ffz(int x) +static __always_inline long ffz(int x) { int r; @@ -217,7 +217,7 @@ static inline long ffz(int x) * This is defined the same way as ffs. * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. */ -static inline int fls(unsigned int x) +static __always_inline int fls(unsigned int x) { int r; @@ -238,7 +238,7 @@ static inline int fls(unsigned int x) * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). */ -static inline int ffs(int x) +static __always_inline int ffs(int x) { int r; @@ -260,7 +260,7 @@ static inline int ffs(int x) * bits_per_long assumed to be 32 * numbering starts at 0 I think (instead of 1 like ffs) */ -static inline unsigned long __ffs(unsigned long word) +static __always_inline unsigned long __ffs(unsigned long word) { int num; @@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word) * Undefined if no set bit exists, so code should check against 0 first. * bits_per_long assumed to be 32 */ -static inline unsigned long __fls(unsigned long word) +static __always_inline unsigned long __fls(unsigned long word) { int num; -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol ` (2 preceding siblings ...) 2023-12-17 7:12 ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol @ 2023-12-17 7:12 ` Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol The compiler is not able to do constant folding on "asm volatile" code. Evaluate whether or not the function argument is a constant expression and if this is the case, return an equivalent builtin expression. Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions") Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h index 950d4acc2edc..12c6ad1ea2ed 100644 --- a/arch/hexagon/include/asm/bitops.h +++ b/arch/hexagon/include/asm/bitops.h @@ -204,6 +204,9 @@ static __always_inline long ffz(int x) { int r; + if (__builtin_constant_p(x)) + return __builtin_ctzl(~x); + asm("%0 = ct1(%1);\n" : "=&r" (r) : "r" (x)); @@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x) { int r; + if (__builtin_constant_p(x)) + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0; + asm("{ %0 = cl0(%1);}\n" "%0 = sub(#32,%0);\n" : "=&r" (r) @@ -242,6 +248,9 @@ static __always_inline int ffs(int x) { int r; + if (__builtin_constant_p(x)) + return __builtin_ffs(x); + asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n" "{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n" : "=&r" (r) @@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word) { int num; + if (__builtin_constant_p(word)) + return __builtin_ctzl(word); + asm("%0 = ct0(%1);\n" : "=&r" (num) : "r" (word)); @@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word) { int num; + if (__builtin_constant_p(word)) + return BITS_PER_LONG - 1 - __builtin_clzl(word); + asm("%0 = cl0(%1);\n" "%0 = sub(#31,%0);\n" : "=&r" (num) -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol ` (3 preceding siblings ...) 2023-12-17 7:12 ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol @ 2023-12-17 7:12 ` Vincent Mailhol 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2023-12-17 7:12 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Yury Norov Cc: Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-hexagon, linux-m68k, Vincent Mailhol Add a function in the bitops test suite to assert that the bitops helper correctly fold constant expressions (or trigger a build bug otherwise). This should work on all the optimization levels supported by Kbuild. The added function doesn't perform any runtime tests and gets optimized out to nothing after passing the build assertions. Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- lib/Kconfig.debug | 4 ++++ lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index cc7d53d9dc01..c97d818dbc30 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2454,6 +2454,10 @@ config TEST_BITOPS compilations. It has no dependencies and doesn't run or load unless explicitly requested by name. for example: modprobe test_bitops. + In addition, check that the compiler is able to fold the bitops + function into a compile-time constant (given that the argument is also + a compile-time constant) and trigger a build bug otherwise. + If unsure, say N. config TEST_VMALLOC diff --git a/lib/test_bitops.c b/lib/test_bitops.c index 3b7bcbee84db..99b612515eb6 100644 --- a/lib/test_bitops.c +++ b/lib/test_bitops.c @@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = { }; #endif +/* Assert that a boolean expression can be folded in a constant and is true. */ +#define test_const_eval(test_expr) \ +({ \ + /* Evaluate once so that compiler can fold it. */ \ + bool __test_expr = test_expr; \ + \ + BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \ + BUILD_BUG_ON(!__test_expr); \ +}) + +/* + * On any supported optimization level (-O2, -Os) and if invoked with + * a compile-time constant argument, the compiler must be able to fold + * into a constant expression all the bit find functions. Namely: + * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise, + * trigger a build bug. + */ +static __always_inline void test_bitops_const_eval(unsigned int n) +{ + test_const_eval(__ffs(BIT(n)) == n); + test_const_eval(ffs(BIT(n)) == n + 1); + test_const_eval(ffz(~BIT(n)) == n); + test_const_eval(__fls(BIT(n)) == n); + test_const_eval(fls(BIT(n)) == n + 1); + test_const_eval(fls64(BIT_ULL(n)) == n + 1); + test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33); +} + static int __init test_bitops_startup(void) { int i, bit_set; @@ -94,6 +122,10 @@ static int __init test_bitops_startup(void) if (bit_set != BITOPS_LAST) pr_err("ERROR: FOUND SET BIT %d\n", bit_set); + test_bitops_const_eval(0); + test_bitops_const_eval(10); + test_bitops_const_eval(31); + pr_info("Completed bitops test\n"); return 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v4 0/5] bitops: optimize code and add tests 2022-11-11 8:13 [RFC PATCH] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol 2023-11-30 10:27 ` [PATCH v2] " Vincent Mailhol 2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol @ 2024-01-28 5:00 ` Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol ` (4 more replies) 2 siblings, 5 replies; 32+ messages in thread From: Vincent Mailhol @ 2024-01-28 5:00 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k, Vincent Mailhol This series adds a compile test to make sure that all the bitops operations (namely __ffs(), ffs(), ffz(), __fls(), fls(), fls64()) correctly fold constant expressions given that their argument is also a constant expression. The other functions from bitops.h are out of scope. So far, only the n68k and the hexagon architectures lack such optimization. To this extend, the first two patches optimize m68k architecture, the third and fourth optimize the hexagon architecture bitops function. The fifth and final patch adds the compile time tests to assert that the constant folding occurs and that the result is accurate. This is tested on arm, arm64, hexagon, m68k, x86 and x86_64. For other architectures, I am putting my trust into the kernel test robot to send a report if ever one of these still lacks bitops optimizations. The kernel test robot did not complain on v3, giving me confidence that all architectures are now properly optimized. --- ** Changelog ** v3 -> v4: - Only apply the __always_inline to the bit-find functions, do not touch other functions from bitops.h. I discovered that the benchmark done in the v3 was incorrect (refer to the thread for details). The scope was thus narrowed down to the bit-find functions for which I could demonstrate the gain in the benchmark. - Add benchmark for hexagon (patch 3/5 and 4/5). Contrarily to the m68k benchmark which is with an allyesconfig, the hexagon benchmark uses a defconfig. The reason is just that the allyesconfig did not work on first try on my environment (even before applying this series), and I did not spent efforts to troubleshoot. - Add Geert review tag in patch 2/5. Despite also receiving the tag for patch 1/5, I did not apply due to new changes in that patch. - Do not split the lines containing tags. Link: https://lore.kernel.org/all/20231217071250.892867-1-mailhol.vincent@wanadoo.fr/ v2 -> v3: - Add patches 1/5 and 2/5 to optimize m68k architecture bitops. Thanks to the kernel test robot for reporting! - Add patches 3/5 and 4/5 to optimize hexagon architecture bitops. Thanks to the kernel test robot for reporting! - Patch 5/5: mark test_bitops_const_eval() as __always_inline, this done, pass n (the test number) as a parameter. Previously, only BITS(10) was tested. Add tests for BITS(0) and BITS(31). Link: https://lore.kernel.org/all/20231130102717.1297492-1-mailhol.vincent@wanadoo.fr/ v1 -> v2: - Drop the RFC patch. v1 was not ready to be applied on x86 because of pending changes in arch/x86/include/asm/bitops.h. This was finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions"). Thanks Nick! - Update the commit description. - Introduce the test_const_eval() macro to factorize code. - No functional change. Link: https://lore.kernel.org/all/20221111081316.30373-1-mailhol.vincent@wanadoo.fr/ Vincent Mailhol (5): m68k/bitops: force inlining of all bit-find functions m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions hexagon/bitops: force inlining of all bit-find functions hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions lib: test_bitops: add compile-time optimization/evaluations assertions arch/hexagon/include/asm/bitops.h | 25 +++++++++++++++++++----- arch/m68k/include/asm/bitops.h | 26 ++++++++++++++++++------- lib/Kconfig.debug | 4 ++++ lib/test_bitops.c | 32 +++++++++++++++++++++++++++++++ 4 files changed, 75 insertions(+), 12 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol @ 2024-01-28 5:00 ` Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol ` (3 subsequent siblings) 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2024-01-28 5:00 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k, Vincent Mailhol The inline keyword actually does not guarantee that the compiler will inline a functions. Whenever the goal is to actually inline a function, __always_inline should always be preferred instead. __always_inline is also needed for further optimizations which will come up in a follow-up patch. Inline all the bit-find function which have a custom m68k assembly implementation, namely: __ffs(), ffs(), ffz(), __fls(), fls(). On linux v6.7 allyesconfig with GCC 13.2.1, it does not impact the final size, meaning that, overall, those function were already inlined on modern GCCs: $ size --format=GNU vmlinux.before vmlinux.after text data bss total filename 60457956 70953665 2288644 133700265 vmlinux.before 60457964 70953697 2288644 133700305 vmlinux.after Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends") Link: https://git.kernel.org/torvalds/c/8dd5032d9c54 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/m68k/include/asm/bitops.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h index 14c64a6f1217..a8b23f897f24 100644 --- a/arch/m68k/include/asm/bitops.h +++ b/arch/m68k/include/asm/bitops.h @@ -465,7 +465,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size, * ffz = Find First Zero in word. Undefined if no zero exists, * so code should check against ~0UL first.. */ -static inline unsigned long ffz(unsigned long word) +static __always_inline unsigned long ffz(unsigned long word) { int res; @@ -488,7 +488,7 @@ static inline unsigned long ffz(unsigned long word) */ #if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \ !defined(CONFIG_M68000) -static inline unsigned long __ffs(unsigned long x) +static __always_inline unsigned long __ffs(unsigned long x) { __asm__ __volatile__ ("bitrev %0; ff1 %0" : "=d" (x) @@ -496,7 +496,7 @@ static inline unsigned long __ffs(unsigned long x) return x; } -static inline int ffs(int x) +static __always_inline int ffs(int x) { if (!x) return 0; @@ -518,7 +518,7 @@ static inline int ffs(int x) * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). */ -static inline int ffs(int x) +static __always_inline int ffs(int x) { int cnt; @@ -528,7 +528,7 @@ static inline int ffs(int x) return 32 - cnt; } -static inline unsigned long __ffs(unsigned long x) +static __always_inline unsigned long __ffs(unsigned long x) { return ffs(x) - 1; } @@ -536,7 +536,7 @@ static inline unsigned long __ffs(unsigned long x) /* * fls: find last bit set. */ -static inline int fls(unsigned int x) +static __always_inline int fls(unsigned int x) { int cnt; @@ -546,7 +546,7 @@ static inline int fls(unsigned int x) return 32 - cnt; } -static inline unsigned long __fls(unsigned long x) +static __always_inline unsigned long __fls(unsigned long x) { return fls(x) - 1; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol @ 2024-01-28 5:00 ` Vincent Mailhol 2024-01-28 5:39 ` Finn Thain 2024-01-28 5:00 ` [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions Vincent Mailhol ` (2 subsequent siblings) 4 siblings, 1 reply; 32+ messages in thread From: Vincent Mailhol @ 2024-01-28 5:00 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k, Vincent Mailhol The compiler is not able to do constant folding on "asm volatile" code. Evaluate whether or not the function argument is a constant expression and if this is the case, return an equivalent builtin expression. On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB. $ size --format=GNU vmlinux.before vmlinux.after text data bss total filename 60457964 70953697 2288644 133700305 vmlinux.before 60441196 70957057 2290724 133688977 vmlinux.after Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions") Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/m68k/include/asm/bitops.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h index a8b23f897f24..02ec8a193b96 100644 --- a/arch/m68k/include/asm/bitops.h +++ b/arch/m68k/include/asm/bitops.h @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word) { int res; + if (__builtin_constant_p(word)) + return __builtin_ctzl(~word); + __asm__ __volatile__ ("bfffo %1{#0,#0},%0" : "=d" (res) : "d" (~word & -~word)); return res ^ 31; @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word) !defined(CONFIG_M68000) static __always_inline unsigned long __ffs(unsigned long x) { + if (__builtin_constant_p(x)) + return __builtin_ctzl(x); + __asm__ __volatile__ ("bitrev %0; ff1 %0" : "=d" (x) : "0" (x)); @@ -522,6 +528,9 @@ static __always_inline int ffs(int x) { int cnt; + if (__builtin_constant_p(x)) + return __builtin_ffs(x); + __asm__ ("bfffo %1{#0:#0},%0" : "=d" (cnt) : "dm" (x & -x)); @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x) { int cnt; + if (__builtin_constant_p(x)) + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0; + __asm__ ("bfffo %1{#0,#0},%0" : "=d" (cnt) : "dm" (x)); -- 2.43.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 5:00 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol @ 2024-01-28 5:39 ` Finn Thain 2024-01-28 6:26 ` Vincent MAILHOL 0 siblings, 1 reply; 32+ messages in thread From: Finn Thain @ 2024-01-28 5:39 UTC (permalink / raw) To: Vincent Mailhol Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k On Sun, 28 Jan 2024, Vincent Mailhol wrote: > The compiler is not able to do constant folding on "asm volatile" code. > > Evaluate whether or not the function argument is a constant expression > and if this is the case, return an equivalent builtin expression. > > On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB. > > $ size --format=GNU vmlinux.before vmlinux.after > text data bss total filename > 60457964 70953697 2288644 133700305 vmlinux.before > 60441196 70957057 2290724 133688977 vmlinux.after > > Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions") > Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 > > Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > --- > arch/m68k/include/asm/bitops.h | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h > index a8b23f897f24..02ec8a193b96 100644 > --- a/arch/m68k/include/asm/bitops.h > +++ b/arch/m68k/include/asm/bitops.h > @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word) > { > int res; > > + if (__builtin_constant_p(word)) > + return __builtin_ctzl(~word); > + > __asm__ __volatile__ ("bfffo %1{#0,#0},%0" > : "=d" (res) : "d" (~word & -~word)); > return res ^ 31; If the builtin has the desired behaviour, why do we reimplement it in asm? Shouldn't we abandon one or the other to avoid having to prove (and maintain) their equivalence? > @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word) > !defined(CONFIG_M68000) > static __always_inline unsigned long __ffs(unsigned long x) > { > + if (__builtin_constant_p(x)) > + return __builtin_ctzl(x); > + > __asm__ __volatile__ ("bitrev %0; ff1 %0" > : "=d" (x) > : "0" (x)); > @@ -522,6 +528,9 @@ static __always_inline int ffs(int x) > { > int cnt; > > + if (__builtin_constant_p(x)) > + return __builtin_ffs(x); > + > __asm__ ("bfffo %1{#0:#0},%0" > : "=d" (cnt) > : "dm" (x & -x)); > @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x) > { > int cnt; > > + if (__builtin_constant_p(x)) > + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0; > + > __asm__ ("bfffo %1{#0,#0},%0" > : "=d" (cnt) > : "dm" (x)); > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 5:39 ` Finn Thain @ 2024-01-28 6:26 ` Vincent MAILHOL 2024-01-28 12:16 ` David Laight 0 siblings, 1 reply; 32+ messages in thread From: Vincent MAILHOL @ 2024-01-28 6:26 UTC (permalink / raw) To: Finn Thain Cc: Andrew Morton, linux-kernel, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@linux-m68k.org> wrote: > On Sun, 28 Jan 2024, Vincent Mailhol wrote: > > > The compiler is not able to do constant folding on "asm volatile" code. > > > > Evaluate whether or not the function argument is a constant expression > > and if this is the case, return an equivalent builtin expression. > > > > On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB. > > > > $ size --format=GNU vmlinux.before vmlinux.after > > text data bss total filename > > 60457964 70953697 2288644 133700305 vmlinux.before > > 60441196 70957057 2290724 133688977 vmlinux.after > > > > Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions") > > Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 > > > > Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> > > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> > > --- > > arch/m68k/include/asm/bitops.h | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h > > index a8b23f897f24..02ec8a193b96 100644 > > --- a/arch/m68k/include/asm/bitops.h > > +++ b/arch/m68k/include/asm/bitops.h > > @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word) > > { > > int res; > > > > + if (__builtin_constant_p(word)) > > + return __builtin_ctzl(~word); > > + > > __asm__ __volatile__ ("bfffo %1{#0,#0},%0" > > : "=d" (res) : "d" (~word & -~word)); > > return res ^ 31; > > If the builtin has the desired behaviour, why do we reimplement it in asm? > Shouldn't we abandon one or the other to avoid having to prove (and > maintain) their equivalence? The asm is meant to produce better results when the argument is not a constant expression. Below commit is a good illustration of why we want both the asm and the built: https://git.kernel.org/torvalds/c/146034fed6ee I say "is meant", because I did not assert whether this is still true. Note that there are some cases in which the asm is not better anymore, for example, see this thread: https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/ but I did not receive more answers, so I stopped trying to investigate the subject. If you want, you can check the produced assembly of both the asm and the builtin for both clang and gcc, and if the builtin is always either better or equivalent, then the asm can be removed. That said, I am not spending more effort there after being ghosted once (c.f. above thread). > > @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word) > > !defined(CONFIG_M68000) > > static __always_inline unsigned long __ffs(unsigned long x) > > { > > + if (__builtin_constant_p(x)) > > + return __builtin_ctzl(x); > > + > > __asm__ __volatile__ ("bitrev %0; ff1 %0" > > : "=d" (x) > > : "0" (x)); > > @@ -522,6 +528,9 @@ static __always_inline int ffs(int x) > > { > > int cnt; > > > > + if (__builtin_constant_p(x)) > > + return __builtin_ffs(x); > > + > > __asm__ ("bfffo %1{#0:#0},%0" > > : "=d" (cnt) > > : "dm" (x & -x)); > > @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x) > > { > > int cnt; > > > > + if (__builtin_constant_p(x)) > > + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0; > > + > > __asm__ ("bfffo %1{#0,#0},%0" > > : "=d" (cnt) > > : "dm" (x)); > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 6:26 ` Vincent MAILHOL @ 2024-01-28 12:16 ` David Laight 2024-01-28 13:27 ` Vincent MAILHOL 0 siblings, 1 reply; 32+ messages in thread From: David Laight @ 2024-01-28 12:16 UTC (permalink / raw) To: 'Vincent MAILHOL', Finn Thain Cc: Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org From: Vincent MAILHOL > Sent: 28 January 2024 06:27 > > On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@linux-m68k.org> wrote: > > On Sun, 28 Jan 2024, Vincent Mailhol wrote: > > > > > The compiler is not able to do constant folding on "asm volatile" code. > > > > > > Evaluate whether or not the function argument is a constant expression > > > and if this is the case, return an equivalent builtin expression. > > > ... > > If the builtin has the desired behaviour, why do we reimplement it in asm? > > Shouldn't we abandon one or the other to avoid having to prove (and > > maintain) their equivalence? > > The asm is meant to produce better results when the argument is not a > constant expression. Below commit is a good illustration of why we > want both the asm and the built: > > https://git.kernel.org/torvalds/c/146034fed6ee > > I say "is meant", because I did not assert whether this is still true. > Note that there are some cases in which the asm is not better anymore, > for example, see this thread: > > https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/ > > but I did not receive more answers, so I stopped trying to investigate > the subject. > > If you want, you can check the produced assembly of both the asm and > the builtin for both clang and gcc, and if the builtin is always > either better or equivalent, then the asm can be removed. That said, I > am not spending more effort there after being ghosted once (c.f. above > thread). I don't see any example there of why the __builtin_xxx() versions shouldn't be used all the time. (The x86-64 asm blocks contain unrelated call instructions and objdump wasn't passed -d to show what they were. One even has the 'return thunk pessimisation showing.) I actually suspect the asm versions predate the builtins. Does (or can) the outer common header use the __builtin functions if no asm version exists? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 12:16 ` David Laight @ 2024-01-28 13:27 ` Vincent MAILHOL 2024-01-28 19:01 ` David Laight 2024-01-28 22:34 ` Finn Thain 0 siblings, 2 replies; 32+ messages in thread From: Vincent MAILHOL @ 2024-01-28 13:27 UTC (permalink / raw) To: David Laight Cc: Finn Thain, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Sun. 28 janv. 2024 at 21:16, David Laight <David.Laight@aculab.com> wrote: > From: Vincent MAILHOL > > Sent: 28 January 2024 06:27 > > > > On Sun. 28 Jan. 2024 at 14:39, Finn Thain <fthain@linux-m68k.org> wrote: > > > On Sun, 28 Jan 2024, Vincent Mailhol wrote: > > > > > > > The compiler is not able to do constant folding on "asm volatile" code. > > > > > > > > Evaluate whether or not the function argument is a constant expression > > > > and if this is the case, return an equivalent builtin expression. > > > > > ... > > > If the builtin has the desired behaviour, why do we reimplement it in asm? > > > Shouldn't we abandon one or the other to avoid having to prove (and > > > maintain) their equivalence? > > > > The asm is meant to produce better results when the argument is not a > > constant expression. Below commit is a good illustration of why we > > want both the asm and the built: > > > > https://git.kernel.org/torvalds/c/146034fed6ee > > > > I say "is meant", because I did not assert whether this is still true. > > Note that there are some cases in which the asm is not better anymore, > > for example, see this thread: > > > > https://lore.kernel.org/lkml/20221106095106.849154-2-mailhol.vincent@wanadoo.fr/ > > > > but I did not receive more answers, so I stopped trying to investigate > > the subject. > > > > If you want, you can check the produced assembly of both the asm and > > the builtin for both clang and gcc, and if the builtin is always > > either better or equivalent, then the asm can be removed. That said, I > > am not spending more effort there after being ghosted once (c.f. above > > thread). > > I don't see any example there of why the __builtin_xxx() versions > shouldn't be used all the time. > (The x86-64 asm blocks contain unrelated call instructions and objdump > wasn't passed -d to show what they were. > One even has the 'return thunk pessimisation showing.) Fair. My goal was not to point to the assembly code but to this sentence: However, for non constant expressions, the kernel's ffs() asm version remains better for x86_64 because, contrary to GCC, it doesn't emit the CMOV assembly instruction I should have been more clear. Sorry for that. But the fact remains, on x86, some of the asm produced more optimized code than the builtin. > I actually suspect the asm versions predate the builtins. This seems true. The __bultins were introduced in: generic: Implement generic ffs/fls using __builtin_* functions https://git.kernel.org/torvalds/c/048fa2df92c3 when the asm implementation already existed in m68k. > Does (or can) the outer common header use the __builtin functions > if no asm version exists? Yes, this would be extremely easy. You just need to #include/asm-generic/bitops/builtin-__ffs.h #include/asm-generic/bitops/builtin-ffs.h #include/asm-generic/bitops/builtin-__fls.h #include/asm-generic/bitops/builtin-fls.h and remove all the asm implementations. If you give me your green light, I can do that change. This is trivial. The only thing I am not ready to do is to compare the produced assembly code and confirm whether or not it is better to remove asm code. Thoughts? Yours sincerely, Vincent Mailhol ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 13:27 ` Vincent MAILHOL @ 2024-01-28 19:01 ` David Laight 2024-01-28 22:34 ` Finn Thain 1 sibling, 0 replies; 32+ messages in thread From: David Laight @ 2024-01-28 19:01 UTC (permalink / raw) To: 'Vincent MAILHOL' Cc: Finn Thain, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org From: Vincent MAILHOL > Sent: 28 January 2024 13:28 ... > Fair. My goal was not to point to the assembly code but to this sentence: > > However, for non constant expressions, the kernel's ffs() asm > version remains better for x86_64 because, contrary to GCC, it > doesn't emit the CMOV assembly instruction The comment in the code is: * AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the * dest reg is undefined if x==0, but their CPU architect says its * value is written to set it to the same as before, except that the * top 32 bits will be cleared. I guess gcc isn't willing to act on hearsay! > > I should have been more clear. Sorry for that. > > But the fact remains, on x86, some of the asm produced more optimized > code than the builtin. > > > I actually suspect the asm versions predate the builtins. > > This seems true. The __bultins were introduced in: > > generic: Implement generic ffs/fls using __builtin_* functions > https://git.kernel.org/torvalds/c/048fa2df92c3 I was thinking of compiler versions not kernel source commits. You'd need to be doing some serious software archaeology. > when the asm implementation already existed in m68k. > > > Does (or can) the outer common header use the __builtin functions > > if no asm version exists? > > Yes, this would be extremely easy. You just need to > > #include/asm-generic/bitops/builtin-__ffs.h > #include/asm-generic/bitops/builtin-ffs.h > #include/asm-generic/bitops/builtin-__fls.h > #include/asm-generic/bitops/builtin-fls.h > > and remove all the asm implementations. If you give me your green > light, I can do that change. This is trivial. The only thing I am not > ready to do is to compare the produced assembly code and confirm > whether or not it is better to remove asm code. > > Thoughts? Not for me to say. But the builtins are likely to generate reasonable code. So unless the asm can be better (like trusting undocumented x86 cpu behaviour) using them is probably best. For the ones you are changing it may be best to propose using the builtins all the time. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 13:27 ` Vincent MAILHOL 2024-01-28 19:01 ` David Laight @ 2024-01-28 22:34 ` Finn Thain 2024-02-04 13:56 ` Vincent MAILHOL 1 sibling, 1 reply; 32+ messages in thread From: Finn Thain @ 2024-01-28 22:34 UTC (permalink / raw) To: Vincent MAILHOL Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Sun, 28 Jan 2024, Vincent MAILHOL wrote: > > > The asm is meant to produce better results when the argument is not > > > a constant expression. Is that because gcc's implementation has to satisfy requirements that are excessively stringent for the kernel's purposes? Or is it a compiler deficiency only affecting certain architectures? > ... The only thing I am not ready to do is to compare the produced > assembly code and confirm whether or not it is better to remove asm > code. > If you do the comparison and find no change, you get to say so in the commit log, and everyone is happy. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 22:34 ` Finn Thain @ 2024-02-04 13:56 ` Vincent MAILHOL 2024-02-04 23:13 ` Finn Thain 0 siblings, 1 reply; 32+ messages in thread From: Vincent MAILHOL @ 2024-02-04 13:56 UTC (permalink / raw) To: Finn Thain Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org Hi Finn and David, Sorry for the late feedback, I did not have much time during weekdays. On Monday. 29 Jan. 2024 at 07:34, Finn Thain <fthain@linux-m68k.org> wrote: > On Sun, 28 Jan 2024, Vincent MAILHOL wrote: > > > > The asm is meant to produce better results when the argument is not > > > > a constant expression. > > Is that because gcc's implementation has to satisfy requirements that are > excessively stringent for the kernel's purposes? Or is it a compiler > deficiency only affecting certain architectures? I just guess that GCC guys followed the Intel datasheet while the kernel guys like to live a bit more dangerously and rely on some not so well defined behaviours... But I am really not the person to whom you should ask. I just want to optimize the constant folding and this is the only purpose of this series. I am absolutely not an asm. That's also why I am reluctant to compare the asm outputs. > > ... The only thing I am not ready to do is to compare the produced > > assembly code and confirm whether or not it is better to remove asm > > code. > > > > If you do the comparison and find no change, you get to say so in the > commit log, and everyone is happy. Without getting into details, here is a quick comparaisons of gcc and clang generated asm for all the bitops builtin: https://godbolt.org/z/Yb8nMKnYf To the extent of my limited knowledge, it looks rather OK for gcc, but for clang... seems that this is the default software implementation. So are we fine with the current patch? Or maybe clang support is not important for m68k? I do not know so tell me :) Yours sincerely, Vincent Mailhol ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-02-04 13:56 ` Vincent MAILHOL @ 2024-02-04 23:13 ` Finn Thain 2024-02-05 9:17 ` Vincent MAILHOL 0 siblings, 1 reply; 32+ messages in thread From: Finn Thain @ 2024-02-04 23:13 UTC (permalink / raw) To: Vincent MAILHOL Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Sun, 4 Feb 2024, Vincent MAILHOL wrote: > Sorry for the late feedback, I did not have much time during weekdays. > > On Monday. 29 Jan. 2024 at 07:34, Finn Thain <fthain@linux-m68k.org> wrote: > > On Sun, 28 Jan 2024, Vincent MAILHOL wrote: > > > > > The asm is meant to produce better results when the argument is not > > > > > a constant expression. > > > > Is that because gcc's implementation has to satisfy requirements that are > > excessively stringent for the kernel's purposes? Or is it a compiler > > deficiency only affecting certain architectures? > > I just guess that GCC guys followed the Intel datasheet while the > kernel guys like to live a bit more dangerously and rely on some not > so well defined behaviours... But I am really not the person to whom > you should ask. > > I just want to optimize the constant folding and this is the only > purpose of this series. I am absolutely not an asm. That's also why I > am reluctant to compare the asm outputs. > How does replacing asm with a builtin prevent constant folding? > > > ... The only thing I am not ready to do is to compare the produced > > > assembly code and confirm whether or not it is better to remove asm > > > code. > > > > > > > If you do the comparison and find no change, you get to say so in the > > commit log, and everyone is happy. > > Without getting into details, here is a quick comparaisons of gcc and > clang generated asm for all the bitops builtin: > > https://godbolt.org/z/Yb8nMKnYf > > To the extent of my limited knowledge, it looks rather OK for gcc, but > for clang... seems that this is the default software implementation. > > So are we fine with the current patch? Or maybe clang support is not > important for m68k? I do not know so tell me :) > Let's see if I understand. You are proposing that the kernel source carry an unquantified optimization, with inherent complexity and maintenance costs, just for the benefit of users who choose a compiler that doesn't work as well as the standard compiler. Is that it? At some point in the future when clang comes up to scrach with gcc and the builtin reaches parity with the asm, I wonder if you will then remove both your optimization and the asm, to eliminate the afore-mentioned complexity and maintenance costs. Is there an incentive for that work? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-02-04 23:13 ` Finn Thain @ 2024-02-05 9:17 ` Vincent MAILHOL 2024-02-05 9:48 ` Finn Thain 0 siblings, 1 reply; 32+ messages in thread From: Vincent MAILHOL @ 2024-02-05 9:17 UTC (permalink / raw) To: Finn Thain Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Mon. 5 Feb. 2024 at 08:13, Finn Thain <fthain@linux-m68k.org> wrote: > On Sun, 4 Feb 2024, Vincent MAILHOL wrote: (...) > Let's see if I understand. > > You are proposing that the kernel source carry an unquantified > optimization, with inherent complexity and maintenance costs, just for the > benefit of users who choose a compiler that doesn't work as well as the > standard compiler. Is that it? My proposal is quantified, c.f. my commit message: On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB. "Saving roughly 8kb" is a quantification. And clang use in the kernel is standardized: https://www.kernel.org/doc/html/latest/process/changes.html#current-minimal-requirements GCC may be predominant, but, although optional, clang v11+ is officially supported, and thus my patches should not neglect it. This is why I am asking whether or not clang support is important or not for m68k. If you tell me it is not, then fine, I will remove all the asm (by the way, the patch is already ready). But if there are even a few users who care about clang for m68k, then I do not think we should penalize them and I would not sign-off a change which negatively impacts some users. The linux-m68k mailing list should know better than me if people care about clang support. So let me reiterate the question from my previous message: is clang support important for you? I would like a formal acknowledgement from the linux-m68k mailing list before sending a patch that may negatively impact some users. Thank you. > At some point in the future when clang comes up to scrach with gcc and the > builtin reaches parity with the asm, I wonder if you will then remove both > your optimization and the asm, to eliminate the afore-mentioned complexity > and maintenance costs. Is there an incentive for that work? I will not follow up the evolution of clang support for m68k builtins. The goal of this series is to add a test to assert that all architectures correctly do the constant folding on the bit find operations (fifth patch of this series). It happens that m68k and hexagon architectures are failing that test, so I am fixing this as a one shot. After this series, I have no plan to do further work on m68k and hexagon architectures. Thank you for your understanding. Yours sincerely, Vincent Mailhol ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-02-05 9:17 ` Vincent MAILHOL @ 2024-02-05 9:48 ` Finn Thain 2024-02-05 10:43 ` Vincent MAILHOL 0 siblings, 1 reply; 32+ messages in thread From: Finn Thain @ 2024-02-05 9:48 UTC (permalink / raw) To: Vincent MAILHOL Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Mon, 5 Feb 2024, Vincent MAILHOL wrote: > > This is why I am asking whether or not clang support is important or not > for m68k. If you tell me it is not, then fine, I will remove all the asm > (by the way, the patch is already ready). But if there are even a few > users who care about clang for m68k, then I do not think we should > penalize them and I would not sign-off a change which negatively impacts > some users. > If clang support is important then clang's builtins are important. So why not improve those instead? That would resolve the issue in a win-win. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-02-05 9:48 ` Finn Thain @ 2024-02-05 10:43 ` Vincent MAILHOL 2024-02-05 15:40 ` [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions Vincent Mailhol 2024-02-07 6:31 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Finn Thain 0 siblings, 2 replies; 32+ messages in thread From: Vincent MAILHOL @ 2024-02-05 10:43 UTC (permalink / raw) To: Finn Thain Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Mon. 5 Feb. 2024 at 18:48, Finn Thain <fthain@linux-m68k.org> wrote: > On Mon, 5 Feb 2024, Vincent MAILHOL wrote: > > > > > This is why I am asking whether or not clang support is important or not > > for m68k. If you tell me it is not, then fine, I will remove all the asm > > (by the way, the patch is already ready). But if there are even a few > > users who care about clang for m68k, then I do not think we should > > penalize them and I would not sign-off a change which negatively impacts > > some users. > > > > If clang support is important then clang's builtins are important. So why > not improve those instead? That would resolve the issue in a win-win. I am deeply sorry, but with all your respect, this request is unfair. I will not fix the compiler. Let me repeat my question for the third time: are you (or any other m68k maintainer) ready to acknowledge that we can degrade the performance for clang m68k users? With that acknowledgement, I will remove the asm and replace it with the builtins. Yours sincerely, Vincent Mailhol ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions 2024-02-05 10:43 ` Vincent MAILHOL @ 2024-02-05 15:40 ` Vincent Mailhol 2024-02-07 6:31 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Finn Thain 1 sibling, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2024-02-05 15:40 UTC (permalink / raw) To: fthain Cc: mailhol.vincent, David.Laight, akpm, bcain, dianders, elver, geert+renesas, geert, keescook, linux-kernel, linux-m68k, ndesaulniers, paulmck, pmladek, rdunlap, willy, yury.norov, zhaoyang.huang GCC and clang provides a set of builtin functions which can be used as a replacement for ffs(), __ffs(), fls() and __fls(). Using the builtin comes as a trade-off. Pros: - Simpler code, easier to maintain - Guarantee the constant folding Cons: - clang does not provide optimized code. Ref: https://godbolt.org/z/Yb8nMKnYf Despite of the cons, use the builtin unconditionally and remove any existing assembly implementation. This done, remove the find_first_zero_bit(), find_next_zero_bit(), find_first_bit() and find_next_bit() assembly implementations. Instead, rely on the generic functions from linux/find.h which will themselves rely on the builtin we just set up. Not-signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- As written earlier, I do not want to sign-off a patch which can degrade the performances of m68k clang users. But because it is already written, there it is. If someone wants to pick up this, go ahead. Just make sure to remove any reference to myself. I hereby grant permission for anyone to reuse that patch, with or without modifications, under the unique condition that my name gets removed. --- arch/m68k/include/asm/bitops.h | 215 +-------------------------------- 1 file changed, 5 insertions(+), 210 deletions(-) diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h index 14c64a6f1217..44aac8ced9fc 100644 --- a/arch/m68k/include/asm/bitops.h +++ b/arch/m68k/include/asm/bitops.h @@ -340,218 +340,13 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask, #endif } -/* - * The true 68020 and more advanced processors support the "bfffo" - * instruction for finding bits. ColdFire and simple 68000 parts - * (including CPU32) do not support this. They simply use the generic - * functions. - */ -#if defined(CONFIG_CPU_HAS_NO_BITFIELDS) -#include <asm-generic/bitops/ffz.h> -#else - -static inline int find_first_zero_bit(const unsigned long *vaddr, - unsigned size) -{ - const unsigned long *p = vaddr; - int res = 32; - unsigned int words; - unsigned long num; - - if (!size) - return 0; - - words = (size + 31) >> 5; - while (!(num = ~*p++)) { - if (!--words) - goto out; - } - - __asm__ __volatile__ ("bfffo %1{#0,#0},%0" - : "=d" (res) : "d" (num & -num)); - res ^= 31; -out: - res += ((long)p - (long)vaddr - 4) * 8; - return res < size ? res : size; -} -#define find_first_zero_bit find_first_zero_bit - -static inline int find_next_zero_bit(const unsigned long *vaddr, int size, - int offset) -{ - const unsigned long *p = vaddr + (offset >> 5); - int bit = offset & 31UL, res; - - if (offset >= size) - return size; - - if (bit) { - unsigned long num = ~*p++ & (~0UL << bit); - offset -= bit; - - /* Look for zero in first longword */ - __asm__ __volatile__ ("bfffo %1{#0,#0},%0" - : "=d" (res) : "d" (num & -num)); - if (res < 32) { - offset += res ^ 31; - return offset < size ? offset : size; - } - offset += 32; - - if (offset >= size) - return size; - } - /* No zero yet, search remaining full bytes for a zero */ - return offset + find_first_zero_bit(p, size - offset); -} -#define find_next_zero_bit find_next_zero_bit - -static inline int find_first_bit(const unsigned long *vaddr, unsigned size) -{ - const unsigned long *p = vaddr; - int res = 32; - unsigned int words; - unsigned long num; - - if (!size) - return 0; - - words = (size + 31) >> 5; - while (!(num = *p++)) { - if (!--words) - goto out; - } - - __asm__ __volatile__ ("bfffo %1{#0,#0},%0" - : "=d" (res) : "d" (num & -num)); - res ^= 31; -out: - res += ((long)p - (long)vaddr - 4) * 8; - return res < size ? res : size; -} -#define find_first_bit find_first_bit - -static inline int find_next_bit(const unsigned long *vaddr, int size, - int offset) -{ - const unsigned long *p = vaddr + (offset >> 5); - int bit = offset & 31UL, res; - - if (offset >= size) - return size; - - if (bit) { - unsigned long num = *p++ & (~0UL << bit); - offset -= bit; - - /* Look for one in first longword */ - __asm__ __volatile__ ("bfffo %1{#0,#0},%0" - : "=d" (res) : "d" (num & -num)); - if (res < 32) { - offset += res ^ 31; - return offset < size ? offset : size; - } - offset += 32; - - if (offset >= size) - return size; - } - /* No one yet, search remaining full bytes for a one */ - return offset + find_first_bit(p, size - offset); -} -#define find_next_bit find_next_bit - -/* - * ffz = Find First Zero in word. Undefined if no zero exists, - * so code should check against ~0UL first.. - */ -static inline unsigned long ffz(unsigned long word) -{ - int res; - - __asm__ __volatile__ ("bfffo %1{#0,#0},%0" - : "=d" (res) : "d" (~word & -~word)); - return res ^ 31; -} - -#endif - #ifdef __KERNEL__ -#if defined(CONFIG_CPU_HAS_NO_BITFIELDS) - -/* - * The newer ColdFire family members support a "bitrev" instruction - * and we can use that to implement a fast ffs. Older Coldfire parts, - * and normal 68000 parts don't have anything special, so we use the - * generic functions for those. - */ -#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \ - !defined(CONFIG_M68000) -static inline unsigned long __ffs(unsigned long x) -{ - __asm__ __volatile__ ("bitrev %0; ff1 %0" - : "=d" (x) - : "0" (x)); - return x; -} - -static inline int ffs(int x) -{ - if (!x) - return 0; - return __ffs(x) + 1; -} - -#else -#include <asm-generic/bitops/ffs.h> -#include <asm-generic/bitops/__ffs.h> -#endif - -#include <asm-generic/bitops/fls.h> -#include <asm-generic/bitops/__fls.h> - -#else - -/* - * ffs: find first bit set. This is defined the same way as - * the libc and compiler builtin ffs routines, therefore - * differs in spirit from the above ffz (man ffs). - */ -static inline int ffs(int x) -{ - int cnt; - - __asm__ ("bfffo %1{#0:#0},%0" - : "=d" (cnt) - : "dm" (x & -x)); - return 32 - cnt; -} - -static inline unsigned long __ffs(unsigned long x) -{ - return ffs(x) - 1; -} - -/* - * fls: find last bit set. - */ -static inline int fls(unsigned int x) -{ - int cnt; - - __asm__ ("bfffo %1{#0,#0},%0" - : "=d" (cnt) - : "dm" (x)); - return 32 - cnt; -} - -static inline unsigned long __fls(unsigned long x) -{ - return fls(x) - 1; -} - -#endif +#include <asm-generic/bitops/builtin-ffs.h> +#include <asm-generic/bitops/builtin-__ffs.h> +#include <asm-generic/bitops/builtin-fls.h> +#include <asm-generic/bitops/builtin-__fls.h> +#include <asm-generic/bitops/ffz.h> /* Simple test-and-set bit locks */ #define test_and_set_bit_lock test_and_set_bit -- 2.43.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-02-05 10:43 ` Vincent MAILHOL 2024-02-05 15:40 ` [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions Vincent Mailhol @ 2024-02-07 6:31 ` Finn Thain 1 sibling, 0 replies; 32+ messages in thread From: Finn Thain @ 2024-02-07 6:31 UTC (permalink / raw) To: Vincent MAILHOL Cc: David Laight, Andrew Morton, linux-kernel@vger.kernel.org, Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k@lists.linux-m68k.org On Mon, 5 Feb 2024, Vincent MAILHOL wrote: > On Mon. 5 Feb. 2024 at 18:48, Finn Thain <fthain@linux-m68k.org> wrote: > > On Mon, 5 Feb 2024, Vincent MAILHOL wrote: > > > > If clang support is important then clang's builtins are important. So > > why not improve those instead? That would resolve the issue in a > > win-win. > > I am deeply sorry, but with all your respect, this request is unfair. I > will not fix the compiler. > Absolutely. It is unfair. Just as your proposal was unfair to maintainers. That patch would make a compiler deficiency into a maintenance burden. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol @ 2024-01-28 5:00 ` Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2024-01-28 5:00 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k, Vincent Mailhol The inline keyword actually does not guarantee that the compiler will inline a functions. Whenever the goal is to actually inline a function, __always_inline should always be preferred instead. __always_inline is also needed for further optimizations which will come up in a follow-up patch. Inline all the bit-find function which have a custom hexagon assembly implementation, namely: __ffs(), ffs(), ffz(), __fls(), fls(). On linux v6.7 defconfig with clang 17.0.6, it does not impact the final size, meaning that, overall, those function were already inlined on modern clangs: $ size --format=GNU vmlinux.before vmlinux.after vmlinux.final text data bss total filename 4827900 1798340 364057 6990297 vmlinux.before 4827900 1798340 364057 6990297 vmlinux.after Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends") Link: https://git.kernel.org/torvalds/c/8dd5032d9c54 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/hexagon/include/asm/bitops.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h index 160d8f37fa1a..e856d6dbfe16 100644 --- a/arch/hexagon/include/asm/bitops.h +++ b/arch/hexagon/include/asm/bitops.h @@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) * * Undefined if no zero exists, so code should check against ~0UL first. */ -static inline long ffz(int x) +static __always_inline long ffz(int x) { int r; @@ -217,7 +217,7 @@ static inline long ffz(int x) * This is defined the same way as ffs. * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. */ -static inline int fls(unsigned int x) +static __always_inline int fls(unsigned int x) { int r; @@ -238,7 +238,7 @@ static inline int fls(unsigned int x) * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). */ -static inline int ffs(int x) +static __always_inline int ffs(int x) { int r; @@ -260,7 +260,7 @@ static inline int ffs(int x) * bits_per_long assumed to be 32 * numbering starts at 0 I think (instead of 1 like ffs) */ -static inline unsigned long __ffs(unsigned long word) +static __always_inline unsigned long __ffs(unsigned long word) { int num; @@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word) * Undefined if no set bit exists, so code should check against 0 first. * bits_per_long assumed to be 32 */ -static inline unsigned long __fls(unsigned long word) +static __always_inline unsigned long __fls(unsigned long word) { int num; -- 2.43.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol ` (2 preceding siblings ...) 2024-01-28 5:00 ` [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions Vincent Mailhol @ 2024-01-28 5:00 ` Vincent Mailhol 2024-01-28 5:00 ` [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2024-01-28 5:00 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k, Vincent Mailhol The compiler is not able to do constant folding on "asm volatile" code. Evaluate whether or not the function argument is a constant expression and if this is the case, return an equivalent builtin expression. On linux 6.7 with an allyesconfig and clang 17.0.6, it saves roughly 4 KB. $ size --format=GNU vmlinux.before vmlinux.after text data bss total filename 4827900 1798340 364057 6990297 vmlinux.before 4827072 1795060 364057 6986189 vmlinux.after Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions") Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1 Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h index e856d6dbfe16..c74a639c84f3 100644 --- a/arch/hexagon/include/asm/bitops.h +++ b/arch/hexagon/include/asm/bitops.h @@ -204,6 +204,9 @@ static __always_inline long ffz(int x) { int r; + if (__builtin_constant_p(x)) + return __builtin_ctzl(~x); + asm("%0 = ct1(%1);\n" : "=&r" (r) : "r" (x)); @@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x) { int r; + if (__builtin_constant_p(x)) + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0; + asm("{ %0 = cl0(%1);}\n" "%0 = sub(#32,%0);\n" : "=&r" (r) @@ -242,6 +248,9 @@ static __always_inline int ffs(int x) { int r; + if (__builtin_constant_p(x)) + return __builtin_ffs(x); + asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n" "{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n" : "=&r" (r) @@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word) { int num; + if (__builtin_constant_p(word)) + return __builtin_ctzl(word); + asm("%0 = ct0(%1);\n" : "=&r" (num) : "r" (word)); @@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word) { int num; + if (__builtin_constant_p(word)) + return BITS_PER_LONG - 1 - __builtin_clzl(word); + asm("%0 = cl0(%1);\n" "%0 = sub(#31,%0);\n" : "=&r" (num) -- 2.43.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions 2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol ` (3 preceding siblings ...) 2024-01-28 5:00 ` [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol @ 2024-01-28 5:00 ` Vincent Mailhol 4 siblings, 0 replies; 32+ messages in thread From: Vincent Mailhol @ 2024-01-28 5:00 UTC (permalink / raw) To: Andrew Morton, linux-kernel Cc: Yury Norov, Nick Desaulniers, Douglas Anderson, Kees Cook, Petr Mladek, Randy Dunlap, Zhaoyang Huang, Geert Uytterhoeven, Marco Elver, Brian Cain, Geert Uytterhoeven, Matthew Wilcox, Paul E . McKenney, linux-m68k, Vincent Mailhol Add a function in the bitops test suite to assert that the bitops helper functions correctly fold into constant expressions (or trigger a build bug otherwise). This should work on all the optimization levels supported by Kbuild. The added function does not perform any runtime tests and gets optimized out to nothing after passing the build assertions. Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> --- lib/Kconfig.debug | 4 ++++ lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 4405f81248fb..85f8638b3ae6 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2439,6 +2439,10 @@ config TEST_BITOPS compilations. It has no dependencies and doesn't run or load unless explicitly requested by name. for example: modprobe test_bitops. + In addition, check that the compiler is able to fold the bitops + function into a compile-time constant (given that the argument is also + a compile-time constant) and trigger a build bug otherwise. + If unsure, say N. config TEST_VMALLOC diff --git a/lib/test_bitops.c b/lib/test_bitops.c index 3b7bcbee84db..99b612515eb6 100644 --- a/lib/test_bitops.c +++ b/lib/test_bitops.c @@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = { }; #endif +/* Assert that a boolean expression can be folded in a constant and is true. */ +#define test_const_eval(test_expr) \ +({ \ + /* Evaluate once so that compiler can fold it. */ \ + bool __test_expr = test_expr; \ + \ + BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \ + BUILD_BUG_ON(!__test_expr); \ +}) + +/* + * On any supported optimization level (-O2, -Os) and if invoked with + * a compile-time constant argument, the compiler must be able to fold + * into a constant expression all the bit find functions. Namely: + * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise, + * trigger a build bug. + */ +static __always_inline void test_bitops_const_eval(unsigned int n) +{ + test_const_eval(__ffs(BIT(n)) == n); + test_const_eval(ffs(BIT(n)) == n + 1); + test_const_eval(ffz(~BIT(n)) == n); + test_const_eval(__fls(BIT(n)) == n); + test_const_eval(fls(BIT(n)) == n + 1); + test_const_eval(fls64(BIT_ULL(n)) == n + 1); + test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33); +} + static int __init test_bitops_startup(void) { int i, bit_set; @@ -94,6 +122,10 @@ static int __init test_bitops_startup(void) if (bit_set != BITOPS_LAST) pr_err("ERROR: FOUND SET BIT %d\n", bit_set); + test_bitops_const_eval(0); + test_bitops_const_eval(10); + test_bitops_const_eval(31); + pr_info("Completed bitops test\n"); return 0; -- 2.43.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
end of thread, other threads:[~2024-02-07 6:30 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-11 8:13 [RFC PATCH] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
2023-11-30 10:27 ` [PATCH v2] " Vincent Mailhol
2023-11-30 16:25 ` kernel test robot
2023-12-06 13:27 ` kernel test robot
2023-12-17 7:12 ` [PATCH v3 0/5] bitops: optimize code and add tests Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 1/5] m68k/bitops: force inlining of all bitops functions Vincent Mailhol
2024-01-02 10:28 ` Geert Uytterhoeven
2024-01-07 12:01 ` Vincent MAILHOL
2023-12-17 7:12 ` [PATCH v3 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-02 10:49 ` Geert Uytterhoeven
2023-12-17 7:12 ` [PATCH v3 3/5] hexagon/bitops: force inlining of all bitops functions Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2023-12-17 7:12 ` [PATCH v3 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
2024-01-28 5:00 ` [PATCH v4 0/5] bitops: optimize code and add tests Vincent Mailhol
2024-01-28 5:00 ` [PATCH v4 1/5] m68k/bitops: force inlining of all bit-find functions Vincent Mailhol
2024-01-28 5:00 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-28 5:39 ` Finn Thain
2024-01-28 6:26 ` Vincent MAILHOL
2024-01-28 12:16 ` David Laight
2024-01-28 13:27 ` Vincent MAILHOL
2024-01-28 19:01 ` David Laight
2024-01-28 22:34 ` Finn Thain
2024-02-04 13:56 ` Vincent MAILHOL
2024-02-04 23:13 ` Finn Thain
2024-02-05 9:17 ` Vincent MAILHOL
2024-02-05 9:48 ` Finn Thain
2024-02-05 10:43 ` Vincent MAILHOL
2024-02-05 15:40 ` [PATCH] m68k/bitops: always use compiler's builtin for bit finding functions Vincent Mailhol
2024-02-07 6:31 ` [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Finn Thain
2024-01-28 5:00 ` [PATCH v4 3/5] hexagon/bitops: force inlining of all bit-find functions Vincent Mailhol
2024-01-28 5:00 ` [PATCH v4 4/5] hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions Vincent Mailhol
2024-01-28 5:00 ` [PATCH v4 5/5] lib: test_bitops: add compile-time optimization/evaluations assertions Vincent Mailhol
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.