* [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly
@ 2025-09-16 13:47 Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 1/4] s390/bitops: Limit return value range of __flogr() Heiko Carstens
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Heiko Carstens @ 2025-09-16 13:47 UTC (permalink / raw)
To: Nathan Chancellor, Miguel Ojeda, Kees Cook, Nick Desaulniers,
Vasily Gorbik, Alexander Gordeev, Juergen Christ
Cc: linux-kernel, linux-s390, Sven Schnelle, Christian Borntraeger
v2:
- Split patches differently, so that the first patch is the actual
fix, which addresses the reported warning / error. The subsequent
patches are optimizations
v1:
https://lore.kernel.org/all/20250910151216.646600-1-hca@linux.ibm.com/
A recent optimization of the s390 specific ffs() and ffs64()
implementations leads to a new compiler warning. Instead of reverting the
optimization address this with the rather new assume attribute, which
generates even better code, if supported by compilers.
Since the assume attribute may be useful for others as well, add the
__assume macro to compiler attributes, so it is kernel wide available,
instead of adding an s390 specific optimization.
Heiko Carstens (4):
s390/bitops: Limit return value range of __flogr()
compiler_types: Add __assume macro
s390/bitops: Use __assume() for __flogr() inline assembly return value
s390/bitops: Cleanup __flogr()
arch/s390/include/asm/bitops.h | 21 ++++++++++++++-------
include/linux/compiler_types.h | 23 +++++++++++++++++++++++
init/Kconfig | 10 ++++++++++
3 files changed, 47 insertions(+), 7 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/4] s390/bitops: Limit return value range of __flogr()
2025-09-16 13:47 [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Heiko Carstens
@ 2025-09-16 13:48 ` Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 2/4] compiler_types: Add __assume macro Heiko Carstens
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Heiko Carstens @ 2025-09-16 13:48 UTC (permalink / raw)
To: Nathan Chancellor, Miguel Ojeda, Kees Cook, Nick Desaulniers,
Vasily Gorbik, Alexander Gordeev, Juergen Christ
Cc: linux-kernel, linux-s390, Sven Schnelle, Christian Borntraeger
With the recent ffs() and ffs64() optimization a logical AND operation was
removed, which allowed the compiler to tell the return value range of both
functions. This may lead to compile warnings as reported by the kernel test
robot:
drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5r_cache_create_ent_locked':
>> drivers/infiniband/hw/mlx5/mr.c:840:31: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
840 | sprintf(ent->name, "%d", order);
| ^
In function 'mlx5_mkey_cache_debugfs_add_ent',
inlined from 'mlx5r_cache_create_ent_locked' at drivers/infiniband/hw/mlx5/mr.c:930:3:
drivers/infiniband/hw/mlx5/mr.c:840:9: note: 'sprintf' output between 2 and 5 bytes into a destination of size 4
840 | sprintf(ent->name, "%d", order);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Add the AND operation again to address the warning.
From a correctness point of view the AND operation is not necessary,
however there is no other way to tell the compiler that the returned
value of the flogr inline assembly is in the range of 0..64.
This increases the kernel image size by 566 bytes (defconfig, gcc 15.2.0).
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508211859.UoYsJbLN-lkp@intel.com/
Fixes: de88e74889a3 ("s390/bitops: Slightly optimize ffs() and fls64()")
Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
arch/s390/include/asm/bitops.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h
index 56376b6ff809..a1dd72b16f54 100644
--- a/arch/s390/include/asm/bitops.h
+++ b/arch/s390/include/asm/bitops.h
@@ -169,7 +169,7 @@ static __always_inline unsigned char __flogr(unsigned long word)
asm volatile(
" flogr %[rp],%[rp]\n"
: [rp] "+d" (rp.pair) : : "cc");
- return rp.even;
+ return rp.even & 127;
}
}
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/4] compiler_types: Add __assume macro
2025-09-16 13:47 [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 1/4] s390/bitops: Limit return value range of __flogr() Heiko Carstens
@ 2025-09-16 13:48 ` Heiko Carstens
2025-09-17 2:30 ` Nathan Chancellor
2025-09-16 13:48 ` [PATCH v2 3/4] s390/bitops: Use __assume() for __flogr() inline assembly return value Heiko Carstens
` (2 subsequent siblings)
4 siblings, 1 reply; 7+ messages in thread
From: Heiko Carstens @ 2025-09-16 13:48 UTC (permalink / raw)
To: Nathan Chancellor, Miguel Ojeda, Kees Cook, Nick Desaulniers,
Vasily Gorbik, Alexander Gordeev, Juergen Christ
Cc: linux-kernel, linux-s390, Sven Schnelle, Christian Borntraeger
Make the statement attribute "assume" with a new __assume macro available.
The assume attribute is used to indicate that a certain condition is
assumed to be true. Compilers may or may not use this indication to
generate optimized code. If this condition is violated at runtime, the
behavior is undefined.
Note that the clang documentation states that optimizers may react
differently to this attribute, and this may even have a negative
performance impact. Therefore this attribute should be used with care.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
include/linux/compiler_types.h | 23 +++++++++++++++++++++++
init/Kconfig | 10 ++++++++++
2 files changed, 33 insertions(+)
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index a910f9fa5341..41c16fb8eb40 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -329,6 +329,29 @@ struct ftrace_likely_data {
#define __no_sanitize_or_inline __always_inline
#endif
+/*
+ * The assume attribute is used to indicate that a certain condition is
+ * assumed to be true. If this condition is violated at runtime, the behavior
+ * is undefined. Compilers may or may not use this indication to generate
+ * optimized code.
+ *
+ * Note that the clang documentation states that optimizers may react
+ * differently to this attribute, and this may even have a negative
+ * performance impact. Therefore this attribute should be used with care.
+ *
+ * Optional: only supported since gcc >= 13
+ * Optional: only supported since clang >= 19
+ *
+ * gcc: https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html#index-assume-statement-attribute
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#id13
+ *
+ */
+#ifdef CONFIG_CC_HAS_ASSUME
+# define __assume(expr) __attribute__((__assume__(expr)))
+#else
+# define __assume(expr)
+#endif
+
/*
* Optional: only supported since gcc >= 15
* Optional: only supported since clang >= 18
diff --git a/init/Kconfig b/init/Kconfig
index 59ae2b967195..935eff59af97 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -112,6 +112,16 @@ config TOOLS_SUPPORT_RELR
config CC_HAS_ASM_INLINE
def_bool $(success,echo 'void foo(void) { asm inline (""); }' | $(CC) -x c - -c -o /dev/null)
+config CC_HAS_ASSUME
+ bool
+ # clang needs to be at least 19.1.0 since the meaning of the assume
+ # attribute changed:
+ # https://github.com/llvm/llvm-project/commit/c44fa3e8a9a44c2e9a575768a3c185354b9f6c17
+ default y if CC_IS_CLANG && CLANG_VERSION >= 190100
+ # supported since gcc 13.1.0
+ # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106654
+ default y if CC_IS_GCC && GCC_VERSION >= 130100
+
config CC_HAS_NO_PROFILE_FN_ATTR
def_bool $(success,echo '__attribute__((no_profile_instrument_function)) int x();' | $(CC) -x c - -c -o /dev/null -Werror)
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 3/4] s390/bitops: Use __assume() for __flogr() inline assembly return value
2025-09-16 13:47 [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 1/4] s390/bitops: Limit return value range of __flogr() Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 2/4] compiler_types: Add __assume macro Heiko Carstens
@ 2025-09-16 13:48 ` Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 4/4] s390/bitops: Cleanup __flogr() Heiko Carstens
2025-09-18 14:19 ` [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Alexander Gordeev
4 siblings, 0 replies; 7+ messages in thread
From: Heiko Carstens @ 2025-09-16 13:48 UTC (permalink / raw)
To: Nathan Chancellor, Miguel Ojeda, Kees Cook, Nick Desaulniers,
Vasily Gorbik, Alexander Gordeev, Juergen Christ
Cc: linux-kernel, linux-s390, Sven Schnelle, Christian Borntraeger
Use __assume() to tell compilers that the output operand of the __flogr()
inline assembly contains a value in the range of 0..64. This allows to
optimize the logical AND operation away.
This reduces the kernel image size by 2804 bytes (defconfig, gcc 15.2.0).
Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
arch/s390/include/asm/bitops.h | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h
index a1dd72b16f54..5ff069fe9526 100644
--- a/arch/s390/include/asm/bitops.h
+++ b/arch/s390/include/asm/bitops.h
@@ -132,9 +132,10 @@ static inline bool test_bit_inv(unsigned long nr,
*/
static __always_inline unsigned char __flogr(unsigned long word)
{
- if (__builtin_constant_p(word)) {
- unsigned long bit = 0;
+ unsigned long bit;
+ if (__builtin_constant_p(word)) {
+ bit = 0;
if (!word)
return 64;
if (!(word & 0xffffffff00000000UL)) {
@@ -169,7 +170,14 @@ static __always_inline unsigned char __flogr(unsigned long word)
asm volatile(
" flogr %[rp],%[rp]\n"
: [rp] "+d" (rp.pair) : : "cc");
- return rp.even & 127;
+ bit = rp.even;
+ /*
+ * The result of the flogr instruction is a value in the range
+ * of 0..64. Let the compiler know that the AND operation can
+ * be optimized away.
+ */
+ __assume(bit <= 64);
+ return bit & 127;
}
}
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 4/4] s390/bitops: Cleanup __flogr()
2025-09-16 13:47 [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Heiko Carstens
` (2 preceding siblings ...)
2025-09-16 13:48 ` [PATCH v2 3/4] s390/bitops: Use __assume() for __flogr() inline assembly return value Heiko Carstens
@ 2025-09-16 13:48 ` Heiko Carstens
2025-09-18 14:19 ` [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Alexander Gordeev
4 siblings, 0 replies; 7+ messages in thread
From: Heiko Carstens @ 2025-09-16 13:48 UTC (permalink / raw)
To: Nathan Chancellor, Miguel Ojeda, Kees Cook, Nick Desaulniers,
Vasily Gorbik, Alexander Gordeev, Juergen Christ
Cc: linux-kernel, linux-s390, Sven Schnelle, Christian Borntraeger
The flogr() inline assembly has no side effects and generates the same
output if the input does not change. Therefore remove the volatile
qualifier to allow the compiler to optimize the inline assembly away,
if possible.
Also remove the superfluous '\n' which makes the inline assembly appear
larger than it is according to compiler heuristics (number of lines).
Furthermore change the return type of flogr() to unsigned long and add the
const attribute to the function.
This reduces the kernel image size by 994 bytes (defconfig, gcc 15.2.0).
Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
arch/s390/include/asm/bitops.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h
index 5ff069fe9526..0908bab20d7f 100644
--- a/arch/s390/include/asm/bitops.h
+++ b/arch/s390/include/asm/bitops.h
@@ -130,7 +130,7 @@ static inline bool test_bit_inv(unsigned long nr,
* where the most significant bit has bit number 0.
* If no bit is set this function returns 64.
*/
-static __always_inline unsigned char __flogr(unsigned long word)
+static __always_inline __attribute_const__ unsigned long __flogr(unsigned long word)
{
unsigned long bit;
@@ -167,9 +167,8 @@ static __always_inline unsigned char __flogr(unsigned long word)
union register_pair rp __uninitialized;
rp.even = word;
- asm volatile(
- " flogr %[rp],%[rp]\n"
- : [rp] "+d" (rp.pair) : : "cc");
+ asm("flogr %[rp],%[rp]"
+ : [rp] "+d" (rp.pair) : : "cc");
bit = rp.even;
/*
* The result of the flogr instruction is a value in the range
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 2/4] compiler_types: Add __assume macro
2025-09-16 13:48 ` [PATCH v2 2/4] compiler_types: Add __assume macro Heiko Carstens
@ 2025-09-17 2:30 ` Nathan Chancellor
0 siblings, 0 replies; 7+ messages in thread
From: Nathan Chancellor @ 2025-09-17 2:30 UTC (permalink / raw)
To: Heiko Carstens
Cc: Miguel Ojeda, Kees Cook, Nick Desaulniers, Vasily Gorbik,
Alexander Gordeev, Juergen Christ, linux-kernel, linux-s390,
Sven Schnelle, Christian Borntraeger
On Tue, Sep 16, 2025 at 03:48:01PM +0200, Heiko Carstens wrote:
> Make the statement attribute "assume" with a new __assume macro available.
>
> The assume attribute is used to indicate that a certain condition is
> assumed to be true. Compilers may or may not use this indication to
> generate optimized code. If this condition is violated at runtime, the
> behavior is undefined.
>
> Note that the clang documentation states that optimizers may react
> differently to this attribute, and this may even have a negative
> performance impact. Therefore this attribute should be used with care.
>
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
LGTM.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
> ---
> include/linux/compiler_types.h | 23 +++++++++++++++++++++++
> init/Kconfig | 10 ++++++++++
> 2 files changed, 33 insertions(+)
>
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index a910f9fa5341..41c16fb8eb40 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -329,6 +329,29 @@ struct ftrace_likely_data {
> #define __no_sanitize_or_inline __always_inline
> #endif
>
> +/*
> + * The assume attribute is used to indicate that a certain condition is
> + * assumed to be true. If this condition is violated at runtime, the behavior
> + * is undefined. Compilers may or may not use this indication to generate
> + * optimized code.
> + *
> + * Note that the clang documentation states that optimizers may react
> + * differently to this attribute, and this may even have a negative
> + * performance impact. Therefore this attribute should be used with care.
> + *
> + * Optional: only supported since gcc >= 13
> + * Optional: only supported since clang >= 19
> + *
> + * gcc: https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html#index-assume-statement-attribute
> + * clang: https://clang.llvm.org/docs/AttributeReference.html#id13
> + *
> + */
> +#ifdef CONFIG_CC_HAS_ASSUME
> +# define __assume(expr) __attribute__((__assume__(expr)))
> +#else
> +# define __assume(expr)
> +#endif
> +
> /*
> * Optional: only supported since gcc >= 15
> * Optional: only supported since clang >= 18
> diff --git a/init/Kconfig b/init/Kconfig
> index 59ae2b967195..935eff59af97 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -112,6 +112,16 @@ config TOOLS_SUPPORT_RELR
> config CC_HAS_ASM_INLINE
> def_bool $(success,echo 'void foo(void) { asm inline (""); }' | $(CC) -x c - -c -o /dev/null)
>
> +config CC_HAS_ASSUME
> + bool
> + # clang needs to be at least 19.1.0 since the meaning of the assume
> + # attribute changed:
> + # https://github.com/llvm/llvm-project/commit/c44fa3e8a9a44c2e9a575768a3c185354b9f6c17
> + default y if CC_IS_CLANG && CLANG_VERSION >= 190100
> + # supported since gcc 13.1.0
> + # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106654
> + default y if CC_IS_GCC && GCC_VERSION >= 130100
> +
> config CC_HAS_NO_PROFILE_FN_ATTR
> def_bool $(success,echo '__attribute__((no_profile_instrument_function)) int x();' | $(CC) -x c - -c -o /dev/null -Werror)
>
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly
2025-09-16 13:47 [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Heiko Carstens
` (3 preceding siblings ...)
2025-09-16 13:48 ` [PATCH v2 4/4] s390/bitops: Cleanup __flogr() Heiko Carstens
@ 2025-09-18 14:19 ` Alexander Gordeev
4 siblings, 0 replies; 7+ messages in thread
From: Alexander Gordeev @ 2025-09-18 14:19 UTC (permalink / raw)
To: Heiko Carstens
Cc: Nathan Chancellor, Miguel Ojeda, Kees Cook, Nick Desaulniers,
Vasily Gorbik, Juergen Christ, linux-kernel, linux-s390,
Sven Schnelle, Christian Borntraeger
On Tue, Sep 16, 2025 at 03:47:59PM +0200, Heiko Carstens wrote:
> A recent optimization of the s390 specific ffs() and ffs64()
> implementations leads to a new compiler warning. Instead of reverting the
> optimization address this with the rather new assume attribute, which
> generates even better code, if supported by compilers.
>
> Since the assume attribute may be useful for others as well, add the
> __assume macro to compiler attributes, so it is kernel wide available,
> instead of adding an s390 specific optimization.
>
> Heiko Carstens (4):
> s390/bitops: Limit return value range of __flogr()
> compiler_types: Add __assume macro
> s390/bitops: Use __assume() for __flogr() inline assembly return value
> s390/bitops: Cleanup __flogr()
>
> arch/s390/include/asm/bitops.h | 21 ++++++++++++++-------
> include/linux/compiler_types.h | 23 +++++++++++++++++++++++
> init/Kconfig | 10 ++++++++++
> 3 files changed, 47 insertions(+), 7 deletions(-)
Applied, thanks!
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-09-18 14:19 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-16 13:47 [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 1/4] s390/bitops: Limit return value range of __flogr() Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 2/4] compiler_types: Add __assume macro Heiko Carstens
2025-09-17 2:30 ` Nathan Chancellor
2025-09-16 13:48 ` [PATCH v2 3/4] s390/bitops: Use __assume() for __flogr() inline assembly return value Heiko Carstens
2025-09-16 13:48 ` [PATCH v2 4/4] s390/bitops: Cleanup __flogr() Heiko Carstens
2025-09-18 14:19 ` [PATCH v2 0/4] s390: Fix and optimize __flogr() inline assembly Alexander Gordeev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox