Linux kbuild/kconfig development
* [PATCH v4 0/1] gcov: use -fprofile-update=prefer-atomic with compile-time guard
@ 2026-05-11 10:50 Konstantin Khorenko
  2026-05-11 10:50 ` [PATCH v4 1/1] gcov: use atomic counter updates to fix concurrent access crashes Konstantin Khorenko
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Khorenko @ 2026-05-11 10:50 UTC
  To: Andrew Morton, Arnd Bergmann, Peter Oberparleiter
  Cc: Nathan Chancellor, Nicolas Schier, Mikhail Zaslonko,
	Thomas Weißschuh, Miguel Ojeda, Masahiro Yamada,
	Vasileios Almpanis, Pavel Tikhomirov, linux-kernel, linux-kbuild,
	Konstantin Khorenko

This is v4 of the patch to add -fprofile-update=prefer-atomic to
CFLAGS_GCOV.

v2 was reported to cause link failures on some architecture/config
combinations because GCC emits calls to libatomic runtime functions
for 64-bit atomic counter increments, and the kernel does not link
against libatomic:

  https://lore.kernel.org/all/ff2a4c49-463d-4d8a-9519-bb51308f7ba1@linux.ibm.com/

Arnd Bergmann hit this with GCC 16 randconfig builds:

  x86_64:  undefined reference to `__atomic_fetch_add_8'
  aarch64: undefined reference to `__aarch64_ldadd8_relax'

The kernel test robot confirmed the same on i386-allmodconfig with
GCC 14 (Debian):

  https://lore.kernel.org/all/202605030611.mBKmkPOF-lkp@intel.com/

v3 added a compile-time try-run check that determines whether
-fprofile-update=prefer-atomic is safe to use with the current
compiler and architecture.

=== Approach ===

The check compiles a minimal test program twice using the full
KBUILD_CFLAGS -- once without and once with -fprofile-update=prefer-atomic
-- then compares the undefined symbols in both resulting .o files using
nm.  If prefer-atomic introduces any NEW undefined symbols, the flag is
not added.
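
A minimal sketch of the comparison step alone, run on canned nm output
instead of real object files.  The symbol lists are simplified,
hypothetical data modeled on the x86_64 case where the atomic increment
is inlined; the file names are placeholders, not the ones try-run uses:

```shell
#!/bin/sh
# Canned `nm` output standing in for the two test objects (hypothetical,
# simplified data; a real kernel build lists more symbols).
tmp=$(mktemp -d)
cat > "$tmp/base.nm" <<'EOF'
                 U __gcov_exit
                 U __gcov_init
                 U __gcov_merge_add
0000000000000000 T f
0000000000000000 B x
EOF
# Inlined atomics add no new undefined symbols, so the lists are identical.
cp "$tmp/base.nm" "$tmp/test.nm"

# grep exits 1 when nothing matches, hence the `|| true`: an object with
# no undefined symbols at all must not abort the check.
grep ' U ' "$tmp/base.nm" > "$tmp/ubase" || true
grep ' U ' "$tmp/test.nm" > "$tmp/utest" || true

# cmp -s prints nothing and reports the result through its exit status.
if cmp -s "$tmp/ubase" "$tmp/utest"; then
    flag=-fprofile-update=prefer-atomic
else
    flag=
fi
echo "flag: ${flag:-<not added>}"
rm -rf "$tmp"
```

With identical lists cmp succeeds and the flag is kept; any extra
undefined symbol in the second list makes cmp fail and the flag is
dropped.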

Several alternative approaches were considered and rejected:

1) Grepping assembly output for known libatomic symbols
   (__atomic_fetch_add, __aarch64_ldadd, etc):
   Fragile -- requires maintaining a list of arch-specific symbol names.
   New architectures or GCC versions may use different names.

2) Checking nm output for any undefined symbol beyond __gcov_*:
   Fails because KBUILD_CFLAGS adds kernel-specific instrumentation
   (__fentry__, __x86_return_thunk, etc) that creates "expected"
   undefined symbols unrelated to libatomic.

3) Grepping only for "__atomic" in undefined symbols:
   Misses aarch64 outline-atomics symbols (__aarch64_ldadd8_relax)
   which do not contain "atomic" in their name.

4) Filtering KBUILD_CFLAGS to pass only -m32/-m64/-march=* to try-run:
   Brittle whitelist -- misses flags like -mno-outline-atomics on arm64
   and will break when new relevant flags are added.

The chosen diff-based approach is fully architecture-agnostic: it uses
the real KBUILD_CFLAGS, does not depend on knowing libatomic symbol
names, and will not break when new flags or architectures are added.
The only assumption is that -fprofile-update=prefer-atomic should not
introduce any new linker dependencies.
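
To illustrate why alternative 3 was rejected, here is a small sketch
that applies both a name-based grep and a plain list comparison to the
undefined symbols from the arm64 example in the Testing section.  The
variable names are made up for this illustration:

```shell
#!/bin/sh
# Undefined symbols without and with -fprofile-update=prefer-atomic,
# copied from the arm64 example in this cover letter.
ubase='U __gcov_exit
U __gcov_init
U __gcov_merge_add'
utest='U __aarch64_ldadd8_relax
U __gcov_exit
U __gcov_init
U __gcov_merge_add'

# Alternative 3: look for "__atomic" in the symbol names.
if printf '%s\n' "$utest" | grep -q '__atomic'; then
    by_name=detected
else
    by_name=missed
fi

# Chosen approach: any difference between the two lists counts.
if [ "$ubase" = "$utest" ]; then
    by_diff=missed
else
    by_diff=detected
fi

echo "grep __atomic: $by_name"
echo "list compare:  $by_diff"
```

The grep finds nothing because __aarch64_ldadd8_relax does not contain
the string "__atomic", while the list comparison needs no knowledge of
symbol names at all.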

=== Testing ===

Verified on:
  - x86_64, GCC 17.0.0 (trunk 2026-05-09): flag IS added, inline
    lock addq for GCOV counters
  - arm64 cross-compile, GCC 14.1.1 (aarch64-linux-gnu-gcc):
    flag is NOT added (__aarch64_ldadd8_relax detected)

arm64 example showing the try-run detection in action:

  $ echo 'long long x; void f(void){x++;}' | \
    aarch64-linux-gnu-gcc [KBUILD_CFLAGS] -fprofile-arcs \
    -ftest-coverage -x c - -c -o base.o
  $ nm base.o | grep ' U '
                   U __gcov_exit
                   U __gcov_init
                   U __gcov_merge_add

  $ echo 'long long x; void f(void){x++;}' | \
    aarch64-linux-gnu-gcc [KBUILD_CFLAGS] -fprofile-arcs \
    -ftest-coverage -fprofile-update=prefer-atomic -x c - -c -o test.o
  $ nm test.o | grep ' U '
                   U __aarch64_ldadd8_relax   <-- new external helper (outline atomics)
                   U __gcov_exit
                   U __gcov_init
                   U __gcov_merge_add

  The undefined symbols differ => try-run fails => flag not added.
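
When debugging a rejected flag it can help to see which symbols are
new rather than only that the lists differ.  A possible sketch using
comm on the two lists above; this is not part of the patch, and the
undef helper and file names are hypothetical:

```shell
#!/bin/sh
tmp=$(mktemp -d)
# nm output from the arm64 example above, stored as plain text.
cat > "$tmp/base.nm" <<'EOF'
                 U __gcov_exit
                 U __gcov_init
                 U __gcov_merge_add
EOF
cat > "$tmp/test.nm" <<'EOF'
                 U __aarch64_ldadd8_relax
                 U __gcov_exit
                 U __gcov_init
                 U __gcov_merge_add
EOF

# Hypothetical helper: print only the names of undefined symbols,
# sorted in the C locale as comm expects.
undef() {
    awk '$1 == "U" { print $2 }' "$1" | LC_ALL=C sort
}
undef "$tmp/base.nm" > "$tmp/base.names"
undef "$tmp/test.nm" > "$tmp/test.names"

# comm -13 drops lines unique to the first file and lines common to
# both, leaving only the symbols that appear in the second list.
new=$(LC_ALL=C comm -13 "$tmp/base.names" "$tmp/test.names")
echo "new undefined symbols: $new"
rm -rf "$tmp"
```

For the lists above this prints __aarch64_ldadd8_relax as the only new
symbol.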

Changes since v3:
  - Moved CFLAGS_GCOV block higher in the Makefile: after the core
    KBUILD_CFLAGS assignments but before scripts/Makefile.gcc-plugins
    include.  Sashiko review of v3 correctly identified that placing
    the try-run after the GCC plugins include would break on clean
    builds when plugin .so files do not yet exist:
    https://sashiko.dev/#/patchset/20260509142216.382205-2-khorenko%40virtuozzo.com

Changes since v2:
  - Added try-run compile-time check (option 3 from Peter's proposal)
  - Moved CFLAGS_GCOV definition after KBUILD_CFLAGS is finalized
  - Split -fprofile-update=prefer-atomic from -fno-tree-loop-im

Konstantin Khorenko (1):
  gcov: use atomic counter updates to fix concurrent access crashes

 Makefile | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

-- 
2.47.1



* [PATCH v4 1/1] gcov: use atomic counter updates to fix concurrent access crashes
  2026-05-11 10:50 [PATCH v4 0/1] gcov: use -fprofile-update=prefer-atomic with compile-time guard Konstantin Khorenko
@ 2026-05-11 10:50 ` Konstantin Khorenko
  2026-05-11 11:48   ` Arnd Bergmann
  2026-05-13  9:15   ` Peter Oberparleiter
  0 siblings, 2 replies; 4+ messages in thread
From: Konstantin Khorenko @ 2026-05-11 10:50 UTC
  To: Andrew Morton, Arnd Bergmann, Peter Oberparleiter
  Cc: Nathan Chancellor, Nicolas Schier, Mikhail Zaslonko,
	Thomas Weißschuh, Miguel Ojeda, Masahiro Yamada,
	Vasileios Almpanis, Pavel Tikhomirov, linux-kernel, linux-kbuild,
	Konstantin Khorenko

GCC's GCOV instrumentation can merge global branch counters with loop
induction variables as an optimization.  In inflate_fast(), the inner
copy loops get transformed so that the GCOV counter value is loaded
multiple times to compute the loop base address, start index, and end
bound.  Since GCOV counters are global (not per-CPU), concurrent
execution on different CPUs causes the counter to change between loads,
producing inconsistent values and out-of-bounds memory writes.

The crash manifests during IPComp (IP Payload Compression) processing
when inflate_fast() runs concurrently on multiple CPUs:

  BUG: unable to handle page fault for address: ffffd0a3c0902ffa
  RIP: inflate_fast+1431
  Call Trace:
   zlib_inflate
   __deflate_decompress
   crypto_comp_decompress
   ipcomp_decompress [xfrm_ipcomp]
   ipcomp_input [xfrm_ipcomp]
   xfrm_input

At the crash point, the compiler generated three loads from the same
global GCOV counter (__gcov0.inflate_fast+216) to compute base, start,
and end for an indexed loop.  Another CPU modified the counter between
loads, making the values inconsistent - the write went 3.4 MB past a
65 KB buffer.

Add -fprofile-update=prefer-atomic to CFLAGS_GCOV at the global level in
the top-level Makefile, guarded by a try-run compile test.  The test
compiles a minimal program with and without -fprofile-update=prefer-atomic
using the full KBUILD_CFLAGS, then compares undefined symbols in the
resulting object files.  If prefer-atomic introduces new undefined
references (such as __atomic_fetch_add_8 on i386 or
__aarch64_ldadd8_relax on arm64 with outline-atomics), the flag is not
added -- the kernel does not link against libatomic.

On architectures where GCC inlines 64-bit atomic counter updates
(x86_64, s390, ...) the test passes and the flag is enabled, preventing
the compiler from merging counters with loop induction variables and
fixing the observed concurrent-access crash.

On architectures where the flag would introduce libatomic dependencies,
it is silently omitted and behaviour is no worse than before this patch.

Move the CFLAGS_GCOV block from its original position (before the arch
Makefile include) to after the core KBUILD_CFLAGS assignments but before
the scripts/Makefile.gcc-plugins include.
This placement ensures the try-run test sees arch-specific flags
(-m32, -march=, -mno-outline-atomics) while avoiding GCC plugin flags
(-fplugin=) that would break the test on clean builds when plugin shared
objects do not yet exist.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 Makefile | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 9f88dcaae382..2b352ad22d06 100644
--- a/Makefile
+++ b/Makefile
@@ -824,12 +824,6 @@ endif # KBUILD_EXTMOD
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
-CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
-ifdef CONFIG_CC_IS_GCC
-CFLAGS_GCOV	+= -fno-tree-loop-im
-endif
-export CFLAGS_GCOV
-
 # The arch Makefiles can override CC_FLAGS_FTRACE. We may also append it later.
 ifdef CONFIG_FUNCTION_TRACER
   CC_FLAGS_FTRACE := -pg
@@ -1147,6 +1141,27 @@ endif
 # Ensure compilers do not transform certain loops into calls to wcslen()
 KBUILD_CFLAGS += -fno-builtin-wcslen
 
+CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
+ifdef CONFIG_CC_IS_GCC
+CFLAGS_GCOV	+= -fno-tree-loop-im
+# Use atomic counter updates to avoid concurrent-access crashes in GCOV.
+# Only enable if -fprofile-update=prefer-atomic does not introduce new
+# undefined symbols (e.g. libatomic calls that the kernel cannot link).
+CFLAGS_GCOV	+= $(call try-run,\
+	echo 'long long x; void f(void){x++;}' | \
+	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -w -fprofile-arcs \
+	-ftest-coverage -x c - -c -o "$$TMP.base" && \
+	echo 'long long x; void f(void){x++;}' | \
+	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -w -fprofile-arcs \
+	-ftest-coverage -fprofile-update=prefer-atomic \
+	-x c - -c -o "$$TMP" && \
+	$(NM) "$$TMP.base" | grep ' U ' > "$$TMP.ubase" || true ; \
+	$(NM) "$$TMP" | grep ' U ' > "$$TMP.utest" || true ; \
+	cmp -s "$$TMP.ubase" "$$TMP.utest",\
+	-fprofile-update=prefer-atomic)
+endif
+export CFLAGS_GCOV
+
 # change __FILE__ to the relative path to the source directory
 ifdef building_out_of_srctree
 KBUILD_CPPFLAGS += -fmacro-prefix-map=$(srcroot)/=

base-commit: 70390501d1944d4e5b8f7352be180fceb3a44132
-- 
2.47.1



* Re: [PATCH v4 1/1] gcov: use atomic counter updates to fix concurrent access crashes
  2026-05-11 10:50 ` [PATCH v4 1/1] gcov: use atomic counter updates to fix concurrent access crashes Konstantin Khorenko
@ 2026-05-11 11:48   ` Arnd Bergmann
  2026-05-13  9:15   ` Peter Oberparleiter
  1 sibling, 0 replies; 4+ messages in thread
From: Arnd Bergmann @ 2026-05-11 11:48 UTC
  To: Konstantin Khorenko, Andrew Morton, Peter Oberparleiter
  Cc: Nathan Chancellor, Nicolas Schier, Mikhail Zaslonko,
	Thomas Weißschuh, Miguel Ojeda, Masahiro Yamada,
	Vasileios Almpanis, Pavel Tikhomirov, linux-kernel, linux-kbuild

On Mon, May 11, 2026, at 12:50, Konstantin Khorenko wrote:
> On architectures where GCC inlines 64-bit atomic counter updates
> (x86_64, s390, ...) the test passes and the flag is enabled, preventing
> the compiler from merging counters with loop induction variables and
> fixing the observed concurrent-access crash.
>
> On architectures where the flag would introduce libatomic dependencies,
> it is silently omitted and behaviour is no worse than before this patch.

Tested-by: Arnd Bergmann <arnd@arndb.de>

I've checked this with my cross compilers now, and found the
external function calls on aarch64, csky, and hppa. On x86-64,
the trivial scripts did not reproduce that unless I also pass
"-m32 -msoft-float -mno-sse", which is what you get from 
KBUILD_CFLAGS.

      Arnd


* Re: [PATCH v4 1/1] gcov: use atomic counter updates to fix concurrent access crashes
  2026-05-11 10:50 ` [PATCH v4 1/1] gcov: use atomic counter updates to fix concurrent access crashes Konstantin Khorenko
  2026-05-11 11:48   ` Arnd Bergmann
@ 2026-05-13  9:15   ` Peter Oberparleiter
  1 sibling, 0 replies; 4+ messages in thread
From: Peter Oberparleiter @ 2026-05-13  9:15 UTC
  To: Konstantin Khorenko, Andrew Morton, Arnd Bergmann
  Cc: Nathan Chancellor, Nicolas Schier, Mikhail Zaslonko,
	Thomas Weißschuh, Miguel Ojeda, Masahiro Yamada,
	Vasileios Almpanis, Pavel Tikhomirov, linux-kernel, linux-kbuild

On 11.05.2026 12:50, Konstantin Khorenko wrote:
> GCC's GCOV instrumentation can merge global branch counters with loop
> induction variables as an optimization.  In inflate_fast(), the inner
> copy loops get transformed so that the GCOV counter value is loaded
> multiple times to compute the loop base address, start index, and end
> bound.  Since GCOV counters are global (not per-CPU), concurrent
> execution on different CPUs causes the counter to change between loads,
> producing inconsistent values and out-of-bounds memory writes.
> 
> The crash manifests during IPComp (IP Payload Compression) processing
> when inflate_fast() runs concurrently on multiple CPUs:
> 
>   BUG: unable to handle page fault for address: ffffd0a3c0902ffa
>   RIP: inflate_fast+1431
>   Call Trace:
>    zlib_inflate
>    __deflate_decompress
>    crypto_comp_decompress
>    ipcomp_decompress [xfrm_ipcomp]
>    ipcomp_input [xfrm_ipcomp]
>    xfrm_input
> 
> At the crash point, the compiler generated three loads from the same
> global GCOV counter (__gcov0.inflate_fast+216) to compute base, start,
> and end for an indexed loop.  Another CPU modified the counter between
> loads, making the values inconsistent - the write went 3.4 MB past a
> 65 KB buffer.
> 
> Add -fprofile-update=prefer-atomic to CFLAGS_GCOV at the global level in
> the top-level Makefile, guarded by a try-run compile test.  The test
> compiles a minimal program with and without -fprofile-update=prefer-atomic
> using the full KBUILD_CFLAGS, then compares undefined symbols in the
> resulting object files.  If prefer-atomic introduces new undefined
> references (such as __atomic_fetch_add_8 on i386 or
> __aarch64_ldadd8_relax on arm64 with outline-atomics), the flag is not
> added -- the kernel does not link against libatomic.
> 
> On architectures where GCC inlines 64-bit atomic counter updates
> (x86_64, s390, ...) the test passes and the flag is enabled, preventing
> the compiler from merging counters with loop induction variables and
> fixing the observed concurrent-access crash.
> 
> On architectures where the flag would introduce libatomic dependencies,
> it is silently omitted and behaviour is no worse than before this patch.
> 
> Move the CFLAGS_GCOV block from its original position (before the arch
> Makefile include) to after the core KBUILD_CFLAGS assignments but before
> the scripts/Makefile.gcc-plugins include.
> This placement ensures the try-run test sees arch-specific flags
> (-m32, -march=, -mno-outline-atomics) while avoiding GCC plugin flags
> (-fplugin=) that would break the test on clean builds when plugin shared
> objects do not yet exist.
> 
> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Thank you for providing this updated fix - I greatly appreciate you
working on this.

I tested this patch successfully on s390 with GCC 15.2 and GCC 16.1.

Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>

That said, I found some stylistic details where the patch could be
improved that I've described below for consideration/comments. But I
don't want to delay the fix any longer than necessary, therefore I'm
fine with this patch, even when not including my suggestions.

> ---
>  Makefile | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 9f88dcaae382..2b352ad22d06 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -824,12 +824,6 @@ endif # KBUILD_EXTMOD
>  # Defaults to vmlinux, but the arch makefile usually adds further targets
>  all: vmlinux
>  
> -CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
> -ifdef CONFIG_CC_IS_GCC
> -CFLAGS_GCOV	+= -fno-tree-loop-im
> -endif
> -export CFLAGS_GCOV
> -
>  # The arch Makefiles can override CC_FLAGS_FTRACE. We may also append it later.
>  ifdef CONFIG_FUNCTION_TRACER
>    CC_FLAGS_FTRACE := -pg
> @@ -1147,6 +1141,27 @@ endif
>  # Ensure compilers do not transform certain loops into calls to wcslen()
>  KBUILD_CFLAGS += -fno-builtin-wcslen
>  
> +CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
> +ifdef CONFIG_CC_IS_GCC
> +CFLAGS_GCOV	+= -fno-tree-loop-im
> +# Use atomic counter updates to avoid concurrent-access crashes in GCOV.
> +# Only enable if -fprofile-update=prefer-atomic does not introduce new
> +# undefined symbols (e.g. libatomic calls that the kernel cannot link).
> +CFLAGS_GCOV	+= $(call try-run,\
> +	echo 'long long x; void f(void){x++;}' | \
> +	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -w -fprofile-arcs \
> +	-ftest-coverage -x c - -c -o "$$TMP.base" && \
> +	echo 'long long x; void f(void){x++;}' | \
> +	$(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -w -fprofile-arcs \
> +	-ftest-coverage -fprofile-update=prefer-atomic \

Using $(CFLAGS_GCOV) instead of listing -fprofile-arcs -ftest-coverage
separately would reduce duplication, automatically account for potential
future changes in the list of flags used for GCOV profiling, as well as
cover any side-effects that the third flag -fno-tree-loop-im might
have/get in the future.

> +	-x c - -c -o "$$TMP" && \
> +	$(NM) "$$TMP.base" | grep ' U ' > "$$TMP.ubase" || true ; \
> +	$(NM) "$$TMP" | grep ' U ' > "$$TMP.utest" || true ; \
> +	cmp -s "$$TMP.ubase" "$$TMP.utest",\
> +	-fprofile-update=prefer-atomic)
> +endif
> +export CFLAGS_GCOV
> +

Not directly related to the functional aspects of this patch, but
since all GCOV-related Makefile portions are touched/moved, this would be
a good opportunity to follow the pattern of other instrumentation
mechanisms like KCOV, KASAN, etc., that is:

- split out the CFLAGS_GCOV chunk above into a new scripts/Makefile.gcov
  file
- include this file depending on the value of CONFIG_GCOV_KERNEL via
  include-$(CONFIG_GCOV_KERNEL) += scripts/Makefile.gcov
  in the "additional Makefile" section that follows after this
  Makefile position


-- 
Peter Oberparleiter
Linux on IBM Z Development - IBM Germany R&D
