* [PATCH v5 1/7] Add AutoFDO support for Clang build
From: Rong Xu @ 2024-10-23 22:44 UTC
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
Add the build support for using Clang's AutoFDO. Building the kernel
with AutoFDO does not reduce the optimization level from the
compiler. AutoFDO uses hardware sampling to gather information about
the frequency of execution of different code paths within a binary.
This information is then used to guide the compiler's optimization
decisions, resulting in a more efficient binary. Experiments
showed that the kernel can improve up to 10% in latency.
The support requires Clang from LLVM 17 or later. This submission
is limited to x86 platforms that support PMU features like LBR on
Intel machines and AMD Zen3 BRS. Support for SPE and BRBE on ARM
is part of planned future work.
Here is an example workflow for AutoFDO kernel:
1) Build the kernel on the host machine with LLVM enabled, for example,
$ make menuconfig LLVM=1
Turn on AutoFDO build config:
CONFIG_AUTOFDO_CLANG=y
With a configuration that has LLVM enabled, use the following
command:
scripts/config -e AUTOFDO_CLANG
After getting the config, build with
$ make LLVM=1
2) Install the kernel on the test machine.
3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number,
like 500009, for this purpose.
For Intel platforms:
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
-o <perf_file> -- <loadtest>
For AMD platforms:
The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
For Zen3:
$ cat /proc/cpuinfo | grep " brs"
For Zen4:
$ cat /proc/cpuinfo | grep amd_lbr_v2
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
-N -b -c <count> -o <perf_file> -- <loadtest>
4) (Optional) Download the raw perf file to the host machine.
5) To generate an AutoFDO profile, two offline tools are available:
create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
of the AutoFDO project and can be found on GitHub
(https://github.com/google/autofdo), version v0.30.1 or later. The
llvm-profgen tool is included in the LLVM compiler itself. It's
important to note that the version of llvm-profgen doesn't need to
match the version of Clang. It needs to be the LLVM 19 release or
later, or from the LLVM trunk.
$ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> \
-o <profile_file>
or
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
--format=extbinary --out=<profile_file>
Note that multiple AutoFDO profile files can be merged into one via:
$ llvm-profdata merge -o <profile_file> <profile_1> ... <profile_n>
6) Rebuild the kernel using the AutoFDO profile file with the same config
as step 1, (Note CONFIG_AUTOFDO_CLANG needs to be enabled):
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
Documentation/dev-tools/autofdo.rst | 167 ++++++++++++++++++++++++++++
Documentation/dev-tools/index.rst | 1 +
MAINTAINERS | 7 ++
Makefile | 1 +
arch/Kconfig | 20 ++++
arch/x86/Kconfig | 1 +
scripts/Makefile.autofdo | 22 ++++
scripts/Makefile.lib | 10 ++
tools/objtool/check.c | 1 +
9 files changed, 230 insertions(+)
create mode 100644 Documentation/dev-tools/autofdo.rst
create mode 100644 scripts/Makefile.autofdo
diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
new file mode 100644
index 000000000000..9d90e6d79781
--- /dev/null
+++ b/Documentation/dev-tools/autofdo.rst
@@ -0,0 +1,167 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Using AutoFDO with the Linux kernel
+===================================
+
+This enables AutoFDO build support for the kernel when using
+the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
+is a type of profile-guided optimization (PGO) used to enhance the
+performance of binary executables. It gathers information about the
+frequency of execution of various code paths within a binary using
+hardware sampling. This data is then used to guide the compiler's
+optimization decisions, resulting in a more efficient binary. AutoFDO
+is a powerful optimization technique, and data indicates that it can
+significantly improve kernel performance. It's especially beneficial
+for workloads affected by front-end stalls.
+
+For AutoFDO builds, unlike non-FDO builds, the user must supply a
+profile. Acquiring an AutoFDO profile can be done in several ways.
+AutoFDO profiles are created by converting hardware sampling using
+the "perf" tool. It is crucial that the workload used to create these
+perf files is representative; they must exhibit runtime
+characteristics similar to the workloads that are intended to be
+optimized. Failure to do so will result in the compiler optimizing
+for the wrong objective.
+
+The AutoFDO profile often encapsulates the program's behavior. If the
+performance-critical code is architecture-independent, the profile
+can be applied across platforms to achieve performance gains. For
+instance, using the profile generated on Intel architecture to build
+a kernel for AMD architecture can also yield performance improvements.
+
+There are two methods for acquiring a representative profile:
+(1) Sample real workloads using a production environment.
+(2) Generate the profile using a representative load test.
+When enabling the AutoFDO build configuration without providing an
+AutoFDO profile, the compiler only modifies the DWARF information in
+the kernel without impacting runtime performance. It's advisable to
+use a kernel binary built with the same AutoFDO configuration to
+collect the perf profile. While it's possible to use a kernel built
+with different options, it may result in inferior performance.
+
+One can collect profiles using AutoFDO build for the previous kernel.
+AutoFDO employs relative line numbers to match the profiles, offering
+some tolerance for source changes. This mode is commonly used in a
+production environment for profile collection.
+
+In a profile collection based on a load test, the AutoFDO collection
+process consists of the following steps:
+
+#. Initial build: The kernel is built with AutoFDO options
+ without a profile.
+
+#. Profiling: The above kernel is then run with a representative
+ workload to gather execution frequency data. This data is
+ collected using hardware sampling, via perf. AutoFDO is most
+ effective on platforms supporting advanced PMU features like
+ LBR on Intel machines.
+
+#. AutoFDO profile generation: Perf output file is converted to
+ the AutoFDO profile via offline tools.
+
+The support requires Clang from LLVM 17 or later.
+
+Preparation
+===========
+
+Configure the kernel with::
+
+ CONFIG_AUTOFDO_CLANG=y
+
+Customization
+=============
+
+The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
+AutoFDO builds. One can, however, enable or disable AutoFDO build for
+individual files and directories by adding a line similar to the following
+to the respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o) ::
+
+ AUTOFDO_PROFILE_foo.o := y
+
+- For enabling all files in one directory ::
+
+ AUTOFDO_PROFILE := y
+
+- For disabling one file ::
+
+ AUTOFDO_PROFILE_foo.o := n
+
+- For disabling all files in one directory ::
+
+ AUTOFDO_PROFILE := n
+
+Workflow
+========
+
+Here is an example workflow for AutoFDO kernel:
+
+1) Build the kernel on the host machine with LLVM enabled,
+ for example, ::
+
+ $ make menuconfig LLVM=1
+
+ Turn on AutoFDO build config::
+
+ CONFIG_AUTOFDO_CLANG=y
+
+ With a configuration that has LLVM enabled, use the following command::
+
+ $ scripts/config -e AUTOFDO_CLANG
+
+ After getting the config, build with ::
+
+ $ make LLVM=1
+
+2) Install the kernel on the test machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+ event period. We suggest using a suitable prime number, like 500009,
+ for this purpose.
+
+ - For Intel platforms::
+
+ $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+ - For AMD platforms::
+ The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check,
+ For Zen3::
+
+ $ cat /proc/cpuinfo | grep " brs"
+
+ For Zen4::
+
+ $ cat /proc/cpuinfo | grep amd_lbr_v2
+
+ The following command generates the perf data file::
+
+ $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+4) (Optional) Download the raw perf file to the host machine.
+
+5) To generate an AutoFDO profile, two offline tools are available:
+ create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
+ of the AutoFDO project and can be found on GitHub
+ (https://github.com/google/autofdo), version v0.30.1 or later.
+ The llvm-profgen tool is included in the LLVM compiler itself. It's
+ important to note that the version of llvm-profgen doesn't need to match
+ the version of Clang. It needs to be the LLVM 19 release of Clang
+ or later, or just from the LLVM trunk. ::
+
+ $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
+
+ or ::
+
+ $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
+
+ Note that multiple AutoFDO profile files can be merged into one via::
+
+ $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
+
+6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
+ (Note CONFIG_AUTOFDO_CLANG needs to be enabled)::
+
+ $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
+
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 53d4d124f9c5..6945644f7008 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -34,6 +34,7 @@ Documentation/dev-tools/testing-overview.rst
ktap
checkuapi
gpio-sloppy-logic-analyzer
+ autofdo
.. only:: subproject and html
diff --git a/MAINTAINERS b/MAINTAINERS
index d01256208c9f..1b8db863031f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3678,6 +3678,13 @@ F: kernel/audit*
F: lib/*audit.c
K: \baudit_[a-z_0-9]\+\b
+AUTOFDO BUILD
+M: Rong Xu <xur@google.com>
+M: Han Shen <shenhan@google.com>
+S: Supported
+F: Documentation/dev-tools/autofdo.rst
+F: scripts/Makefile.autofdo
+
AUXILIARY BUS DRIVER
M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
R: Dave Ertman <david.m.ertman@intel.com>
diff --git a/Makefile b/Makefile
index c5493c0c0ca1..bbb6ec68f5dc 100644
--- a/Makefile
+++ b/Makefile
@@ -1018,6 +1018,7 @@ include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
+include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
include $(addprefix $(srctree)/, $(include-y))
diff --git a/arch/Kconfig b/arch/Kconfig
index 8af374ea1adc..5e9604960cbb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -811,6 +811,26 @@ config LTO_CLANG_THIN
If unsure, say Y.
endchoice
+config ARCH_SUPPORTS_AUTOFDO_CLANG
+ bool
+
+config AUTOFDO_CLANG
+ bool "Enable Clang's AutoFDO build (EXPERIMENTAL)"
+ depends on ARCH_SUPPORTS_AUTOFDO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 170000
+ help
+ This option enables Clang's AutoFDO build. When
+ an AutoFDO profile is specified in variable
+ CLANG_AUTOFDO_PROFILE during the build process,
+ Clang uses the profile to optimize the kernel.
+
+ If no profile is specified, AutoFDO options are
+ still passed to Clang to facilitate the collection
+ of perf data for creating an AutoFDO profile in
+ subsequent builds.
+
+ If unsure, say N.
+
config ARCH_SUPPORTS_CFI_CLANG
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2852fcd82cbd..503a0268155a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,6 +126,7 @@ config X86
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_RT
+ select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
select ARCH_USE_MEMTEST
diff --git a/scripts/Makefile.autofdo b/scripts/Makefile.autofdo
new file mode 100644
index 000000000000..ff96a63fea7c
--- /dev/null
+++ b/scripts/Makefile.autofdo
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang AutoFDO features.
+
+CFLAGS_AUTOFDO_CLANG := -fdebug-info-for-profiling -mllvm -enable-fs-discriminator=true -mllvm -improved-fs-discriminator=true
+
+ifndef CONFIG_DEBUG_INFO
+ CFLAGS_AUTOFDO_CLANG += -gmlt
+endif
+
+ifdef CLANG_AUTOFDO_PROFILE
+ CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE)
+endif
+
+ifdef CONFIG_LTO_CLANG_THIN
+ ifdef CLANG_AUTOFDO_PROFILE
+ KBUILD_LDFLAGS += --lto-sample-profile=$(CLANG_AUTOFDO_PROFILE)
+ endif
+ KBUILD_LDFLAGS += --mllvm=-enable-fs-discriminator=true --mllvm=-improved-fs-discriminator=true -plugin-opt=thinlto
+endif
+
+export CFLAGS_AUTOFDO_CLANG
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 01a9f567d5af..2d0942c1a027 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -191,6 +191,16 @@ _c_flags += $(if $(patsubst n%,, \
-D__KCSAN_INSTRUMENT_BARRIERS__)
endif
+#
+# Enable AutoFDO build flags except some files or directories we don't want to
+# enable (depends on variables AUTOFDO_PROFILE_obj.o and AUTOFDO_PROFILE).
+#
+ifeq ($(CONFIG_AUTOFDO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(is-kernel-object)), \
+ $(CFLAGS_AUTOFDO_CLANG))
+endif
+
# $(src) for including checkin headers from generated source files
# $(obj) for including generated headers from checkin source files
ifeq ($(KBUILD_EXTMOD),)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 6604f5d038aa..4c5229991e1e 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -4557,6 +4557,7 @@ static int validate_ibt(struct objtool_file *file)
!strcmp(sec->name, "__jump_table") ||
!strcmp(sec->name, "__mcount_loc") ||
!strcmp(sec->name, ".kcfi_traps") ||
+ !strcmp(sec->name, ".llvm.call-graph-profile") ||
strstr(sec->name, "__patchable_function_entries"))
continue;
--
2.47.0.105.g07ac214952-goog
* Re: [PATCH v5 1/7] Add AutoFDO support for Clang build
From: Masahiro Yamada @ 2024-11-01 18:01 UTC
To: Rong Xu
Cc: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Mike Rapoport (IBM), Nathan Chancellor,
Nick Desaulniers, Nicolas Schier, Paul E. McKenney,
Peter Zijlstra, Sami Tolvanen, Thomas Gleixner, Wei Yang,
workflows, Miguel Ojeda, Maksim Panchenko, Yonghong Song,
Yabin Cui, Krzysztof Pszeniczny, Sriraman Tallam,
Stephane Eranian, x86, linux-arch, linux-doc, linux-kbuild,
linux-kernel, llvm
On Thu, Oct 24, 2024 at 7:44 AM Rong Xu <xur@google.com> wrote:
> +Workflow
> +========
> +
> +Here is an example workflow for AutoFDO kernel:
> +
> +1) Build the kernel on the host machine with LLVM enabled,
> + for example, ::
> +
> + $ make menuconfig LLVM=1
> +
> + Turn on AutoFDO build config::
> +
> + CONFIG_AUTOFDO_CLANG=y
> +
> + With a configuration that has LLVM enabled, use the following command::
> +
> + $ scripts/config -e AUTOFDO_CLANG
> +
> + After getting the config, build with ::
> +
> + $ make LLVM=1
> +
> +2) Install the kernel on the test machine.
> +
> +3) Run the load tests. The '-c' option in perf specifies the sample
> + event period. We suggest using a suitable prime number, like 500009,
> + for this purpose.
> +
> + - For Intel platforms::
> +
> + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> +
> + - For AMD platforms::
I am not sure if this double-colon is needed
when the next line is not code.
> + The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check,
> + For Zen3::
> +
> + $ cat /proc/cpuinfo | grep " brs"
> +
> + For Zen4::
> +
> + $ cat /proc/cpuinfo | grep amd_lbr_v2
> +
> + The following command generates the perf data file::
> +
> + $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> +
> +4) (Optional) Download the raw perf file to the host machine.
> +
> +5) To generate an AutoFDO profile, two offline tools are available:
> + create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
> + of the AutoFDO project and can be found on GitHub
> + (https://github.com/google/autofdo), version v0.30.1 or later.
> + The llvm-profgen tool is included in the LLVM compiler itself. It's
> + important to note that the version of llvm-profgen doesn't need to match
> + the version of Clang. It needs to be the LLVM 19 release of Clang
> + or later, or just from the LLVM trunk. ::
> +
> + $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
> +
> + or ::
> +
> + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
> +
> + Note that multiple AutoFDO profile files can be merged into one via::
> +
> + $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
> +
> +6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
> + (Note CONFIG_AUTOFDO_CLANG needs to be enabled)::
> +
> + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
> +
Trailing blank line.
.git/rebase-apply/patch:187: new blank line at EOF.
--
Best Regards
Masahiro Yamada
* Re: [PATCH v5 1/7] Add AutoFDO support for Clang build
From: Rong Xu @ 2024-11-01 20:08 UTC
To: Masahiro Yamada
Cc: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Mike Rapoport (IBM), Nathan Chancellor,
Nick Desaulniers, Nicolas Schier, Paul E. McKenney,
Peter Zijlstra, Sami Tolvanen, Thomas Gleixner, Wei Yang,
workflows, Miguel Ojeda, Maksim Panchenko, Yonghong Song,
Yabin Cui, Krzysztof Pszeniczny, Sriraman Tallam,
Stephane Eranian, x86, linux-arch, linux-doc, linux-kbuild,
linux-kernel, llvm
On Fri, Nov 1, 2024 at 11:02 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Thu, Oct 24, 2024 at 7:44 AM Rong Xu <xur@google.com> wrote:
> > +Workflow
> > +========
> > +
> > +Here is an example workflow for AutoFDO kernel:
> > +
> > +1) Build the kernel on the host machine with LLVM enabled,
> > + for example, ::
> > +
> > + $ make menuconfig LLVM=1
> > +
> > + Turn on AutoFDO build config::
> > +
> > + CONFIG_AUTOFDO_CLANG=y
> > +
> > + With a configuration that has LLVM enabled, use the following command::
> > +
> > + $ scripts/config -e AUTOFDO_CLANG
> > +
> > + After getting the config, build with ::
> > +
> > + $ make LLVM=1
> > +
> > +2) Install the kernel on the test machine.
> > +
> > +3) Run the load tests. The '-c' option in perf specifies the sample
> > + event period. We suggest using a suitable prime number, like 500009,
> > + for this purpose.
> > +
> > + - For Intel platforms::
> > +
> > + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > +
> > + - For AMD platforms::
>
> I am not sure if this double-colon is needed
> when the next line is not code.
Thanks for catching this. We didn't mean to use "::" here. It should be
":" followed by a blank line. There should also be a blank line before
"For Zen3::". I will fix this in the patch.
>
>
>
> > + The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check,
> > + For Zen3::
> > +
> > + $ cat /proc/cpuinfo | grep " brs"
> > +
> > + For Zen4::
> > +
> > + $ cat /proc/cpuinfo | grep amd_lbr_v2
> > +
> > + The following command generates the perf data file::
> > +
> > + $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > +
> > +4) (Optional) Download the raw perf file to the host machine.
> > +
> > +5) To generate an AutoFDO profile, two offline tools are available:
> > + create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
> > + of the AutoFDO project and can be found on GitHub
> > + (https://github.com/google/autofdo), version v0.30.1 or later.
> > + The llvm-profgen tool is included in the LLVM compiler itself. It's
> > + important to note that the version of llvm-profgen doesn't need to match
> > + the version of Clang. It needs to be the LLVM 19 release of Clang
> > + or later, or just from the LLVM trunk. ::
> > +
> > + $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
> > +
> > + or ::
> > +
> > + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
> > +
> > + Note that multiple AutoFDO profile files can be merged into one via::
> > +
> > + $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
> > +
> > +6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
> > + (Note CONFIG_AUTOFDO_CLANG needs to be enabled)::
> > +
> > + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
> > +
>
> Trailing blank line.
>
> .git/rebase-apply/patch:187: new blank line at EOF.
Will remove the blank line.
>
>
>
>
>
> --
> Best Regards
> Masahiro Yamada
* [PATCH v5 2/7] objtool: Fix unreachable instruction warnings for weak functions
From: Rong Xu @ 2024-10-23 22:44 UTC
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
In the presence of both weak and strong function definitions, the
linker drops the weak symbol in favor of a strong symbol, but
leaves the code in place. Code in ignore_unreachable_insn() has
some heuristics to suppress the warning, but it does not work when
-ffunction-sections is enabled.
Suppose function foo has both strong and weak definitions.
Case 1: The strong definition has an annotated section name,
like .init.text. Only the weak definition will be placed into
.text.foo. But since the section has no symbols, there will be no
"hole" in the section.
Case 2: Both definitions are without an annotated section name.
Both will be placed into .text.foo section, but there will be only one
symbol (the strong one). If the weak code is before the strong code,
there is no "hole" as it fails to find the right-most symbol before
the offset.
The fix is to use the first node to compute the hole if hole.sym
is empty. If there is no symbol in the section, the first node
will be NULL, in which case, -1 is returned to skip the whole
section.
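The fixed lookup can be sketched outside objtool as follows. This is a simplified illustration only: it replaces the rb-tree with a sorted array, and the names sym_range and hole_size_at are hypothetical and do not exist in objtool. It keeps the return convention of find_symbol_hole_containing().

```c
#include <stddef.h>

/* Hypothetical stand-in for objtool's struct symbol: a code range in a section. */
struct sym_range {
	unsigned long offset;	/* start of the symbol within the section */
	unsigned long len;	/* size of the symbol in bytes */
};

/*
 * Mirrors the return convention of find_symbol_hole_containing():
 *   0  -> @offset lies inside a symbol (not a hole)
 *  -1  -> the hole runs to the end of the section
 *  >0  -> number of bytes from @offset to the next symbol
 * @syms must be sorted by offset and must not overlap.
 */
long hole_size_at(const struct sym_range *syms, size_t nsyms, unsigned long offset)
{
	size_t next = 0;	/* index of the first symbol that starts after @offset */

	for (size_t i = 0; i < nsyms; i++) {
		unsigned long end = syms[i].offset + syms[i].len;

		if (offset >= syms[i].offset && offset < end)
			return 0;	/* @offset is covered by a symbol */
		if (offset >= end)
			next = i + 1;	/* this symbol ends at or before @offset */
	}

	/* No symbol after @offset, or no symbols at all: skip the rest. */
	if (next >= nsyms)
		return -1;

	/* With no preceding symbol, next is 0: measure to the first symbol. */
	return (long)(syms[next].offset - offset);
}
```

In Case 2, weak code at offset 0 followed by a single strong symbol at offset 64 yields a 64-byte hole; in Case 1, an empty symbol list yields -1.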
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
tools/objtool/elf.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 3d27983dc908..6f64d611faea 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -224,12 +224,17 @@ int find_symbol_hole_containing(const struct section *sec, unsigned long offset)
if (n)
return 0; /* not a hole */
- /* didn't find a symbol for which @offset is after it */
- if (!hole.sym)
- return 0; /* not a hole */
+ /*
+ * @offset >= sym->offset + sym->len, find symbol after it.
+ * When hole.sym is empty, use the first node to compute the hole.
+ * If there is no symbol in the section, the first node will be NULL,
+ * in which case, -1 is returned to skip the whole section.
+ */
+ if (hole.sym)
+ n = rb_next(&hole.sym->node);
+ else
+ n = rb_first_cached(&sec->symbol_tree);
- /* @offset >= sym->offset + sym->len, find symbol after it */
- n = rb_next(&hole.sym->node);
if (!n)
return -1; /* until end of address space */
--
2.47.0.105.g07ac214952-goog
* [PATCH v5 3/7] Change the symbols order when --ffunction-sections is enabled
2024-10-23 22:43 [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Rong Xu
2024-10-23 22:44 ` [PATCH v5 1/7] Add AutoFDO " Rong Xu
2024-10-23 22:44 ` [PATCH v5 2/7] objtool: Fix unreachable instruction warnings for weak functions Rong Xu
@ 2024-10-23 22:44 ` Rong Xu
2024-10-23 22:44 ` [PATCH v5 4/7] Add markers for text_unlikely and text_hot sections Rong Xu
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Rong Xu @ 2024-10-23 22:44 UTC (permalink / raw)
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
When the -ffunction-sections compiler option is enabled, each function
is placed in a separate section named .text.function_name rather than
putting all functions in a single .text section.
However, using -ffunction-sections can cause problems with the
linker script. The comments in include/asm-generic/vmlinux.lds.h
note these issues:
“TEXT_MAIN here will match .text.fixup and .text.unlikely if dead
code elimination is enabled, so these sections should be converted
to use ".." first.”
It is unclear whether there is a straightforward method for converting
a suffix to "..".
This patch modifies the order of subsections within the text output
section. Specifically, it repositions sections with certain fixed patterns
(for example .text.unlikely) before TEXT_MAIN, ensuring that they are
grouped and matched together. It also places the .text.hot section at the
beginning of a page to help TLB performance.
Note that the limitation arises because the linker script employs glob
patterns instead of regular expressions for string matching. While there
is a method to maintain the current order using complex patterns, this
significantly complicates the pattern and increases the likelihood of
errors.
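The overlap can be reproduced in user space. The sketch below assumes that POSIX fnmatch() with no flags is a close enough approximation of the linker's glob matching for these particular patterns; the helper name matches_text_main is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* The glob half of TEXT_MAIN when function sections are in use. */
#define TEXT_MAIN_GLOB ".text.[0-9a-zA-Z_]*"

/* Returns true if @section would be captured by the TEXT_MAIN glob. */
bool matches_text_main(const char *section)
{
	return fnmatch(TEXT_MAIN_GLOB, section, 0) == 0;
}
```

Under that assumption, .text.unlikely and .text.hot match the glob just like an ordinary .text.foo does, which is why they must be listed before TEXT_MAIN. A name using the ".." convention, such as .text..refcount, does not match because "." is not in the bracket set.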
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
include/asm-generic/vmlinux.lds.h | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index eeadbaeccf88..fd901951549c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -553,19 +553,24 @@
* .text section. Map to function alignment to avoid address changes
* during second ld run in second ld pass when generating System.map
*
- * TEXT_MAIN here will match .text.fixup and .text.unlikely if dead
- * code elimination is enabled, so these sections should be converted
- * to use ".." first.
+ * TEXT_MAIN here will match symbols with a fixed pattern (for example,
+ * .text.hot or .text.unlikely) if dead code elimination or
+ * function-section is enabled. Match these symbols first before
+ * TEXT_MAIN to ensure they are grouped together.
+ *
+ * Also place the .text.hot section at the beginning of a page; this
+ * helps TLB performance.
*/
#define TEXT_TEXT \
ALIGN_FUNCTION(); \
+ *(.text.asan.* .text.tsan.*) \
+ *(.text.unknown .text.unknown.*) \
+ *(.text.unlikely .text.unlikely.*) \
+ . = ALIGN(PAGE_SIZE); \
*(.text.hot .text.hot.*) \
*(TEXT_MAIN .text.fixup) \
- *(.text.unlikely .text.unlikely.*) \
- *(.text.unknown .text.unknown.*) \
NOINSTR_TEXT \
- *(.ref.text) \
- *(.text.asan.* .text.tsan.*)
+ *(.ref.text)
/* sched.text is aling to function alignment to secure we have same
--
2.47.0.105.g07ac214952-goog
* [PATCH v5 4/7] Add markers for text_unlikely and text_hot sections
2024-10-23 22:43 [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Rong Xu
` (2 preceding siblings ...)
2024-10-23 22:44 ` [PATCH v5 3/7] Change the symbols order when --ffunction-sections is enabled Rong Xu
@ 2024-10-23 22:44 ` Rong Xu
2024-10-23 22:44 ` [PATCH v5 5/7] AutoFDO: Enable -ffunction-sections for the AutoFDO build Rong Xu
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Rong Xu @ 2024-10-23 22:44 UTC (permalink / raw)
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
Add markers like __hot_text_start, __hot_text_end, __unlikely_text_start,
and __unlikely_text_end which will be included in System.map. These markers
indicate how the compiler groups functions, providing valuable information
to developers about the layout and optimization of the code.
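As one possible use of these markers, a tool that has read the four addresses from System.map could classify an arbitrary text address. The sketch below is illustrative only; struct text_markers, enum text_group and classify_text_addr are hypothetical names, and each range is treated as half-open [start, end).

```c
/* Marker values as they would be read from System.map (hypothetical holder). */
struct text_markers {
	unsigned long unlikely_start;	/* __unlikely_text_start */
	unsigned long unlikely_end;	/* __unlikely_text_end */
	unsigned long hot_start;	/* __hot_text_start */
	unsigned long hot_end;		/* __hot_text_end */
};

enum text_group {
	TEXT_GROUP_OTHER,
	TEXT_GROUP_UNLIKELY,
	TEXT_GROUP_HOT,
};

/* Report which marked group, if any, contains @addr. */
enum text_group classify_text_addr(const struct text_markers *m, unsigned long addr)
{
	if (addr >= m->unlikely_start && addr < m->unlikely_end)
		return TEXT_GROUP_UNLIKELY;
	if (addr >= m->hot_start && addr < m->hot_end)
		return TEXT_GROUP_HOT;
	return TEXT_GROUP_OTHER;
}
```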
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
include/asm-generic/vmlinux.lds.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index fd901951549c..e02973f3b418 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -549,6 +549,16 @@
__cpuidle_text_end = .; \
__noinstr_text_end = .;
+#define TEXT_UNLIKELY \
+ __unlikely_text_start = .; \
+ *(.text.unlikely .text.unlikely.*) \
+ __unlikely_text_end = .;
+
+#define TEXT_HOT \
+ __hot_text_start = .; \
+ *(.text.hot .text.hot.*) \
+ __hot_text_end = .;
+
/*
* .text section. Map to function alignment to avoid address changes
* during second ld run in second ld pass when generating System.map
@@ -565,9 +575,9 @@
ALIGN_FUNCTION(); \
*(.text.asan.* .text.tsan.*) \
*(.text.unknown .text.unknown.*) \
- *(.text.unlikely .text.unlikely.*) \
+ TEXT_UNLIKELY \
. = ALIGN(PAGE_SIZE); \
- *(.text.hot .text.hot.*) \
+ TEXT_HOT \
*(TEXT_MAIN .text.fixup) \
NOINSTR_TEXT \
*(.ref.text)
--
2.47.0.105.g07ac214952-goog
* [PATCH v5 5/7] AutoFDO: Enable -ffunction-sections for the AutoFDO build
2024-10-23 22:43 [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Rong Xu
` (3 preceding siblings ...)
2024-10-23 22:44 ` [PATCH v5 4/7] Add markers for text_unlikely and text_hot sections Rong Xu
@ 2024-10-23 22:44 ` Rong Xu
2024-10-23 22:44 ` [PATCH v5 6/7] AutoFDO: Enable machine function split optimization for AutoFDO Rong Xu
` (2 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Rong Xu @ 2024-10-23 22:44 UTC (permalink / raw)
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
Enable -ffunction-sections by default for the AutoFDO build.
With -ffunction-sections, the compiler places each function in its own
section named .text.function_name instead of placing all functions in
the .text section. In the AutoFDO build, this allows the linker to
utilize profile information to reorganize functions for improved
utilization of iCache and iTLB.
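A small way to observe the effect, assuming a standalone file with the hypothetical functions below: compiling it with "clang -ffunction-sections -c" and listing the sections with "readelf -S" should show one text section per function (.text.add_one and .text.times_two) instead of a single .text.

```c
/* Each function below is expected to land in its own .text.<name> section
 * when the file is compiled with -ffunction-sections. */

int add_one(int x)
{
	return x + 1;
}

int times_two(int x)
{
	return x * 2;
}
```

Because each function is a separate input section, the linker is free to place them independently, which is what profile-driven reordering relies on.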
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
include/asm-generic/vmlinux.lds.h | 11 +++++++++--
scripts/Makefile.autofdo | 2 +-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e02973f3b418..bd64fdedabd2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -95,18 +95,25 @@
* With LTO_CLANG, the linker also splits sections by default, so we need
* these macros to combine the sections during the final link.
*
+ * With AUTOFDO_CLANG, by default, the linker splits text sections and
+ * regroups functions into subsections.
+ *
* RODATA_MAIN is not used because existing code already defines .rodata.x
* sections to be brought in with rodata.
*/
-#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
+defined(CONFIG_AUTOFDO_CLANG)
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
+#else
+#define TEXT_MAIN .text
+#endif
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral* .data.$__unnamed_* .data.$L*
#define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L*
#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..L* .bss..compoundliteral*
#define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
#else
-#define TEXT_MAIN .text
#define DATA_MAIN .data
#define SDATA_MAIN .sdata
#define RODATA_MAIN .rodata
diff --git a/scripts/Makefile.autofdo b/scripts/Makefile.autofdo
index ff96a63fea7c..6155d6fc4ca7 100644
--- a/scripts/Makefile.autofdo
+++ b/scripts/Makefile.autofdo
@@ -9,7 +9,7 @@ ifndef CONFIG_DEBUG_INFO
endif
ifdef CLANG_AUTOFDO_PROFILE
- CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE)
+ CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE) -ffunction-sections
endif
ifdef CONFIG_LTO_CLANG_THIN
--
2.47.0.105.g07ac214952-goog
* [PATCH v5 6/7] AutoFDO: Enable machine function split optimization for AutoFDO
2024-10-23 22:43 [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Rong Xu
` (4 preceding siblings ...)
2024-10-23 22:44 ` [PATCH v5 5/7] AutoFDO: Enable -ffunction-sections for the AutoFDO build Rong Xu
@ 2024-10-23 22:44 ` Rong Xu
2024-10-23 22:44 ` [PATCH v5 7/7] Add Propeller configuration for kernel build Rong Xu
2024-10-25 23:03 ` [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Yabin Cui
7 siblings, 0 replies; 11+ messages in thread
From: Rong Xu @ 2024-10-23 22:44 UTC (permalink / raw)
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
Enable the machine function split optimization for AutoFDO in Clang.
Machine function split (MFS) is a pass in the Clang compiler that
splits a function into hot and cold parts. The linker groups all
cold blocks across functions together. This decreases hot code
fragmentation and improves iCache and iTLB utilization.
MFS requires a profile so this is enabled only for the AutoFDO builds.
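As an illustration of the kind of function MFS targets, consider the hypothetical example below. If a profile shows that the error branch is almost never taken, the compiler may move that block out of the function body into a .text.split.* section; whether it actually does so depends on the profile and the compiler version.

```c
#include <stddef.h>

/*
 * Sum non-negative values. The early-exit branch is the cold part that
 * a profile-guided split could move away from the hot loop.
 */
long sum_checked(const int *v, size_t n, int *err)
{
	long total = 0;

	*err = 0;
	for (size_t i = 0; i < n; i++) {
		if (v[i] < 0) {
			/* Rare path: candidate for the split (cold) section. */
			*err = 1;
			return -1;
		}
		total += v[i];
	}
	return total;
}
```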
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
include/asm-generic/vmlinux.lds.h | 7 ++++++-
scripts/Makefile.autofdo | 2 ++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bd64fdedabd2..8a0bb3946cf0 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -556,6 +556,11 @@ defined(CONFIG_AUTOFDO_CLANG)
__cpuidle_text_end = .; \
__noinstr_text_end = .;
+#define TEXT_SPLIT \
+ __split_text_start = .; \
+ *(.text.split .text.split.[0-9a-zA-Z_]*) \
+ __split_text_end = .;
+
#define TEXT_UNLIKELY \
__unlikely_text_start = .; \
*(.text.unlikely .text.unlikely.*) \
@@ -582,6 +587,7 @@ defined(CONFIG_AUTOFDO_CLANG)
ALIGN_FUNCTION(); \
*(.text.asan.* .text.tsan.*) \
*(.text.unknown .text.unknown.*) \
+ TEXT_SPLIT \
TEXT_UNLIKELY \
. = ALIGN(PAGE_SIZE); \
TEXT_HOT \
@@ -589,7 +595,6 @@ defined(CONFIG_AUTOFDO_CLANG)
NOINSTR_TEXT \
*(.ref.text)
-
/* sched.text is aling to function alignment to secure we have same
* address even at second ld pass when generating System.map */
#define SCHED_TEXT \
diff --git a/scripts/Makefile.autofdo b/scripts/Makefile.autofdo
index 6155d6fc4ca7..1caf2457e585 100644
--- a/scripts/Makefile.autofdo
+++ b/scripts/Makefile.autofdo
@@ -10,6 +10,7 @@ endif
ifdef CLANG_AUTOFDO_PROFILE
CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE) -ffunction-sections
+ CFLAGS_AUTOFDO_CLANG += -fsplit-machine-functions
endif
ifdef CONFIG_LTO_CLANG_THIN
@@ -17,6 +18,7 @@ ifdef CONFIG_LTO_CLANG_THIN
KBUILD_LDFLAGS += --lto-sample-profile=$(CLANG_AUTOFDO_PROFILE)
endif
KBUILD_LDFLAGS += --mllvm=-enable-fs-discriminator=true --mllvm=-improved-fs-discriminator=true -plugin-opt=thinlto
+ KBUILD_LDFLAGS += -plugin-opt=-split-machine-functions
endif
export CFLAGS_AUTOFDO_CLANG
--
2.47.0.105.g07ac214952-goog
* [PATCH v5 7/7] Add Propeller configuration for kernel build
2024-10-23 22:43 [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Rong Xu
` (5 preceding siblings ...)
2024-10-23 22:44 ` [PATCH v5 6/7] AutoFDO: Enable machine function split optimization for AutoFDO Rong Xu
@ 2024-10-23 22:44 ` Rong Xu
2024-10-25 23:03 ` [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Yabin Cui
7 siblings, 0 replies; 11+ messages in thread
From: Rong Xu @ 2024-10-23 22:44 UTC (permalink / raw)
To: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Rong Xu, Sami Tolvanen,
Thomas Gleixner, Wei Yang, workflows, Miguel Ojeda,
Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
Sriraman Tallam, Stephane Eranian
Cc: x86, linux-arch, linux-doc, linux-kbuild, linux-kernel, llvm
Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.
The support requires Clang from LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.
Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:
1) Build the kernel on the host machine, with AutoFDO and Propeller
build config
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
then
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>
“<autofdo_profile>” is the profile collected when doing a non-Propeller
AutoFDO build. This step builds a kernel that has the same optimization
level as AutoFDO, plus a metadata section that records basic block
information. This kernel image runs as fast as an AutoFDO optimized
kernel.
2) Install the kernel on test/production machines.
3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number,
like 500009, for this purpose.
For Intel platforms:
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
-o <perf_file> -- <loadtest>
For AMD platforms:
The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
# To see if Zen3 supports LBR:
$ cat /proc/cpuinfo | grep " brs"
# To see if Zen4 supports LBR:
$ cat /proc/cpuinfo | grep amd_lbr_v2
# If the result is yes, then collect the profile using:
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
-N -b -c <count> -o <perf_file> -- <loadtest>
4) (Optional) Download the raw perf file to the host machine.
5) Generate Propeller profile:
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
--format=propeller --propeller_output_module_name \
--out=<propeller_profile_prefix>_cc_profile.txt \
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
“create_llvm_prof” is the profile conversion tool; a prebuilt
binary for Linux can be found at
https://github.com/google/autofdo/releases/tag/v0.30.1 (it can also be
built from source).
"<propeller_profile_prefix>" can be something like
"/home/user/dir/any_string".
This command generates a pair of Propeller profiles:
"<propeller_profile_prefix>_cc_profile.txt" and
"<propeller_profile_prefix>_ld_profile.txt".
6) Rebuild the kernel using the AutoFDO and Propeller profile files.
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
and
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/propeller.rst | 162 ++++++++++++++++++++++++++
MAINTAINERS | 7 ++
Makefile | 1 +
arch/Kconfig | 19 +++
arch/x86/Kconfig | 1 +
arch/x86/kernel/vmlinux.lds.S | 4 +
include/asm-generic/vmlinux.lds.h | 6 +-
scripts/Makefile.lib | 10 ++
scripts/Makefile.propeller | 28 +++++
tools/objtool/check.c | 1 +
11 files changed, 237 insertions(+), 3 deletions(-)
create mode 100644 Documentation/dev-tools/propeller.rst
create mode 100644 scripts/Makefile.propeller
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 6945644f7008..3c0ac08b2709 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -35,6 +35,7 @@ Documentation/dev-tools/testing-overview.rst
checkuapi
gpio-sloppy-logic-analyzer
autofdo
+ propeller
.. only:: subproject and html
diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst
new file mode 100644
index 000000000000..92195958e3db
--- /dev/null
+++ b/Documentation/dev-tools/propeller.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Using Propeller with the Linux kernel
+=====================================
+
+This enables Propeller build support for the kernel when using the Clang
+compiler. Propeller is a profile-guided optimization (PGO) method used
+to optimize binary executables. Like AutoFDO, it utilizes hardware
+sampling to gather information about the frequency of execution of
+different code paths within a binary. Unlike AutoFDO, this information
+is then used right before the linking phase to optimize (among others)
+block layout within and across functions.
+
+A few important notes about adopting Propeller optimization:
+
+#. Although it can be used as a standalone optimization step, it is
+ strongly recommended to apply Propeller on top of AutoFDO,
+ AutoFDO+ThinLTO or Instrument FDO. The rest of this document
+ assumes this paradigm.
+
+#. Propeller uses another round of profiling on top of
+ AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
+ "build-afdo - train-afdo - build-propeller - train-propeller -
+ build-optimized".
+
+#. Propeller requires LLVM 19 release or later for Clang/Clang++
+ and the linker (ld.lld).
+
+#. In addition to LLVM toolchain, Propeller requires a profiling
+ conversion tool: https://github.com/google/autofdo with a release
+ after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
+
+The Propeller optimization process involves the following steps:
+
+#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
+ you would normally do, but with a set of compile-time / link-time
+ flags, so that a special metadata section is created within the
+ kernel binary. The special section is only intended to be used by the
+ profiling tool; it is not part of the runtime image, nor does it
+ change kernel run-time text sections.
+
+#. Profiling: The above kernel is then run with a representative
+ workload to gather execution frequency data. This data is collected
+ using hardware sampling, via perf. Propeller is most effective on
+ platforms supporting advanced PMU features like LBR on Intel
+ machines. This step is the same as profiling the kernel for AutoFDO
+ (the exact perf parameters can be different).
+
+#. Propeller profile generation: Perf output file is converted to a
+ pair of Propeller profiles via an offline tool.
+
+#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
+ binary as you would normally do, but with a compile-time /
+ link-time flag to pick up the Propeller compile time and link time
+ profiles. This build step uses 3 profiles - the AutoFDO profile,
+ the Propeller compile-time profile and the Propeller link-time
+ profile.
+
+#. Deployment: The optimized kernel binary is deployed and used
+ in production environments, providing improved performance
+ and reduced latency.
+
+Preparation
+===========
+
+Configure the kernel with::
+
+ CONFIG_AUTOFDO_CLANG=y
+ CONFIG_PROPELLER_CLANG=y
+
+Customization
+=============
+
+The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
+for Propeller builds. One can, however, enable or disable Propeller build
+for individual files and directories by adding a line similar to the
+following to the respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o)::
+
+ PROPELLER_PROFILE_foo.o := y
+
+- For enabling all files in one directory::
+
+ PROPELLER_PROFILE := y
+
+- For disabling one file::
+
+ PROPELLER_PROFILE_foo.o := n
+
+- For disabling all files in one directory::
+
+ PROPELLER_PROFILE := n
+
+
+Workflow
+========
+
+Here is an example workflow for building an AutoFDO+Propeller kernel:
+
+1) Assuming an AutoFDO profile is already collected following
+ instructions in the AutoFDO document, build the kernel on the host
+ machine, with AutoFDO and Propeller build configs ::
+
+ CONFIG_AUTOFDO_CLANG=y
+ CONFIG_PROPELLER_CLANG=y
+
+ and ::
+
+ $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
+
+2) Install the kernel on the test machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+ event period. We suggest using a suitable prime number, like 500009,
+ for this purpose.
+
+ - For Intel platforms::
+
+ $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+ - For AMD platforms::
+
+ $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+ Note you can repeat the above steps to collect multiple <perf_file>s.
+
+4) (Optional) Download the raw perf file(s) to the host machine.
+
+5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
+ generate Propeller profile. ::
+
+ $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
+ --format=propeller --propeller_output_module_name \
+ --out=<propeller_profile_prefix>_cc_profile.txt \
+ --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+ "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
+
+ This command generates a pair of Propeller profiles:
+ "<propeller_profile_prefix>_cc_profile.txt" and
+ "<propeller_profile_prefix>_ld_profile.txt".
+
+ If more than one perf_file was collected in the previous step,
+ you can create a temp list file "<perf_file_list>" with each line
+ containing one perf file name and run::
+
+ $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
+ --format=propeller --propeller_output_module_name \
+ --out=<propeller_profile_prefix>_cc_profile.txt \
+ --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+6) Rebuild the kernel using the AutoFDO and Propeller
+ profiles. ::
+
+ CONFIG_AUTOFDO_CLANG=y
+ CONFIG_PROPELLER_CLANG=y
+
+ and ::
+
+ $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
diff --git a/MAINTAINERS b/MAINTAINERS
index 1b8db863031f..f4cc6dd6c4d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18560,6 +18560,13 @@ S: Maintained
F: include/linux/psi*
F: kernel/sched/psi.c
+PROPELLER BUILD
+M: Rong Xu <xur@google.com>
+M: Han Shen <shenhan@google.com>
+S: Supported
+F: Documentation/dev-tools/propeller.rst
+F: scripts/Makefile.propeller
+
PRINTK
M: Petr Mladek <pmladek@suse.com>
R: Steven Rostedt <rostedt@goodmis.org>
diff --git a/Makefile b/Makefile
index bbb6ec68f5dc..2d2f688c21c6 100644
--- a/Makefile
+++ b/Makefile
@@ -1019,6 +1019,7 @@ include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
+include-$(CONFIG_PROPELLER_CLANG) += scripts/Makefile.propeller
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
include $(addprefix $(srctree)/, $(include-y))
diff --git a/arch/Kconfig b/arch/Kconfig
index 5e9604960cbb..ea7aed39196b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -831,6 +831,25 @@ config AUTOFDO_CLANG
If unsure, say N.
+config ARCH_SUPPORTS_PROPELLER_CLANG
+ bool
+
+config PROPELLER_CLANG
+ bool "Enable Clang's Propeller build"
+ depends on ARCH_SUPPORTS_PROPELLER_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 190000
+ help
+ This option enables Clang’s Propeller build. When the Propeller
+ profiles are specified in the variable CLANG_PROPELLER_PROFILE_PREFIX
+ during the build process, Clang uses the profiles to optimize
+ the kernel.
+
+ If no profile is specified, Propeller options are still passed
+ to Clang to facilitate the collection of perf data for creating
+ the Propeller profiles in subsequent builds.
+
+ If unsure, say N.
+
config ARCH_SUPPORTS_CFI_CLANG
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 503a0268155a..da47164bfddc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_RT
select ARCH_SUPPORTS_AUTOFDO_CLANG
+ select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
select ARCH_USE_MEMTEST
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 6726be89b7a6..7ecc21c569be 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -442,6 +442,10 @@ SECTIONS
STABS_DEBUG
DWARF_DEBUG
+#ifdef CONFIG_PROPELLER_CLANG
+ .llvm_bb_addr_map : { *(.llvm_bb_addr_map) }
+#endif
+
ELF_DETAILS
DISCARDS
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8a0bb3946cf0..c995474e4c64 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -95,14 +95,14 @@
* With LTO_CLANG, the linker also splits sections by default, so we need
* these macros to combine the sections during the final link.
*
- * With AUTOFDO_CLANG, by default, the linker splits text sections and
- * regroups functions into subsections.
+ * With AUTOFDO_CLANG and PROPELLER_CLANG, by default, the linker splits
+ * text sections and regroups functions into subsections.
*
* RODATA_MAIN is not used because existing code already defines .rodata.x
* sections to be brought in with rodata.
*/
#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
-defined(CONFIG_AUTOFDO_CLANG)
+defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
#else
#define TEXT_MAIN .text
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 2d0942c1a027..e7859ad90224 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -201,6 +201,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_AUTOFDO_CLANG))
endif
+#
+# Enable Propeller build flags except some files or directories we don't want to
+# enable (depends on variables AUTOFDO_PROPELLER_obj.o and PROPELLER_PROFILE).
+#
+ifdef CONFIG_PROPELLER_CLANG
+_c_flags += $(if $(patsubst n%,, \
+ $(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(PROPELLER_PROFILE))$(is-kernel-object), \
+ $(CFLAGS_PROPELLER_CLANG))
+endif
+
# $(src) for including checkin headers from generated source files
# $(obj) for including generated headers from checkin source files
ifeq ($(KBUILD_EXTMOD),)
diff --git a/scripts/Makefile.propeller b/scripts/Makefile.propeller
new file mode 100644
index 000000000000..344190717e47
--- /dev/null
+++ b/scripts/Makefile.propeller
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang Propeller features.
+ifdef CLANG_PROPELLER_PROFILE_PREFIX
+ CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections
+ KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering
+else
+ CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels
+endif
+
+# Propeller requires debug information to embed module names in the profiles.
+# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO,
+# as the option should already be set.
+ifndef CONFIG_DEBUG_INFO
+ ifndef CONFIG_AUTOFDO_CLANG
+ CFLAGS_PROPELLER_CLANG += -gmlt
+ endif
+endif
+
+ifdef CONFIG_LTO_CLANG_THIN
+ ifdef CLANG_PROPELLER_PROFILE_PREFIX
+ KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt
+ else
+ KBUILD_LDFLAGS += --lto-basic-block-sections=labels
+ endif
+endif
+
+export CFLAGS_PROPELLER_CLANG
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4c5229991e1e..05a0fb4a3d1a 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -4558,6 +4558,7 @@ static int validate_ibt(struct objtool_file *file)
!strcmp(sec->name, "__mcount_loc") ||
!strcmp(sec->name, ".kcfi_traps") ||
!strcmp(sec->name, ".llvm.call-graph-profile") ||
+ !strcmp(sec->name, ".llvm_bb_addr_map") ||
strstr(sec->name, "__patchable_function_entries"))
continue;
--
2.47.0.105.g07ac214952-goog
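
A reading aid for the scripts/Makefile.propeller hunk above: the sketch below restates its compiler-flag selection as a shell function. The function name and its three positional arguments are hypothetical stand-ins for CLANG_PROPELLER_PROFILE_PREFIX, CONFIG_DEBUG_INFO and CONFIG_AUTOFDO_CLANG. It only mirrors the CFLAGS logic shown in the patch (not the KBUILD_LDFLAGS part) and is not part of the series.

```sh
#!/bin/sh
# Hypothetical illustration only: mirrors how CFLAGS_PROPELLER_CLANG is
# assembled in scripts/Makefile.propeller. Arguments (all assumptions):
#   $1 = profile prefix (empty means no profile yet, i.e. the profiling build)
#   $2 = non-empty if CONFIG_DEBUG_INFO is enabled
#   $3 = non-empty if CONFIG_AUTOFDO_CLANG is enabled
propeller_cflags() {
    prefix="$1"
    debug_info="$2"
    autofdo="$3"

    if [ -n "$prefix" ]; then
        # Optimized build: consume the compiler-side profile.
        flags="-fbasic-block-sections=list=${prefix}_cc_profile.txt -ffunction-sections"
    else
        # Profiling build: only emit basic-block labels.
        flags="-fbasic-block-sections=labels"
    fi

    # -gmlt is added only when neither DEBUG_INFO nor AutoFDO is enabled.
    if [ -z "$debug_info" ] && [ -z "$autofdo" ]; then
        flags="$flags -gmlt"
    fi

    printf '%s\n' "$flags"
}

# Example calls; the prefix path is a placeholder.
propeller_cflags "" "" ""
propeller_cflags "/path/to/propeller" "y" ""
```

The two example calls correspond to the two phases of the workflow: a first build without a profile prefix to collect profiles, then a rebuild with the prefix set.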
* Re: [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build
2024-10-23 22:43 [PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build Rong Xu
` (6 preceding siblings ...)
2024-10-23 22:44 ` [PATCH v5 7/7] Add Propeller configuration for kernel build Rong Xu
@ 2024-10-25 23:03 ` Yabin Cui
7 siblings, 0 replies; 11+ messages in thread
From: Yabin Cui @ 2024-10-25 23:03 UTC (permalink / raw)
To: Rong Xu
Cc: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
Han Shen, Heiko Carstens, H. Peter Anvin, Ingo Molnar, Jann Horn,
Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
Kees Cook, Masahiro Yamada, Mike Rapoport (IBM),
Nathan Chancellor, Nick Desaulniers, Nicolas Schier,
Paul E. McKenney, Peter Zijlstra, Sami Tolvanen, Thomas Gleixner,
Wei Yang, workflows, Miguel Ojeda, Maksim Panchenko,
Yonghong Song, Krzysztof Pszeniczny, Sriraman Tallam,
Stephane Eranian, x86, linux-arch, linux-doc, linux-kbuild,
linux-kernel, llvm
Hi Rong,
I tested this patchset on the android-mainline kernel branch (closely tracking
the Linux mainline branch) using the latest Clang compiler with an AutoFDO
profile. It passed all presubmit tests, including boot and local tests, and
the AutoFDO profile yielded performance improvements across various benchmarks.
Tested-by: Yabin Cui <yabinc@google.com>
Thanks,
Yabin
On Wed, Oct 23, 2024 at 3:44 PM Rong Xu <xur@google.com> wrote:
>
> Hi,
>
> This patch series is to integrate AutoFDO and Propeller support into
> the Linux kernel. AutoFDO is a profile-guided optimization technique
> that leverages hardware sampling to enhance binary performance.
> Unlike Instrumentation-based FDO (iFDO), AutoFDO offers a user-friendly
> and straightforward application process. While iFDO generally yields
> superior profile quality and performance, our findings reveal that
> AutoFDO achieves remarkable effectiveness, bringing performance close
> to iFDO for benchmark applications.
>
> Propeller is a profile-guided, post-link optimizer that improves
> the performance of large-scale applications compiled with LLVM. It
> operates by relinking the binary based on an additional round of runtime
> profiles, enabling precise optimizations that are not possible at
> compile time. Similar to AutoFDO, Propeller too utilizes hardware
> sampling to collect profiles and apply post-link optimizations to improve
> the benchmark’s performance over and above AutoFDO.
>
> Our empirical data demonstrates significant performance improvements
> with AutoFDO and Propeller, up to 10% on microbenchmarks and up to 5%
> on large warehouse-scale benchmarks. This makes a strong case for their
> inclusion as supported features in the upstream kernel.
>
> Background
>
> A significant fraction of fleet processing cycles (excluding idle time)
> from data center workloads are attributable to the kernel. Warehouse-scale
> workloads maximize performance by optimizing the production kernel
> using iFDO (a.k.a. instrumented PGO, Profile Guided Optimization).
>
> iFDO can significantly enhance application performance but its use
> within the kernel has raised concerns. AutoFDO is a variant of FDO that
> uses the hardware’s Performance Monitoring Unit (PMU) to collect
> profiling data. While AutoFDO typically yields smaller performance
> gains than iFDO, it presents unique benefits for optimizing kernels.
>
> AutoFDO eliminates the need for instrumented kernels, allowing a single
> optimized kernel to serve both execution and profile collection. It also
> minimizes slowdown during profile collection, potentially yielding
> higher-fidelity profiling, especially for time-sensitive code, compared
> to iFDO. Additionally, AutoFDO profiles can be obtained from production
> environments via the hardware’s PMU whereas iFDO profiles require
> carefully curated load tests that are representative of real-world
> traffic.
>
> AutoFDO facilitates profile collection across diverse targets.
> Preliminary studies indicate significant variation in kernel hot spots
> within Google’s infrastructure, suggesting potential performance gains
> through target-specific kernel customization.
>
> Furthermore, other advanced compiler optimization techniques, including
> ThinLTO and Propeller can be stacked on top of AutoFDO, similar to iFDO.
> ThinLTO achieves better runtime performance through whole-program
> analysis and cross module optimizations. The main difference between
> traditional LTO and ThinLTO is that the latter is scalable in time and
> memory.
>
> This patch series adds AutoFDO and Propeller support to the kernel. The
> actual solution comes in seven parts:
>
> [P 1] Add the build support for using AutoFDO in Clang
>
> Add the basic support for AutoFDO build and provide the
> instructions for using AutoFDO.
>
> [P 2] Fix objtool for bogus warnings when -ffunction-sections is enabled
>
> [P 3] Change the subsection ordering when -ffunction-sections is enabled
>
> [P 4] Add markers for text_unlikely and text_hot sections
>
> [P 5] Enable -ffunction-sections for the AutoFDO build
>
> [P 6] Enable Machine Function Split (MFS) optimization for AutoFDO
>
> [P 7] Add Propeller configuration to the kernel build
>
> Patch 1 provides basic AutoFDO build support. Patches 2 to 6 further
> enhance the performance of AutoFDO builds and are functionally dependent
> on Patch 1. Patch 7 enables support for Propeller and is dependent on
> Patches 2 to 4.
>
> Caveats
>
> AutoFDO is compatible with both GCC and Clang, but the patches in this
> series are exclusively applicable to LLVM 17 or newer for AutoFDO and
> LLVM 19 or newer for Propeller. For profile conversion, two different
> tools can be used: llvm-profgen or create_llvm_prof. llvm-profgen
> needs to be from LLVM 19 or newer, or from LLVM trunk. Alternatively,
> create_llvm_prof v0.30.1 or newer can be used instead of llvm-profgen.
>
> Additionally, the build is only supported on x86 platforms equipped
> with PMU capabilities, such as LBR on Intel machines. More
> specifically:
> * Intel platforms: works on every platform that supports LBR;
> we have tested on Skylake.
> * AMD platforms: tested on AMD Zen3 with the BRS feature. The kernel
> needs to be configured with "CONFIG_PERF_EVENTS_AMD_BRS=y". To
> check, use
> $ cat /proc/cpuinfo | grep " brs"
> For the AMD Zen4, AMD LBRv2 is supported, but we suspect a bug with
> the AMD LBRv2 implementation in Genoa which blocks the usage.
>
> For ARM, we plan to send patches for SPE-based Propeller when
> AutoFDO for ARM is ready.
>
> Experiments and Results
>
> Experiments were conducted to compare the performance of AutoFDO-optimized
> kernel images (version 6.9.x) against default builds. The evaluation
> encompassed both open source microbenchmarks and real-world production
> services from Google and Meta. The selected microbenchmarks included Neper,
> a network subsystem benchmark, and UnixBench which is a comprehensive suite
> for assessing various kernel operations.
>
> For Neper, AutoFDO optimization resulted in a 6.1% increase in throughput
> and a 10.6% reduction in latency. UnixBench saw a 2.2% improvement in its
> index score under low system load and a 2.6% improvement under high system
> load.
>
> For further details on the improvements observed in Google and Meta's
> production services, please refer to the LLVM discourse post:
> https://discourse.llvm.org/t/optimizing-the-linux-kernel-with-autofdo-including-thinlto-and-propeller/79108
>
> Thanks,
>
> Rong Xu and Han Shen
>
> Change-Logs in V2:
> Rebased to commit e32cde8d2bd7 ("Merge tag 'sched_ext-for-6.12-rc1-fixes-1'
> of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext")
>
> 1. [P 0]: moved the Propeller description to the top (Peter Zijlstra)
> 2. [P 1]: (1) Makefile: fixed file order (Masahiro Yamada)
> (2) scripts/Makefile.lib: used is-kernel-object to exclude
> files (Masahiro Yamada)
> (3) scripts/Makefile.autofdo: improved the code (Masahiro Yamada)
> (4) scripts/Makefile.autofdo: handled when DEBUG_INFO disabled
> (Nick Desaulniers)
> 3. [P 2]: tools/objtool/elf.c: updated the comments (Peter Zijlstra)
> 4. [P 3]: include/asm-generic/vmlinux.lds.h:
> (1) explicitly set cold text function alignment (Peter Zijlstra and
> Peter Anvin)
> (2) set hot-text page aligned
> 5. [P 6]: (1) include/asm-generic/vmlinux.lds.h: made Propeller not
> depending on AutoFDO
> (2) Makefile: fixed file order (Masahiro Yamada)
> (3) scripts/Makefile.lib: used is-kernel-object to exclude
> files (Masahiro Yamada). This removed the change in
> arch/x86/platform/efi/Makefile,
> drivers/firmware/efi/libstub/Makefile, and
> arch/x86/boot/compressed/Makefile.
> And this also addressed the comment from Arnd Bergmann
> regarding arch/x86/purgatory/Makefile
> (4) scripts/Makefile.propeller: improved the code
> (Masahiro Yamada)
>
> Change-Logs in V3:
> Rebased to commit eb952c47d154 ("Merge tag 'for-6.12-rc2-tag' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux")
>
> Integrated the following changes suggested by Mike Rapoport.
> 1. [P 1]: autofdo.rst: removed code-block directives and used "::"
> 2. [P 6]: propeller.rst: removed code-block directives and used "::"
>
> Change-Logs in V4:
> 1. [P 1]: autofdo.rst: fixed a typo for create_llvm_prof command.
>
> Change-Logs in V5:
> Added "Tested-by: Yonghong Song <yonghong.song@linux.dev>" to all patches.
>
> Integrated the following changes suggested by Masahiro Yamada.
> 1. [P 0]: (1) moved ARM related remark from patch 6 to here
> 2. [P 1]: (1) autofdo.rst: improved the documentation
> (2) scripts/Makefile.autofdo: improved comments and used ifdef
> instead of ifeq
> 3. [P 3]: Made the layout change unconditional
> 4. [P 4]: Split the patch into two: this patch only added the markers, and
> the AutoFDO change went to the new [P 5]
> 5. [P 7]: (1) propeller.rst: improved the documentation
> (2) scripts/Makefile.propeller: improved comments and used ifdef
> instead of ifeq
> (3) arch/Kconfig: made Propeller build independent of AutoFDO
> build
> (4) moved ARM related remarks to the cover letter
>
> Rong Xu (7):
> Add AutoFDO support for Clang build
> objtool: Fix unreachable instruction warnings for weak functions
> Change the symbols order when --ffunction-sections is enabled
> Add markers for text_unlikely and text_hot sections
> AutoFDO: Enable -ffunction-sections for the AutoFDO build
> AutoFDO: Enable machine function split optimization for AutoFDO
> Add Propeller configuration for kernel build
>
> Documentation/dev-tools/autofdo.rst | 167 ++++++++++++++++++++++++++
> Documentation/dev-tools/index.rst | 2 +
> Documentation/dev-tools/propeller.rst | 162 +++++++++++++++++++++++++
> MAINTAINERS | 14 +++
> Makefile | 2 +
> arch/Kconfig | 39 ++++++
> arch/x86/Kconfig | 2 +
> arch/x86/kernel/vmlinux.lds.S | 4 +
> include/asm-generic/vmlinux.lds.h | 49 ++++++--
> scripts/Makefile.autofdo | 24 ++++
> scripts/Makefile.lib | 20 +++
> scripts/Makefile.propeller | 28 +++++
> tools/objtool/check.c | 2 +
> tools/objtool/elf.c | 15 ++-
> 14 files changed, 514 insertions(+), 16 deletions(-)
> create mode 100644 Documentation/dev-tools/autofdo.rst
> create mode 100644 Documentation/dev-tools/propeller.rst
> create mode 100644 scripts/Makefile.autofdo
> create mode 100644 scripts/Makefile.propeller
>
>
> base-commit: eb952c47d154ba2aac794b99c66c3c45eb4cc4ec
> --
> 2.47.0.105.g07ac214952-goog
>
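
For anyone repeating the BRS check from the Caveats section of the quoted cover letter, here is a minimal sketch of it as a shell function. The function name and the optional file argument are hypothetical, added only so the check can be tried against a saved copy of /proc/cpuinfo. It matches the flag as a whole word, which serves the same purpose as the leading space in the original grep pattern (so that "ibrs" is not mistaken for "brs").

```sh
#!/bin/sh
# Hypothetical helper, not part of the series: succeed if a cpuinfo-style
# file lists the given CPU flag on a "flags" line.
#   $1 = flag name (e.g. brs)
#   $2 = file to inspect (assumption: defaults to /proc/cpuinfo)
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Keep only the "flags" lines, then match the flag as a whole word.
    grep -E '^flags[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

if has_cpu_flag brs; then
    echo "brs: reported in /proc/cpuinfo"
else
    echo "brs: not reported in /proc/cpuinfo"
fi
```

A positive result only says the CPU advertises the feature; the kernel still needs CONFIG_PERF_EVENTS_AMD_BRS=y as noted in the cover letter.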