* [PATCH v7 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects
@ 2024-08-21 4:06 Kris Van Hees
2024-08-21 4:06 ` [PATCH v7 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Kris Van Hees @ 2024-08-21 4:06 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Steven Rostedt, Masahiro Yamada, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
In order to create the modules.builtin.ranges file at build time, which
contains the range of addresses for all built-in modules, there needs to
be a way to identify what code is compiled into modules.
To identify what code is compiled into modules during a kernel build,
one can look for the presence of the -DKBUILD_MODFILE and -DKBUILD_MODNAME
options in the compile command lines. A simple grep in .*.cmd files for
those options is sufficient for this.
Unfortunately, these options are only passed when compiling C source files.
Various modules also include objects built from assembler source, and these
options are not passed in that case.
Adding $(modfile_flags) to modkern_aflags (similar to modkern_cflags), and
adding $(modname_flags) to a_flags (similar to c_flags) makes it possible
to identify which objects are compiled into modules for both C and
assembler source files.
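The lookup described above can be sketched as follows. This is only an
illustration in Python, not part of the patch: the regular expression is
modeled on the one used by the AWK scripts later in this series, and the
helper name modfiles_from_cmd and the sample command line are hypothetical.

```python
import re

# Pattern modeled on the series' AWK scripts: capture whatever follows
# -DKBUILD_MODFILE= up to the closing quote.
MODFILE_RE = re.compile(r"""-DKBUILD_MODFILE=['"]+([^'"]+)""")


def modfiles_from_cmd(cmd_line):
    """Return the module file paths named by -DKBUILD_MODFILE, or []."""
    m = MODFILE_RE.search(cmd_line)
    return m.group(1).split() if m else []


# Hypothetical, heavily shortened command line for illustration only.
sample = (
    "savedcmd_arch/x86/events/amd/uncore.o := gcc -c "
    "-DKBUILD_MODFILE='\"arch/x86/events/amd/amd-uncore\"' "
    "-DKBUILD_MODNAME='\"amd_uncore\"' -o uncore.o uncore.c"
)
print(modfiles_from_cmd(sample))
```

With this patch applied, the same search should also succeed on the command
lines recorded for objects built from assembler source.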
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
scripts/Makefile.lib | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index fe3668dc4954..170f462537a8 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -238,7 +238,7 @@ modkern_rustflags = \
modkern_aflags = $(if $(part-of-module), \
$(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE), \
- $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL))
+ $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL) $(modfile_flags))
c_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
-include $(srctree)/include/linux/compiler_types.h \
@@ -248,7 +248,7 @@ c_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
rust_flags = $(_rust_flags) $(modkern_rustflags) @$(objtree)/include/generated/rustc_cfg
a_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
- $(_a_flags) $(modkern_aflags)
+ $(_a_flags) $(modkern_aflags) $(modname_flags)
cpp_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
$(_cpp_flags)
--
2.45.2
* [PATCH v7 2/4] kbuild: generate offset range data for builtin modules
2024-08-21 4:06 [PATCH v7 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
@ 2024-08-21 4:06 ` Kris Van Hees
2024-08-22 17:34 ` Masahiro Yamada
2024-08-21 4:06 ` [PATCH v7 3/4] scripts: add verifier script for builtin module range data Kris Van Hees
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Kris Van Hees @ 2024-08-21 4:06 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Nick Alcock, Alan Maguire, Steven Rostedt,
Masahiro Yamada, Luis Chamberlain, Masami Hiramatsu,
Nick Desaulniers, Jiri Olsa, Elena Zannoni
Create file modules.builtin.ranges that can be used to find where
built-in modules are located by their addresses. This will be useful for
tracing tools to find which functions belong to which built-in modules.
The offset range data for builtin modules is generated using:
- modules.builtin: associates object files with module names
- vmlinux.map: provides load order of sections and offset of first member
per section
- vmlinux.o.map: provides offset of object file content per section
- .*.cmd: build cmd file with KBUILD_MODFILE
The generated data will look like:
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
...
.text 00b9f080-00ba011a intel_skl_int3472_discrete
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
...
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
For each ELF section, it lists the offset of the first symbol. This can
be used to determine the base address of the section at runtime.
Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules. Multiple ranges
can apply to a single module, and ranges can be shared between modules.
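A consumer of this data might read it along the following lines. This is a
minimal Python sketch, not part of the patch. It assumes the line layout
shown in the sample above, treats ranges as half-open (start inclusive, end
exclusive), and uses a made-up runtime address for _text; the function
names are hypothetical.

```python
def parse_ranges(lines):
    """Split modules.builtin.ranges lines into anchors and module ranges."""
    anchors = {}  # section -> (anchor offset, anchor symbol)
    ranges = []   # (section, start offset, end offset, [module names])
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or "-" not in fields[1]:
            continue  # skip blank or elided lines
        start, end = (int(x, 16) for x in fields[1].split("-"))
        if fields[2] == "=" and len(fields) >= 4:
            anchors[fields[0]] = (start, fields[3])
        else:
            ranges.append((fields[0], start, end, fields[2:]))
    return anchors, ranges


def section_base(anchor_addr, anchor_off):
    """Runtime base of a section: symbol address - symbol offset."""
    return anchor_addr - anchor_off


def modules_at(addr, sect, base, ranges):
    """Return the module names covering a runtime address, or []."""
    off = addr - base
    for s, start, end, mods in ranges:
        if s == sect and start <= off < end:  # half-open range (assumption)
            return mods
    return []


SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
"""
anchors, ranges = parse_ranges(SAMPLE.splitlines())
# Hypothetical runtime address of _text, e.g. as read from /proc/kallsyms.
text_base = section_base(0xFFFFFFFF81000000, anchors[".text"][0])
print(modules_at(text_base + 0xBB00, ".text", text_base, ranges))
```

A linear scan keeps the sketch short; since the ranges are listed in strict
ascending order, a real tool could use a binary search instead.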
The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
is generated for kernel modules that are built into the kernel image.
How it works:
1. The modules.builtin file is parsed to obtain a list of built-in
module names and their associated object names (the .ko file that
the module would be in if it were a loadable module, hereafter
referred to as <kmodfile>). This object name can be used to
identify objects in the kernel compile because any C or assembler
code that ends up in a built-in module will have the option
-DKBUILD_MODFILE=<kmodfile> present in its build command, and those
can be found in the .<obj>.cmd file in the kernel build tree.
If an object is part of multiple modules, they will all be listed
in the KBUILD_MODFILE option argument.
This allows us to conclusively determine whether an object in the
kernel build belongs to any modules, and which.
2. The vmlinux.map is parsed next to determine the base address of each
top level section so that all addresses into the section can be
turned into offsets. This makes it possible to handle sections
getting loaded at different addresses at system boot.
We also determine an 'anchor' symbol at the beginning of each
section to make it possible to calculate the true base address of
a section at runtime (i.e. symbol address - symbol offset).
We collect start addresses of sections that are included in the top
level section. This is used when vmlinux is linked using vmlinux.o,
because in that case, we need to look at the vmlinux.o linker map to
know what object a symbol is found in.
And finally, we process each symbol that is listed in vmlinux.map
(or vmlinux.o.map) based on the following structure:
vmlinux linked from vmlinux.a:
vmlinux.map:
<top level section>
<included section> -- (might be same as top level section)
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
vmlinux linked from vmlinux.o:
vmlinux.map:
<top level section>
<included section> -- (might be same as top level section)
vmlinux.o -- need to use vmlinux.o.map
<symbol> -- ignored
...
vmlinux.o.map:
<section>
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
3. As sections, objects, and symbols are processed, offset ranges are
constructed in a straightforward way:
- If the symbol belongs to one or more built-in modules:
- If we were working on the same module(s), extend the range
to include this object
- If we were working on another module(s), close that range,
and start the new one
- If the symbol does not belong to any built-in modules:
- If we were working on a module(s) range, close that range
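The range construction in step 3 can be illustrated with a small Python
sketch. It is a simplification, not the script's actual logic: it assumes
the objects arrive as (start, end, modules) tuples in ascending offset
order within one section, and it omits the end-offset validation against
incorrect sizes in the linker map that generate_builtin_ranges.awk
performs. The name build_ranges is hypothetical.

```python
def build_ranges(objects):
    """Coalesce consecutive objects of the same module(s) into ranges.

    objects: iterable of (start, end, modules), ascending by offset, where
    modules is a string of module names ("" if not part of any module).
    """
    ranges = []
    cur = None  # open range as [start, end, modules]
    for start, end, mods in objects:
        if cur and mods == cur[2]:
            cur[1] = end  # same module(s): extend the open range
            continue
        if cur:
            ranges.append(tuple(cur))  # different owner: close the range
            cur = None
        if mods:
            cur = [start, end, mods]  # start a new range
    if cur:
        ranges.append(tuple(cur))
    return ranges


objs = [
    (0x00, 0x10, ""),
    (0x10, 0x20, "a"),
    (0x20, 0x30, "a"),
    (0x30, 0x40, "a b"),
    (0x40, 0x50, ""),
    (0x50, 0x60, "b"),
]
for start, end, mods in build_ranges(objs):
    print("%08x-%08x %s" % (start, end, mods))
```

Note that an object shared between modules ("a b") gets its own range,
which matches the shared ranges shown in the sample output earlier.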
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v6:
- Applied Masahiro Yamada's suggestions (Kconfig, makefile, script).
Changes since v5:
- Removed unnecessary compatibility info from option description.
Changes since v4:
- Improved commit description to explain the why and how.
- Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
- Improved comments in generate_builtin_ranges.awk
- Improved logic in generate_builtin_ranges.awk to handle incorrect
object size information in linker maps
Changes since v3:
- Consolidated patches 2 through 5 into a single patch
- Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
- Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
- Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
- Support LLVM (lld) compiles in generate_builtin_ranges.awk
- Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
Changes since v2:
- Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
- 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
- Switched from using modules.builtin.objs to parsing .*.cmd files
- Parse data from .*.cmd in generate_builtin_ranges.awk
- Use $(real-prereqs) rather than $(filter-out ...)
---
Documentation/process/changes.rst | 7 +
lib/Kconfig.debug | 16 +
scripts/Makefile.vmlinux | 18 +
scripts/Makefile.vmlinux_o | 3 +
scripts/generate_builtin_ranges.awk | 506 ++++++++++++++++++++++++++++
5 files changed, 550 insertions(+)
create mode 100755 scripts/generate_builtin_ranges.awk
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 3fc63f27c226..00f1ed7c59c3 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -64,6 +64,7 @@ GNU tar 1.28 tar --version
gtags (optional) 6.6.5 gtags --version
mkimage (optional) 2017.01 mkimage --version
Python (optional) 3.5.x python3 --version
+GNU AWK (optional) 5.1.0 gawk --version
====================== =============== ========================================
.. [#f1] Sphinx is needed only to build the Kernel documentation
@@ -192,6 +193,12 @@ platforms. The tool is available via the ``u-boot-tools`` package or can be
built from the U-Boot source code. See the instructions at
https://docs.u-boot.org/en/latest/build/tools.html#building-tools-for-linux
+GNU AWK
+-------
+
+GNU AWK is needed if you want kernel builds to generate address range data for
+builtin modules (CONFIG_BUILTIN_MODULE_RANGES).
+
System utilities
****************
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..f087dc3da321 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -571,6 +571,22 @@ config VMLINUX_MAP
pieces of code get eliminated with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
+config BUILTIN_MODULE_RANGES
+ bool "Generate address range information for builtin modules"
+ depends on !LTO_CLANG_FULL
+ depends on !LTO_CLANG_THIN
+ depends on VMLINUX_MAP
+ help
+ When modules are built into the kernel, there will be no module name
+ associated with its symbols in /proc/kallsyms. Tracers may want to
+ identify symbols by module name and symbol name regardless of whether
+ the module is configured as loadable or not.
+
+ This option generates modules.builtin.ranges in the build tree with
+ offset ranges (per ELF section) for the module(s) they belong to.
+ It also records an anchor symbol to determine the load address of the
+ section.
+
config DEBUG_FORCE_WEAK_PER_CPU
bool "Force weak per-cpu definitions"
depends on DEBUG_KERNEL
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..7e8b703799c8 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -33,6 +33,24 @@ targets += vmlinux
vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
+# modules.builtin.ranges
+# ---------------------------------------------------------------------------
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+ cmd_modules_builtin_ranges = $(real-prereqs) > $@
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: $(srctree)/scripts/generate_builtin_ranges.awk \
+ modules.builtin vmlinux.map vmlinux.o.map FORCE
+ $(call if_changed,modules_builtin_ranges)
+
+vmlinux.map: vmlinux
+ @:
+
+endif
+
# Add FORCE to the prequisites of a target to force it to be always rebuilt.
# ---------------------------------------------------------------------------
diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
index 6de297916ce6..252505505e0e 100644
--- a/scripts/Makefile.vmlinux_o
+++ b/scripts/Makefile.vmlinux_o
@@ -45,9 +45,12 @@ objtool-args = $(vmlinux-objtool-args-y) --link
# Link of vmlinux.o used for section mismatch analysis
# ---------------------------------------------------------------------------
+vmlinux-o-ld-args-$(CONFIG_BUILTIN_MODULE_RANGES) += -Map=$@.map
+
quiet_cmd_ld_vmlinux.o = LD $@
cmd_ld_vmlinux.o = \
$(LD) ${KBUILD_LDFLAGS} -r -o $@ \
+ $(vmlinux-o-ld-args-y) \
$(addprefix -T , $(initcalls-lds)) \
--whole-archive vmlinux.a --no-whole-archive \
--start-group $(KBUILD_VMLINUX_LIBS) --end-group \
diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
new file mode 100755
index 000000000000..865cb7ac4970
--- /dev/null
+++ b/scripts/generate_builtin_ranges.awk
@@ -0,0 +1,506 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+# generate_builtin_ranges.awk: Generate address range data for builtin modules
+# Written by Kris Van Hees <kris.van.hees@oracle.com>
+#
+# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
+# vmlinux.o.map > modules.builtin.ranges
+#
+
+# Return the module name(s) (if any) associated with the given object.
+#
+# If we have seen this object before, return information from the cache.
+# Otherwise, retrieve it from the corresponding .cmd file.
+#
+function get_module_info(fn, mod, obj, s) {
+ if (fn in omod)
+ return omod[fn];
+
+ if (match(fn, /\/[^/]+$/) == 0)
+ return "";
+
+ obj = fn;
+ mod = "";
+ fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
+ if (getline s <fn == 1) {
+ if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
+ mod = substr(s, RSTART + 16, RLENGTH - 16);
+ gsub(/['"]/, "", mod);
+ }
+ }
+ close(fn);
+
+ # A single module (common case) also reflects objects that are not part
+ # of a module. Some of those objects have names that are also a module
+ # name (e.g. core). We check the associated module file name, and if
+ # they do not match, the object is not part of a module.
+ if (mod !~ / /) {
+ if (!(mod in mods))
+ mod = "";
+ }
+
+ gsub(/([^/ ]*\/)+/, "", mod);
+ gsub(/-/, "_", mod);
+
+ # At this point, mod is a single (valid) module name, or a list of
+ # module names (that do not need validation).
+ omod[obj] = mod;
+ close(fn);
+
+ return mod;
+}
+
+# Update the ranges entry for the given module 'mod' in section 'osect'.
+#
+# We use a modified absolute start address (soff + base) as index because we
+# may need to insert an anchor record later that must be at the start of the
+# section data, and the first module may very well start at the same address.
+# So, we use (addr << 1) + 1 to allow a possible anchor record to be placed at
+# (addr << 1). This is safe because the index is only used to sort the entries
+# before writing them out.
+#
+function update_entry(osect, mod, soff, eoff, sect, idx) {
+ sect = sect_in[osect];
+ idx = (soff + sect_base[osect]) * 2 + 1;
+ entries[idx] = sprintf("%s %08x-%08x %s", sect, soff, eoff, mod);
+ count[sect]++;
+}
+
+# (1) Build a lookup map of built-in module names.
+#
+# The first file argument is used as input (modules.builtin).
+#
+# Lines will be like:
+# kernel/crypto/lzo-rle.ko
+# and we record the object name "crypto/lzo-rle".
+#
+ARGIND == 1 {
+ sub(/kernel\//, ""); # strip off "kernel/" prefix
+ sub(/\.ko$/, ""); # strip off .ko suffix
+
+ mods[$1] = 1;
+ next;
+}
+
+# (2) Collect address information for each section.
+#
+# The second file argument is used as input (vmlinux.map).
+#
+# We collect the base address of the section in order to convert all addresses
+# in the section into offset values.
+#
+# We collect the address of the anchor (or first symbol in the section if there
+# is no explicit anchor) to allow users of the range data to calculate address
+# ranges based on the actual load address of the section in the running kernel.
+#
+# We collect the start address of any sub-section (section included in the top
+# level section being processed). This is needed when the final linking was
+# done using vmlinux.a because then the list of objects contained in each
+# section is to be obtained from vmlinux.o.map. The offset of the sub-section
+# is recorded here, to be used as an addend when processing vmlinux.o.map
+# later.
+#
+
+# Both GNU ld and LLVM lld linker map format are supported by converting LLVM
+# lld linker map records into equivalent GNU ld linker map records.
+#
+# The first record of the vmlinux.map file provides enough information to know
+# which format we are dealing with.
+#
+ARGIND == 2 && FNR == 1 && NF == 7 && $1 == "VMA" && $7 == "Symbol" {
+ map_is_lld = 1;
+ if (dbg)
+ printf "NOTE: %s uses LLVM lld linker map format\n", FILENAME >"/dev/stderr";
+ next;
+}
+
+# (LLD) Convert a section record from lld format to ld format.
+#
+# lld: ffffffff82c00000 2c00000 2493c0 8192 .data
+# ->
+# ld: .data 0xffffffff82c00000 0x2493c0 load address 0x0000000002c00000
+#
+ARGIND == 2 && map_is_lld && NF == 5 && /[0-9] [^ ]+$/ {
+ $0 = $5 " 0x"$1 " 0x"$3 " load address 0x"$2;
+}
+
+# (LLD) Convert an anchor record from lld format to ld format.
+#
+# lld: ffffffff81000000 1000000 0 1 _text = .
+# ->
+# ld: 0xffffffff81000000 _text = .
+#
+ARGIND == 2 && map_is_lld && !anchor && NF == 7 && raw_addr == "0x"$1 && $6 == "=" && $7 == "." {
+ $0 = " 0x"$1 " " $5 " = .";
+}
+
+# (LLD) Convert an object record from lld format to ld format.
+#
+# lld: 11480 11480 1f07 16 vmlinux.a(arch/x86/events/amd/uncore.o):(.text)
+# ->
+# ld: .text 0x0000000000011480 0x1f07 arch/x86/events/amd/uncore.o
+#
+ARGIND == 2 && map_is_lld && NF == 5 && $5 ~ /:\(/ {
+ gsub(/\)/, "");
+ sub(/ vmlinux\.a\(/, " ");
+ sub(/:\(/, " ");
+ $0 = " "$6 " 0x"$1 " 0x"$3 " " $5;
+}
+
+# (LLD) Convert a symbol record from lld format to ld format.
+#
+# We only care about these while processing a section for which no anchor has
+# been determined yet.
+#
+# lld: ffffffff82a859a4 2a859a4 0 1 btf_ksym_iter_id
+# ->
+# ld: 0xffffffff82a859a4 btf_ksym_iter_id
+#
+ARGIND == 2 && map_is_lld && sect && !anchor && NF == 5 && $5 ~ /^[_A-Za-z][_A-Za-z0-9]*$/ {
+ $0 = " 0x"$1 " " $5;
+}
+
+# (LLD) We do not need any other lld linker map records.
+#
+ARGIND == 2 && map_is_lld && /^[0-9a-f]{16} / {
+ next;
+}
+
+# (LD) Section records with just the section name at the start of the line
+# need to have the next line pulled in to determine whether it is a
+# loadable section. If it is, the next line will contain a hex value
+# as first and second items.
+#
+ARGIND == 2 && !map_is_lld && NF == 1 && /^[^ ]/ {
+ s = $0;
+ getline;
+ if ($1 !~ /^0x/ || $2 !~ /^0x/)
+ next;
+
+ $0 = s " " $0;
+}
+
+# (LD) Object records with just the section name denote records with a long
+# section name for which the remainder of the record can be found on the
+# next line.
+#
+# (This is also needed for vmlinux.o.map, when used.)
+#
+ARGIND >= 2 && !map_is_lld && NF == 1 && /^ [^ \*]/ {
+ s = $0;
+ getline;
+ $0 = s " " $0;
+}
+
+# Beginning a new section - done with the previous one (if any).
+#
+ARGIND == 2 && /^[^ ]/ {
+ sect = 0;
+}
+
+# Process a loadable section (we only care about .-sections).
+#
+# Record the section name and its base address.
+# We also record the raw (non-stripped) address of the section because it can
+# be used to identify an anchor record.
+#
+# Note:
+# Since some AWK implementations cannot handle large integers, we strip off the
+# first 4 hex digits from the address. This is safe because the kernel space
+# is not large enough for addresses to extend into those digits. The portion
+# to strip off is stored in addr_prefix as a regexp, so further clauses can
+# perform a simple substitution to do the address stripping.
+#
+ARGIND == 2 && /^\./ {
+ # Explicitly ignore a few sections that are not relevant here.
+ if ($1 ~ /^\.orc_/ || $1 ~ /_sites$/ || $1 ~ /\.percpu/)
+ next;
+
+ # Sections with a 0-address can be ignored as well.
+ if ($2 ~ /^0x0+$/)
+ next;
+
+ raw_addr = $2;
+ addr_prefix = "^" substr($2, 1, 6);
+ base = $2;
+ sub(addr_prefix, "0x", base);
+ base = strtonum(base);
+ sect = $1;
+ anchor = 0;
+ sect_base[sect] = base;
+ sect_size[sect] = strtonum($3);
+
+ if (dbg)
+ printf "[%s] BASE %016x\n", sect, base >"/dev/stderr";
+
+ next;
+}
+
+# If we are not in a section we care about, we ignore the record.
+#
+ARGIND == 2 && !sect {
+ next;
+}
+
+# Record the first anchor symbol for the current section.
+#
+# An anchor record for the section bears the same raw address as the section
+# record.
+#
+ARGIND == 2 && !anchor && NF == 4 && raw_addr == $1 && $3 == "=" && $4 == "." {
+ anchor = sprintf("%s %08x-%08x = %s", sect, 0, 0, $2);
+ sect_anchor[sect] = anchor;
+
+ if (dbg)
+ printf "[%s] ANCHOR %016x = %s (.)\n", sect, 0, $2 >"/dev/stderr";
+
+ next;
+}
+
+# If no anchor record was found for the current section, use the first symbol
+# in the section as anchor.
+#
+ARGIND == 2 && !anchor && NF == 2 && $1 ~ /^0x/ && $2 !~ /^0x/ {
+ addr = $1;
+ sub(addr_prefix, "0x", addr);
+ addr = strtonum(addr) - base;
+ anchor = sprintf("%s %08x-%08x = %s", sect, addr, addr, $2);
+ sect_anchor[sect] = anchor;
+
+ if (dbg)
+ printf "[%s] ANCHOR %016x = %s\n", sect, addr, $2 >"/dev/stderr";
+
+ next;
+}
+
+# The first occurrence of a section name in an object record establishes the
+# addend (often 0) for that section. This information is needed to handle
+# sections that get combined in the final linking of vmlinux (e.g. .head.text
+# getting included at the start of .text).
+#
+# If the section does not have a base yet, use the base of the encapsulating
+# section.
+#
+ARGIND == 2 && sect && NF == 4 && /^ [^ \*]/ && !($1 in sect_addend) {
+ if (!($1 in sect_base)) {
+ sect_base[$1] = base;
+
+ if (dbg)
+ printf "[%s] BASE %016x\n", $1, base >"/dev/stderr";
+ }
+
+ addr = $2;
+ sub(addr_prefix, "0x", addr);
+ addr = strtonum(addr);
+ sect_addend[$1] = addr - sect_base[$1];
+ sect_in[$1] = sect;
+
+ if (dbg)
+ printf "[%s] ADDEND %016x - %016x = %016x\n", $1, addr, base, sect_addend[$1] >"/dev/stderr";
+
+ # If the object is vmlinux.o then we will need vmlinux.o.map to get the
+ # actual offsets of objects.
+ if ($4 == "vmlinux.o")
+ need_o_map = 1;
+}
+
+# (3) Collect offset ranges (relative to the section base address) for built-in
+# modules.
+#
+# If the final link was done using the actual objects, vmlinux.map contains all
+# the information we need (see section (3a)).
+# If linking was done using vmlinux.a as intermediary, we will need to process
+# vmlinux.o.map (see section (3b)).
+
+# (3a) Determine offset range info using vmlinux.map.
+#
+# Since we are already processing vmlinux.map, the top level section that is
+# being processed is already known. If we do not have a base address for it,
+# we do not need to process records for it.
+#
+# Given the object name, we determine the module(s) (if any) that the current
+# object is associated with.
+#
+# If we were already processing objects for a (list of) module(s):
+# - If the current object belongs to the same module(s), update the range data
+# to include the current object.
+# - Otherwise, ensure that the end offset of the range is valid.
+#
+# If the current object does not belong to a built-in module, ignore it.
+#
+# If it does, we add a new built-in module offset range record.
+#
+ARGIND == 2 && !need_o_map && /^ [^ ]/ && NF == 4 && $3 != "0x0" {
+ if (!(sect in sect_base))
+ next;
+
+ # Turn the address into an offset from the section base.
+ soff = $2;
+ sub(addr_prefix, "0x", soff);
+ soff = strtonum(soff) - sect_base[sect];
+ eoff = soff + strtonum($3);
+
+ # Determine which (if any) built-in modules the object belongs to.
+ mod = get_module_info($4);
+
+ # If we are processing a built-in module:
+ # - If the current object is within the same module, we update its
+ # entry by extending the range and move on
+ # - Otherwise:
+ # + If we are still processing within the same main section, we
+ # validate the end offset against the start offset of the
+ # current object (e.g. .rodata.str1.[18] objects are often
+ # listed with an incorrect size in the linker map)
+ # + Otherwise, we validate the end offset against the section
+ # size
+ if (mod_name) {
+ if (mod == mod_name) {
+ mod_eoff = eoff;
+ update_entry(mod_sect, mod_name, mod_soff, eoff);
+
+ next;
+ } else if (sect == sect_in[mod_sect]) {
+ if (mod_eoff > soff)
+ update_entry(mod_sect, mod_name, mod_soff, soff);
+ } else {
+ v = sect_size[sect_in[mod_sect]];
+ if (mod_eoff > v)
+ update_entry(mod_sect, mod_name, mod_soff, v);
+ }
+ }
+
+ mod_name = mod;
+
+ # If we encountered an object that is not part of a built-in module, we
+ # do not need to record any data.
+ if (!mod)
+ next;
+
+ # At this point, we encountered the start of a new built-in module.
+ mod_name = mod;
+ mod_soff = soff;
+ mod_eoff = eoff;
+ mod_sect = $1;
+ update_entry($1, mod, soff, mod_eoff);
+
+ next;
+}
+
+# If we do not need to parse the vmlinux.o.map file, we are done.
+#
+ARGIND == 3 && !need_o_map {
+ if (dbg)
+ printf "Note: %s is not needed.\n", FILENAME >"/dev/stderr";
+ exit;
+}
+
+# (3) Collect offset ranges (relative to the section base address) for built-in
+# modules.
+#
+
+# (LLD) Convert an object record from lld format to ld format.
+#
+ARGIND == 3 && map_is_lld && NF == 5 && $5 ~ /:\(/ {
+ gsub(/\)/, "");
+ sub(/:\(/, " ");
+
+ sect = $6;
+ if (!(sect in sect_addend))
+ next;
+
+ sub(/ vmlinux\.a\(/, " ");
+ $0 = " "sect " 0x"$1 " 0x"$3 " " $5;
+}
+
+# (3b) Determine offset range info using vmlinux.o.map.
+#
+# If we do not know an addend for the object's section, we are not interested
+# in anything within that section.
+#
+# Determine the top-level section that the object's section was included in
+# during the final link. This is the section name offset range data will be
+# associated with for this object.
+#
+# The remainder of the processing of the current object record follows the
+# procedure outlined in (3a).
+#
+ARGIND == 3 && /^ [^ ]/ && NF == 4 && $3 != "0x0" {
+ osect = $1;
+ if (!(osect in sect_addend))
+ next;
+
+ # We need to work with the main section.
+ sect = sect_in[osect];
+
+ # Turn the address into an offset from the section base.
+ soff = $2;
+ sub(addr_prefix, "0x", soff);
+ soff = strtonum(soff) + sect_addend[osect];
+ eoff = soff + strtonum($3);
+
+ # Determine which (if any) built-in modules the object belongs to.
+ mod = get_module_info($4);
+
+ # If we are processing a built-in module:
+ # - If the current object is within the same module, we update its
+ # entry by extending the range and move on
+ # - Otherwise:
+ # + If we are still processing within the same main section, we
+ # validate the end offset against the start offset of the
+ # current object (e.g. .rodata.str1.[18] objects are often
+ # listed with an incorrect size in the linker map)
+ # + Otherwise, we validate the end offset against the section
+ # size
+ if (mod_name) {
+ if (mod == mod_name) {
+ mod_eoff = eoff;
+ update_entry(mod_sect, mod_name, mod_soff, eoff);
+
+ next;
+ } else if (sect == sect_in[mod_sect]) {
+ if (mod_eoff > soff)
+ update_entry(mod_sect, mod_name, mod_soff, soff);
+ } else {
+ v = sect_size[sect_in[mod_sect]];
+ if (mod_eoff > v)
+ update_entry(mod_sect, mod_name, mod_soff, v);
+ }
+ }
+
+ mod_name = mod;
+
+ # If we encountered an object that is not part of a built-in module, we
+ # do not need to record any data.
+ if (!mod)
+ next;
+
+ # At this point, we encountered the start of a new built-in module.
+ mod_name = mod;
+ mod_soff = soff;
+ mod_eoff = eoff;
+ mod_sect = osect;
+ update_entry(osect, mod, soff, mod_eoff);
+
+ next;
+}
+
+# (4) Generate the output.
+#
+# Anchor records are added for each section that contains offset range data
+# records. They are added at an adjusted section base address (base << 1) to
+# ensure they come first in the sorted records (see update_entry() above for
+# more information).
+#
+# All entries are sorted by (adjusted) address to ensure that the output can be
+# parsed in strict ascending address order.
+#
+END {
+ for (sect in count) {
+ if (sect in sect_anchor)
+ entries[sect_base[sect] * 2] = sect_anchor[sect];
+ }
+
+ n = asorti(entries, indices);
+ for (i = 1; i <= n; i++)
+ print entries[indices[i]];
+}
--
2.45.2
* [PATCH v7 3/4] scripts: add verifier script for builtin module range data
2024-08-21 4:06 [PATCH v7 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
2024-08-21 4:06 ` [PATCH v7 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
@ 2024-08-21 4:06 ` Kris Van Hees
2024-08-22 17:35 ` Masahiro Yamada
2024-08-21 4:07 ` [PATCH v7 4/4] module: add install target for modules.builtin.ranges Kris Van Hees
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Kris Van Hees @ 2024-08-21 4:06 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Nick Alcock, Alan Maguire, Masahiro Yamada,
Steven Rostedt, Luis Chamberlain, Masami Hiramatsu,
Nick Desaulniers, Jiri Olsa, Elena Zannoni
The modules.builtin.ranges offset range data for builtin modules is
generated at compile time based on the list of built-in modules and
the vmlinux.map and vmlinux.o.map linker maps. This data can be used
to determine whether a symbol at a particular address belongs to
module code that was configured to be compiled into the kernel proper
as a built-in module (rather than as a standalone module).
This patch adds a script that uses the generated modules.builtin.ranges
data to annotate the symbols in the System.map with module names if
their address falls within a range that belongs to one or more built-in
modules.
It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
verify the annotation:
- For each top-level section:
- For each object in the section:
- Determine whether the object is part of a built-in module
(using modules.builtin and the .*.cmd file used to compile
the object as suggested in [0])
- For each symbol in that object, verify that the built-in
module association (or lack thereof) matches the annotation
given to the symbol.
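The annotate-then-verify flow can be sketched in Python as follows. This is
an illustration only, not the script in this patch: it assumes the symbols
and ranges have already been converted to absolute addresses and sorted in
ascending order, that ranges do not overlap and are half-open, and that the
expected module association per symbol has already been derived from the
linker map. The names annotate and verify are hypothetical.

```python
def annotate(symbols, ranges):
    """Attach module names to symbols whose address falls in a range.

    symbols: list of (addr, name), ascending by addr.
    ranges:  list of (start, end, modules), ascending, non-overlapping.
    """
    out = []
    i = 0
    for addr, name in symbols:
        # Advance past ranges that end at or before this address.
        while i < len(ranges) and addr >= ranges[i][1]:
            i += 1
        if i < len(ranges) and ranges[i][0] <= addr:
            out.append((addr, name, ranges[i][2]))
        else:
            out.append((addr, name, ""))
    return out


def verify(annotated, expected):
    """Return the names whose annotation differs from the expected modules.

    expected: dict of symbol name -> modules; symbols that are missing
    from it are expected to belong to no module.
    """
    return [n for _, n, mods in annotated if expected.get(n, "") != mods]


syms = [(0x100, "a"), (0x200, "b"), (0x300, "c"), (0x400, "d")]
rngs = [(0x200, 0x300, "m1"), (0x400, 0x500, "m2")]
annotated = annotate(syms, rngs)
print(verify(annotated, {"b": "m1", "d": "m2"}))
```

An empty result means every symbol's annotation agrees with the association
derived from the linker map; any name in the list points at a range that is
either missing or too wide.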
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
Changes since v6:
- Applied Masahiro Yamada's suggestions to the AWK script.
Changes since v5:
- Added optional 6th argument to specify kernel build directory.
- Report error and exit if .*.o.cmd files cannot be read.
Changes since v4:
- New patch in the series
---
scripts/verify_builtin_ranges.awk | 356 ++++++++++++++++++++++++++++++
1 file changed, 356 insertions(+)
create mode 100755 scripts/verify_builtin_ranges.awk
diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
new file mode 100755
index 000000000000..93f66e9a8802
--- /dev/null
+++ b/scripts/verify_builtin_ranges.awk
@@ -0,0 +1,356 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+# verify_builtin_ranges.awk: Verify address range data for builtin modules
+# Written by Kris Van Hees <kris.van.hees@oracle.com>
+#
+# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
+# modules.builtin vmlinux.map vmlinux.o.map \
+# [ <build-dir> ]
+#
+
+# Return the module name(s) (if any) associated with the given object.
+#
+# If we have seen this object before, return information from the cache.
+# Otherwise, retrieve it from the corresponding .cmd file.
+#
+function get_module_info(fn, mod, obj, s) {
+ if (fn in omod)
+ return omod[fn];
+
+ if (match(fn, /\/[^/]+$/) == 0)
+ return "";
+
+ obj = fn;
+ mod = "";
+ fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
+ if (getline s <fn == 1) {
+ if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
+ mod = substr(s, RSTART + 16, RLENGTH - 16);
+ gsub(/['"]/, "", mod);
+ }
+ } else {
+ print "ERROR: Failed to read: " fn "\n\n" \
+ " Invalid kernel build directory (" kdir ")\n" \
+ " or its content does not match " ARGV[1] >"/dev/stderr";
+ close(fn);
+ total = 0;
+ exit(1);
+ }
+ close(fn);
+
+ # A single module (common case) also reflects objects that are not part
+ # of a module. Some of those objects have names that are also a module
+ # name (e.g. core). We check the associated module file name, and if
+ # they do not match, the object is not part of a module.
+ if (mod !~ / /) {
+ if (!(mod in mods))
+ mod = "";
+ }
+
+ gsub(/([^/ ]*\/)+/, "", mod);
+ gsub(/-/, "_", mod);
+
+ # At this point, mod is a single (valid) module name, or a list of
+ # module names (that do not need validation).
+ omod[obj] = mod;
+ close(fn);
+
+ return mod;
+}
+
+# Return a representative integer value for a given hexadecimal address.
+#
+# Since all kernel addresses fall within the same memory region, we can safely
+# strip off the first 4 hex digits before performing the hex-to-dec conversion,
+# thereby avoiding integer overflows.
+#
+function addr2val(val) {
+ sub(/^0x/, "", val);
+ if (length(val) == 16)
+ val = substr(val, 5);
+ return strtonum("0x" val);
+}
+
+# Determine the kernel build directory to use (default is .).
+#
+BEGIN {
+ if (ARGC > 6) {
+ kdir = ARGV[ARGC - 1];
+ ARGV[ARGC - 1] = "";
+ } else
+ kdir = ".";
+}
+
+# (1) Load the built-in module address range data.
+#
+ARGIND == 1 {
+ ranges[FNR] = $0;
+ rcnt++;
+ next;
+}
+
+# (2) Annotate System.map symbols with module names.
+#
+ARGIND == 2 {
+ addr = addr2val($1);
+ name = $3;
+
+ while (addr >= mod_eaddr) {
+ if (sect_symb) {
+ if (sect_symb != name)
+ next;
+
+ sect_base = addr - sect_off;
+ if (dbg)
+ printf "[%s] BASE (%s) %016x - %016x = %016x\n", sect_name, sect_symb, addr, sect_off, sect_base >"/dev/stderr";
+ sect_symb = 0;
+ }
+
+ if (++ridx > rcnt)
+ break;
+
+ $0 = ranges[ridx];
+ sub(/-/, " ");
+ if ($4 != "=") {
+ sub(/-/, " ");
+ mod_saddr = strtonum("0x" $2) + sect_base;
+ mod_eaddr = strtonum("0x" $3) + sect_base;
+ $1 = $2 = $3 = "";
+ sub(/^ +/, "");
+ mod_name = $0;
+
+ if (dbg)
+ printf "[%s] %s from %016x to %016x\n", sect_name, mod_name, mod_saddr, mod_eaddr >"/dev/stderr";
+ } else {
+ sect_name = $1;
+ sect_off = strtonum("0x" $2);
+ sect_symb = $5;
+ }
+ }
+
+ idx = addr"-"name;
+ if (addr >= mod_saddr && addr < mod_eaddr)
+ sym2mod[idx] = mod_name;
+
+ next;
+}
+
+# Once we are done annotating the System.map, we no longer need the ranges data.
+#
+FNR == 1 && ARGIND == 3 {
+ delete ranges;
+}
+
+# (3) Build a lookup map of built-in module names.
+#
+# Lines from modules.builtin will be like:
+# kernel/crypto/lzo-rle.ko
+# and we record the object name "crypto/lzo-rle".
+#
+ARGIND == 3 {
+ sub(/kernel\//, ""); # strip off "kernel/" prefix
+ sub(/\.ko$/, ""); # strip off .ko suffix
+
+ mods[$1] = 1;
+ next;
+}
+
+# (4) Get a list of symbols (per object).
+#
+# Symbols by object are read from vmlinux.map, with fallback to vmlinux.o.map
+# if vmlinux is found to have linked in vmlinux.o.
+#
+
+# If we were able to get the data we need from vmlinux.map, there is no need to
+# process vmlinux.o.map.
+#
+FNR == 1 && ARGIND == 5 && total > 0 {
+ if (dbg)
+ printf "Note: %s is not needed.\n", FILENAME >"/dev/stderr";
+ exit;
+}
+
+# First determine whether we are dealing with a GNU ld or LLVM lld linker map.
+#
+ARGIND >= 4 && FNR == 1 && NF == 7 && $1 == "VMA" && $7 == "Symbol" {
+ map_is_lld = 1;
+ next;
+}
+
+# (LLD) Convert a section record from lld format to ld format.
+#
+ARGIND >= 4 && map_is_lld && NF == 5 && /[0-9] [^ ]/ {
+ $0 = $5 " 0x"$1 " 0x"$3 " load address 0x"$2;
+}
+
+# (LLD) Convert an object record from lld format to ld format.
+#
+ARGIND >= 4 && map_is_lld && NF == 5 && $5 ~ /:\(\./ {
+ gsub(/\)/, "");
+ sub(/:\(/, " ");
+ sub(/ vmlinux\.a\(/, " ");
+ $0 = " "$6 " 0x"$1 " 0x"$3 " " $5;
+}
+
+# (LLD) Convert a symbol record from lld format to ld format.
+#
+ARGIND >= 4 && map_is_lld && NF == 5 && $5 ~ /^[A-Za-z_][A-Za-z0-9_]*$/ {
+ $0 = " 0x" $1 " " $5;
+}
+
+# (LLD) We do not need any other lld linker map records.
+#
+ARGIND >= 4 && map_is_lld && /^[0-9a-f]{16} / {
+ next;
+}
+
+# Handle section records with long section names (spilling onto a 2nd line).
+#
+ARGIND >= 4 && !map_is_lld && NF == 1 && /^[^ ]/ {
+ s = $0;
+ getline;
+ $0 = s " " $0;
+}
+
+# Next section - previous one is done.
+#
+ARGIND >= 4 && /^[^ ]/ {
+ sect = 0;
+}
+
+# Get the (top level) section name.
+#
+ARGIND >= 4 && /^[^ ]/ && $2 ~ /^0x/ && $3 ~ /^0x/ {
+ # Empty section or per-CPU section - ignore.
+ if (NF < 3 || $1 ~ /\.percpu/) {
+ sect = 0;
+ next;
+ }
+
+ sect = $1;
+
+ next;
+}
+
+# If we are not currently in a section we care about, ignore records.
+#
+!sect {
+ next;
+}
+
+# Handle object records with long section names (spilling onto a 2nd line).
+#
+ARGIND >= 4 && /^ [^ \*]/ && NF == 1 {
+ # If the section name is long, the remainder of the entry is found on
+ # the next line.
+ s = $0;
+ getline;
+ $0 = s " " $0;
+}
+
+# If the object is vmlinux.o, we need to consult vmlinux.o.map for per-object
+# symbol information
+#
+ARGIND == 4 && /^ [^ ]/ && NF == 4 {
+ idx = sect":"$1;
+ if (!(idx in sect_addend)) {
+ sect_addend[idx] = addr2val($2);
+ if (dbg)
+ printf "ADDEND %s = %016x\n", idx, sect_addend[idx] >"/dev/stderr";
+ }
+ if ($4 == "vmlinux.o") {
+ need_o_map = 1;
+ next;
+ }
+}
+
+# If data from vmlinux.o.map is needed, we only process section and object
+# records from vmlinux.map to determine which section we need to pay attention
+# to in vmlinux.o.map. So skip everything else from vmlinux.map.
+#
+ARGIND == 4 && need_o_map {
+ next;
+}
+
+# Get module information for the current object.
+#
+ARGIND >= 4 && /^ [^ ]/ && NF == 4 {
+ msect = $1;
+ mod_name = get_module_info($4);
+ mod_eaddr = addr2val($2) + addr2val($3);
+
+ next;
+}
+
+# Process a symbol record.
+#
+# Evaluate the module information obtained from vmlinux.map (or vmlinux.o.map)
+# as follows:
+# - For all symbols in a given object:
+# - If the symbol is annotated with the same module name(s) that the object
+# belongs to, count it as a match.
+# - Otherwise:
+# - If the symbol is known to have duplicates of which at least one is
+# in a built-in module, disregard it.
+# - If the symbol is not annotated with any module name(s) AND the
+# object belongs to built-in modules, count it as missing.
+# - Otherwise, count it as a mismatch.
+#
+ARGIND >= 4 && /^ / && NF == 2 && $1 ~ /^0x/ {
+ idx = sect":"msect;
+ if (!(idx in sect_addend))
+ next;
+
+ addr = addr2val($1);
+
+ # Handle the rare but annoying case where a 0-size symbol is placed at
+ # the byte *after* the module range. Based on vmlinux.map it will be
+ # considered part of the current object, but it falls just beyond the
+ # module address range. Unfortunately, its address could be at the
+ # start of another built-in module, so the only safe thing to do is to
+ # ignore it.
+ if (mod_name && addr == mod_eaddr)
+ next;
+
+ # If we are processing vmlinux.o.map, we need to apply the base address
+ # of the section to the relative address on the record.
+ #
+ if (ARGIND == 5)
+ addr += sect_addend[idx];
+
+ idx = addr"-"$2;
+ mod = "";
+ if (idx in sym2mod) {
+ mod = sym2mod[idx];
+ if (sym2mod[idx] == mod_name) {
+ mod_matches++;
+ matches++;
+ } else if (mod_name == "") {
+ print $2 " in " sym2mod[idx] " (should NOT be)";
+ mismatches++;
+ } else {
+ print $2 " in " sym2mod[idx] " (should be " mod_name ")";
+ mismatches++;
+ }
+ } else if (mod_name != "") {
+ print $2 " should be in " mod_name;
+ missing++;
+ } else
+ matches++;
+
+ total++;
+
+ next;
+}
+
+# Issue the comparison report.
+#
+END {
+ if (total) {
+ printf "Verification of %s:\n", ARGV[1];
+ printf " Correct matches: %6d (%d%% of total)\n", matches, 100 * matches / total;
+ printf " Module matches: %6d (%d%% of matches)\n", mod_matches, matches ? 100 * mod_matches / matches : 0;
+ printf " Mismatches: %6d (%d%% of total)\n", mismatches, 100 * mismatches / total;
+ printf " Missing: %6d (%d%% of total)\n", missing, 100 * missing / total;
+ }
+}
--
2.45.2
* [PATCH v7 4/4] module: add install target for modules.builtin.ranges
2024-08-21 4:06 [PATCH v7 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
2024-08-21 4:06 ` [PATCH v7 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
2024-08-21 4:06 ` [PATCH v7 3/4] scripts: add verifier script for builtin module range data Kris Van Hees
@ 2024-08-21 4:07 ` Kris Van Hees
2024-08-21 14:40 ` [PATCH v7 0/4] Generate address range data for built-in modules Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 " Kris Van Hees
4 siblings, 0 replies; 17+ messages in thread
From: Kris Van Hees @ 2024-08-21 4:07 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Nick Alcock, Masahiro Yamada, Steven Rostedt,
Luis Chamberlain, Masami Hiramatsu, Nick Desaulniers, Jiri Olsa,
Elena Zannoni
When CONFIG_BUILTIN_MODULE_RANGES is enabled, the modules.builtin.ranges
file should be installed in the module install location.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
---
Changes since v3:
- Only install modules.builtin.ranges if CONFIG_BUILTIN_MODULE_RANGES=y
---
scripts/Makefile.modinst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.modinst b/scripts/Makefile.modinst
index 0afd75472679..c38bf63a33be 100644
--- a/scripts/Makefile.modinst
+++ b/scripts/Makefile.modinst
@@ -30,10 +30,12 @@ $(MODLIB)/modules.order: modules.order FORCE
quiet_cmd_install_modorder = INSTALL $@
cmd_install_modorder = sed 's:^\(.*\)\.o$$:kernel/\1.ko:' $< > $@
-# Install modules.builtin(.modinfo) even when CONFIG_MODULES is disabled.
+# Install modules.builtin(.modinfo,.ranges) even when CONFIG_MODULES is disabled.
install-y += $(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo)
-$(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo): $(MODLIB)/%: % FORCE
+install-$(CONFIG_BUILTIN_MODULE_RANGES) += $(MODLIB)/modules.builtin.ranges
+
+$(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo modules.builtin.ranges): $(MODLIB)/%: % FORCE
$(call cmd,install)
endif
--
2.45.2
* [PATCH v7 0/4] Generate address range data for built-in modules
2024-08-21 4:06 [PATCH v7 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
` (2 preceding siblings ...)
2024-08-21 4:07 ` [PATCH v7 4/4] module: add install target for modules.builtin.ranges Kris Van Hees
@ 2024-08-21 14:40 ` Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 " Kris Van Hees
4 siblings, 0 replies; 17+ messages in thread
From: Kris Van Hees @ 2024-08-21 14:40 UTC (permalink / raw)
To: Kris Van Hees, linux-kernel, linux-kbuild, linux-modules,
linux-trace-kernel
Cc: Masahiro Yamada, Steven Rostedt, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
At build time, create the file modules.builtin.ranges that will hold
address range data of the built-in modules that can be used by tracers.
Especially for tracing applications, it is convenient to be able to
refer to a symbol using a <module name, symbol name> pair and to be able
to translate an address into a <module name, symbol name> pair. But
that does not work if the module is built into the kernel because the
object files that comprise the built-in module implementation are simply
linked into the kernel image along with all other kernel object files.
This is especially visible when providing tracing scripts for support
purposes, where the developer of the script targets a particular kernel
version, but does not have control over whether the target system has
a particular module as a loadable module or as a built-in module. When
tracing symbols within a module, referring to them by <module name,
symbol name> pairs is both convenient and aids symbol lookup. But that
naming will not work when the module name information is lost because
the module is built into the kernel on the target system.
Earlier work addressing this loss of information for built-in modules
involved adding module name information to the kallsyms data, but that
required more invasive code in the kernel proper. This work never did
get merged into the kernel tree.
All that is really needed is knowing whether a given address belongs to
a particular module (or multiple modules if they share an object file).
Or in other words, whether that address falls within an address range
that is associated with one or more modules.
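As an illustration only (this is not part of the series), the lookup a
tracer would need can be sketched in Python as follows. The tuple layout
and the function name modules_at() are assumptions made for the example;
the two sample ranges are taken from the example output quoted in
patch 2/4, and the end of a range is treated as exclusive, matching the
comparison used by the verifier script.

```python
from bisect import bisect_right

# Hypothetical in-memory form of the range data: (start, end, module names),
# sorted by start offset, non-overlapping, with the end offset exclusive.
RANGES = [
    (0x0000baf0, 0x0000cb10, "amd_uncore"),
    (0x0009bd10, 0x0009c8e0, "iosf_mbi"),
]
STARTS = [r[0] for r in RANGES]


def modules_at(offset):
    """Return the module name(s) whose range covers offset, or "" if none."""
    # Find the last range that starts at or before the offset.
    i = bisect_right(STARTS, offset) - 1
    if i >= 0 and RANGES[i][0] <= offset < RANGES[i][1]:
        return RANGES[i][2]
    return ""
```

Because the ranges are listed in ascending order, a binary search is
enough; an offset that falls between two ranges simply belongs to no
built-in module.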
Objects can be identified as belonging to a particular module (or
modules) based on defines that are passed as flags to their respective
compilation commands. The data found in modules.builtin is used to
determine what modules are built into the kernel proper. Then,
vmlinux.o.map and vmlinux.map can be parsed in a single pass to generate
a modules.builtin.ranges file with offset range information (relative to
the base address of the associated section) for built-in modules. This
file gets installed along with the other modules.builtin.* files.
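A rough sketch of how a consumer might read the file, assuming the
record layout shown in the example output quoted in patch 2/4: an
anchor record has the form "<section> <start>-<end> = <symbol>", and a
range record has the form "<section> <start>-<end> <module name(s)>".
The function names and the _text address used in the usage note are
made up for the example and are not part of the series.

```python
# Sample records copied from the example output quoted in patch 2/4.
SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
"""


def parse_ranges(text):
    """Split the file content into anchor records and offset ranges."""
    anchors = {}  # section -> (anchor offset, anchor symbol)
    ranges = []   # (section, start offset, end offset, module names)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        sect = fields[0]
        start, end = (int(x, 16) for x in fields[1].split("-"))
        if fields[2] == "=":
            # Anchor record: remember the symbol and its offset.
            if len(fields) >= 4:
                anchors[sect] = (start, fields[3])
        else:
            # Range record: everything after the range is module names.
            ranges.append((sect, start, end, " ".join(fields[2:])))
    return anchors, ranges


def section_base(anchors, sect, symbol_addrs):
    """Base address of a section: anchor symbol address minus its offset."""
    offset, symbol = anchors[sect]
    return symbol_addrs[symbol] - offset
```

Given a (hypothetical) runtime address of 0xffffffff81000000 for _text,
section_base() returns that same value for .text, and the amd_uncore
range then starts at 0xffffffff8100baf0.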
The impact on the kernel build is minimal because everything is done
using a single-pass AWK script. The generated data is small as well:
depending on the exact kernel configuration, it is usually in the range
of 500-700 lines, with a file size of 20-40KB (if all modules are built
in, the file contains about 8000 lines, with a file size of about 285KB).
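For readers who want a feel for the single-pass range construction
described in patch 2/4, here is a simplified Python sketch. It is not a
translation of generate_builtin_ranges.awk: it assumes the input has
already been reduced to (offset, size, modules) records in ascending
offset order, where modules is an empty string for code that is not
part of any built-in module. The record layout and the function name
are assumptions made for the example.

```python
def build_ranges(records):
    """Merge consecutive records of the same module(s) into offset ranges.

    records: iterable of (offset, size, modules), sorted by offset, where
    modules is "" for code outside of any built-in module.
    Returns a list of (start, end, modules) with the end offset exclusive.
    """
    ranges = []
    cur_mod, cur_start, cur_end = "", 0, 0
    for offset, size, mod in records:
        if mod != cur_mod:
            # Different module(s), or leaving module code: close the range.
            if cur_mod:
                ranges.append((cur_start, cur_end, cur_mod))
            cur_mod, cur_start = mod, offset
        # Same module(s) as before: extend the current range.
        cur_end = offset + size
    if cur_mod:
        ranges.append((cur_start, cur_end, cur_mod))
    return ranges
```

The real script also has to cope with details that this sketch ignores,
such as incorrect object size information in linker maps.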
Changes since v6:
- Applied Masahiro Yamada's patches for kconfig, makefile, and scripts.
Changes since v5:
- More improved commit descriptions to explain the why and how.
- Removed unnecessary compatibility info from option description.
- Added optional 6th arg to verifier to specify kernel build directory.
- Report error and exit from verifier if .*.o.cmd files cannot be read.
Changes since v4:
- Improved commit descriptions to explain the why and how.
- Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
- Improved comments in generate_builtin_ranges.awk
- Improved logic in generate_builtin_ranges.awk to handle incorrect
object size information in linker maps
- Added verify_builtin_ranges.awk
Changes since v3:
- Consolidated patches 2 through 5 into a single patch
- Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
- Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
- Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
- Support LLVM (lld) compiles in generate_builtin_ranges.awk
- Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
- Only install modules.builtin.ranges if CONFIG_BUILTIN_MODULE_RANGES=y
Changes since v2:
- Switched from using modules.builtin.objs to parsing .*.cmd files
- Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
- 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
- Parse data from .*.cmd in generate_builtin_ranges.awk
- Use $(real-prereqs) rather than $(filter-out ...)
- Include modules.builtin.ranges in modules install target
Changes since v1:
- Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
- Moved the config option to the tracers section
- 2nd arg to generate_builtin_ranges.awk should be vmlinux.map
Kris Van Hees (5):
trace: add CONFIG_BUILTIN_MODULE_RANGES option
kbuild: generate a linker map for vmlinux.o
module: script to generate offset ranges for builtin modules
kbuild: generate modules.builtin.ranges when linking the kernel
module: add install target for modules.builtin.ranges
Luis Chamberlain (1):
kbuild: add modules.builtin.objs
.gitignore | 2 +-
Documentation/dontdiff | 2 +-
Documentation/kbuild/kbuild.rst | 5 ++
Makefile | 8 +-
include/linux/module.h | 4 +-
kernel/trace/Kconfig | 17 ++++
scripts/Makefile.lib | 5 +-
scripts/Makefile.modinst | 11 ++-
scripts/Makefile.vmlinux | 17 ++++
scripts/Makefile.vmlinux_o | 18 ++++-
scripts/generate_builtin_ranges.awk | 149 ++++++++++++++++++++++++++++++++++++
11 files changed, 228 insertions(+), 10 deletions(-)
create mode 100755 scripts/generate_builtin_ranges.awk
base-commit: dd5a440a31fae6e459c0d6271dddd62825505361
--
2.42.0
* Re: [PATCH v7 2/4] kbuild: generate offset range data for builtin modules
2024-08-21 4:06 ` [PATCH v7 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
@ 2024-08-22 17:34 ` Masahiro Yamada
0 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2024-08-22 17:34 UTC (permalink / raw)
To: Kris Van Hees
Cc: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel,
Nick Alcock, Alan Maguire, Steven Rostedt, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
On Wed, Aug 21, 2024 at 1:11 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> Create file modules.builtin.ranges that can be used to find where
> built-in modules are located by their addresses. This will be useful for
> tracing tools to find what functions are for various built-in modules.
>
> The offset range data for builtin modules is generated using:
> - modules.builtin: associates object files with module names
> - vmlinux.map: provides load order of sections and offset of first member
> per section
> - vmlinux.o.map: provides offset of object file content per section
> - .*.cmd: build cmd file with KBUILD_MODFILE
>
> The generated data will look like:
>
> .text 00000000-00000000 = _text
> .text 0000baf0-0000cb10 amd_uncore
> .text 0009bd10-0009c8e0 iosf_mbi
> ...
> .text 00b9f080-00ba011a intel_skl_int3472_discrete
> .text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
> .text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
> ...
> .data 00000000-00000000 = _sdata
> .data 0000f020-0000f680 amd_uncore
>
> For each ELF section, it lists the offset of the first symbol. This can
> be used to determine the base address of the section at runtime.
>
> Next, it lists (in strict ascending order) offset ranges in that section
> that cover the symbols of one or more builtin modules. Multiple ranges
> can apply to a single module, and ranges can be shared between modules.
>
> The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
> is generated for kernel modules that are built into the kernel image.
>
> How it works:
>
> 1. The modules.builtin file is parsed to obtain a list of built-in
> module names and their associated object names (the .ko file that
> the module would be in if it were a loadable module, hereafter
> referred to as <kmodfile>). This object name can be used to
> identify objects in the kernel compile because any C or assembler
> code that ends up in a built-in module will have the option
> -DKBUILD_MODFILE=<kmodfile> present in its build command, and those
> can be found in the .<obj>.cmd file in the kernel build tree.
>
> If an object is part of multiple modules, they will all be listed
> in the KBUILD_MODFILE option argument.
>
> This allows us to conclusively determine whether an object in the
> kernel build belongs to any modules, and which.
>
> 2. The vmlinux.map is parsed next to determine the base address of each
> top level section so that all addresses into the section can be
> turned into offsets. This makes it possible to handle sections
> getting loaded at different addresses at system boot.
>
> We also determine an 'anchor' symbol at the beginning of each
> section to make it possible to calculate the true base address of
> a section at runtime (i.e. symbol address - symbol offset).
>
> We collect start addresses of sections that are included in the top
> level section. This is used when vmlinux is linked using vmlinux.o,
> because in that case, we need to look at the vmlinux.o linker map to
> know what object a symbol is found in.
>
> And finally, we process each symbol that is listed in vmlinux.map
> (or vmlinux.o.map) based on the following structure:
>
> vmlinux linked from vmlinux.a:
>
> vmlinux.map:
> <top level section>
> <included section> -- might be same as top level section
> <object> -- built-in association known
> <symbol> -- belongs to module(s) object belongs to
> ...
>
> vmlinux linked from vmlinux.o:
>
> vmlinux.map:
> <top level section>
> <included section> -- might be same as top level section
> vmlinux.o -- need to use vmlinux.o.map
> <symbol> -- ignored
> ...
>
> vmlinux.o.map:
> <section>
> <object> -- built-in association known
> <symbol> -- belongs to module(s) object belongs to
> ...
>
> 3. As sections, objects, and symbols are processed, offset ranges are
> constructed in a straightforward way:
>
> - If the symbol belongs to one or more built-in modules:
> - If we were working on the same module(s), extend the range
> to include this object
> - If we were working on another module(s), close that range,
> and start the new one
> - If the symbol does not belong to any built-in modules:
> - If we were working on a module(s) range, close that range
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Changes since v6:
> - Applied Masahiro Yamada's suggestions (Kconfig, makefile, script).
>
> Changes since v5:
> - Removed unnecessary compatibility info from option description.
>
> Changes since v4:
> - Improved commit description to explain the why and how.
> - Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
> - Improved comments in generate_builtin_ranges.awk
> - Improved logic in generate_builtin_ranges.awk to handle incorrect
> object size information in linker maps
>
> Changes since v3:
> - Consolidated patches 2 through 5 into a single patch
> - Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
> - Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
> - Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
> - Support LLVM (lld) compiles in generate_builtin_ranges.awk
> - Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
>
> Changes since v2:
> - Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
> - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
> - Switched from using modules.builtin.objs to parsing .*.cmd files
> - Parse data from .*.cmd in generate_builtin_ranges.awk
> - Use $(real-prereqs) rather than $(filter-out ...)
> ---
> Documentation/process/changes.rst | 7 +
> lib/Kconfig.debug | 16 +
> scripts/Makefile.vmlinux | 18 +
> scripts/Makefile.vmlinux_o | 3 +
> scripts/generate_builtin_ranges.awk | 506 ++++++++++++++++++++++++++++
> 5 files changed, 550 insertions(+)
> create mode 100755 scripts/generate_builtin_ranges.awk
>
> diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
> index 3fc63f27c226..00f1ed7c59c3 100644
> --- a/Documentation/process/changes.rst
> +++ b/Documentation/process/changes.rst
> @@ -64,6 +64,7 @@ GNU tar 1.28 tar --version
> gtags (optional) 6.6.5 gtags --version
> mkimage (optional) 2017.01 mkimage --version
> Python (optional) 3.5.x python3 --version
> +GNU AWK (optional) 5.1.0 gawk --version
> ====================== =============== ========================================
>
> .. [#f1] Sphinx is needed only to build the Kernel documentation
> @@ -192,6 +193,12 @@ platforms. The tool is available via the ``u-boot-tools`` package or can be
> built from the U-Boot source code. See the instructions at
> https://docs.u-boot.org/en/latest/build/tools.html#building-tools-for-linux
>
> +GNU AWK
> +-------
> +
> +GNU AWK is needed if you want kernel builds to generate address range data for
> +builtin modules (CONFIG_BUILTIN_MODULE_RANGES).
> +
> System utilities
> ****************
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index a30c03a66172..f087dc3da321 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -571,6 +571,22 @@ config VMLINUX_MAP
> pieces of code get eliminated with
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
>
> +config BUILTIN_MODULE_RANGES
> + bool "Generate address range information for builtin modules"
> + depends on !LTO_CLANG_FULL
> + depends on !LTO_CLANG_THIN
Forgot to mention this.
These two lines can be replaced with
depends on !LTO
> diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> new file mode 100755
> index 000000000000..865cb7ac4970
> --- /dev/null
> +++ b/scripts/generate_builtin_ranges.awk
> @@ -0,0 +1,506 @@
> +#!/usr/bin/gawk -f
> +# SPDX-License-Identifier: GPL-2.0
> +# generate_builtin_ranges.awk: Generate address range data for builtin modules
> +# Written by Kris Van Hees <kris.van.hees@oracle.com>
> +#
> +# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
> +# vmlinux.o.map > modules.builtin.ranges
> +#
> +
> +# Return the module name(s) (if any) associated with the given object.
> +#
> +# If we have seen this object before, return information from the cache.
> +# Otherwise, retrieve it from the corresponding .cmd file.
> +#
> +function get_module_info(fn, mod, obj, s) {
> + if (fn in omod)
> + return omod[fn];
> +
> + if (match(fn, /\/[^/]+$/) == 0)
> + return "";
> +
> + obj = fn;
> + mod = "";
> + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> + if (getline s <fn == 1) {
> + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> + mod = substr(s, RSTART + 16, RLENGTH - 16);
> + gsub(/['"]/, "", mod);
> + }
> + }
> + close(fn);
> +
> + # A single module (common case) also reflects objects that are not part
> + # of a module. Some of those objects have names that are also a module
> + # name (e.g. core). We check the associated module file name, and if
> + # they do not match, the object is not part of a module.
> + if (mod !~ / /) {
> + if (!(mod in mods))
> + mod = "";
> + }
> +
> + gsub(/([^/ ]*\/)+/, "", mod);
> + gsub(/-/, "_", mod);
> +
> + # At this point, mod is a single (valid) module name, or a list of
> + # module names (that do not need validation).
> + omod[obj] = mod;
> + close(fn);
I still see the second close(fn).
--
Best Regards
Masahiro Yamada
* Re: [PATCH v7 3/4] scripts: add verifier script for builtin module range data
2024-08-21 4:06 ` [PATCH v7 3/4] scripts: add verifier script for builtin module range data Kris Van Hees
@ 2024-08-22 17:35 ` Masahiro Yamada
0 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2024-08-22 17:35 UTC (permalink / raw)
To: Kris Van Hees
Cc: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel,
Nick Alcock, Alan Maguire, Steven Rostedt, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
On Wed, Aug 21, 2024 at 1:11 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> The modules.builtin.ranges offset range data for builtin modules is
> generated at compile time based on the list of built-in modules and
> the vmlinux.map and vmlinux.o.map linker maps. This data can be used
> to determine whether a symbol at a particular address belongs to
> module code that was configured to be compiled into the kernel proper
> as a built-in module (rather than as a standalone module).
>
> This patch adds a script that uses the generated modules.builtin.ranges
> data to annotate the symbols in the System.map with module names if
> their address falls within a range that belongs to one or more built-in
> modules.
>
> It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
> verify the annotation:
>
> - For each top-level section:
> - For each object in the section:
> - Determine whether the object is part of a built-in module
> (using modules.builtin and the .*.cmd file used to compile
> the object as suggested in [0])
> - For each symbol in that object, verify that the built-in
> module association (or lack thereof) matches the annotation
> given to the symbol.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> Changes since v6:
> - Applied Masahiro Yamada's suggestions to the AWK script.
>
> Changes since v5:
> - Added optional 6th argument to specify kernel build directory.
> - Report error and exit if .*.o.cmd files cannot be read.
>
> Changes since v4:
> - New patch in the series
> ---
> scripts/verify_builtin_ranges.awk | 356 ++++++++++++++++++++++++++++++
> 1 file changed, 356 insertions(+)
> create mode 100755 scripts/verify_builtin_ranges.awk
>
> diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
> new file mode 100755
> index 000000000000..93f66e9a8802
> --- /dev/null
> +++ b/scripts/verify_builtin_ranges.awk
> @@ -0,0 +1,356 @@
> +#!/usr/bin/gawk -f
> +# SPDX-License-Identifier: GPL-2.0
> +# verify_builtin_ranges.awk: Verify address range data for builtin modules
> +# Written by Kris Van Hees <kris.van.hees@oracle.com>
> +#
> +# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
> +# modules.builtin vmlinux.map vmlinux.o.map \
> +# [ <build-dir> ]
> +#
> +
> +# Return the module name(s) (if any) associated with the given object.
> +#
> +# If we have seen this object before, return information from the cache.
> +# Otherwise, retrieve it from the corresponding .cmd file.
> +#
> +function get_module_info(fn, mod, obj, s) {
> + if (fn in omod)
> + return omod[fn];
> +
> + if (match(fn, /\/[^/]+$/) == 0)
> + return "";
> +
> + obj = fn;
> + mod = "";
> + fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> + if (getline s <fn == 1) {
> + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> + mod = substr(s, RSTART + 16, RLENGTH - 16);
> + gsub(/['"]/, "", mod);
> + }
> + } else {
> + print "ERROR: Failed to read: " fn "\n\n" \
> + " Invalid kernel build directory (" kdir ")\n" \
> + " or its content does not match " ARGV[1] >"/dev/stderr";
> + close(fn);
> + total = 0;
> + exit(1);
> + }
> + close(fn);
> +
> + # A single module (common case) also reflects objects that are not part
> + # of a module. Some of those objects have names that are also a module
> + # name (e.g. core). We check the associated module file name, and if
> + # they do not match, the object is not part of a module.
> + if (mod !~ / /) {
> + if (!(mod in mods))
> + mod = "";
> + }
> +
> + gsub(/([^/ ]*\/)+/, "", mod);
> + gsub(/-/, "_", mod);
> +
> + # At this point, mod is a single (valid) module name, or a list of
> + # module names (that do not need validation).
> + omod[obj] = mod;
> + close(fn);
Same as 2/4.
--
Best Regards
Masahiro Yamada
* [PATCH v8 0/4] Generate address range data for built-in modules
2024-08-21 4:06 [PATCH v7 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
` (3 preceding siblings ...)
2024-08-21 14:40 ` [PATCH v7 0/4] Generate address range data for built-in modules Kris Van Hees
@ 2024-08-22 18:19 ` Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
` (3 more replies)
4 siblings, 4 replies; 17+ messages in thread
From: Kris Van Hees @ 2024-08-22 18:19 UTC (permalink / raw)
To: Kris Van Hees, linux-kernel, linux-kbuild, linux-modules,
linux-trace-kernel
Cc: Masahiro Yamada, Steven Rostedt, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
At build time, create the file modules.builtin.ranges that will hold
address range data of the built-in modules that can be used by tracers.
Especially for tracing applications, it is convenient to be able to
refer to a symbol using a <module name, symbol name> pair and to be able
to translate an address into a <module name, symbol name> pair. But
that does not work if the module is built into the kernel because the
object files that comprise the built-in module implementation are simply
linked into the kernel image along with all other kernel object files.
This is especially visible when providing tracing scripts for support
purposes, where the developer of the script targets a particular kernel
version, but does not have control over whether the target system has
a particular module as loadable module or built-in module. When tracing
symbols within a module, referring to them by <module name, symbol name>
pairs is both convenient and aids symbol lookup. But that naming will
not work if the module name information is lost because the module is
built into the kernel on the target system.
Earlier work addressing this loss of information for built-in modules
involved adding module name information to the kallsyms data, but that
required more invasive code in the kernel proper. This work never did
get merged into the kernel tree.
All that is really needed is knowing whether a given address belongs to
a particular module (or multiple modules if they share an object file).
Or in other words, whether that address falls within an address range
that is associated with one or more modules.
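As an illustration only (not part of this series), the lookup a tracer would
perform could be sketched as follows. The range values are taken from the
sample output in patch 2/4; the names RANGES, STARTS and modules_at() are
hypothetical, and treating the end offset as exclusive is an assumption based
on the generator computing it as start + size.

```python
from bisect import bisect_right

# Illustrative offset ranges for one section: (start, end, module names).
# Sorted by start offset, non-overlapping; end is treated as exclusive.
RANGES = [
    (0x0000BAF0, 0x0000CB10, ("amd_uncore",)),
    (0x0009BD10, 0x0009C8E0, ("iosf_mbi",)),
    (0x00BA0120, 0x00BA03C0, ("intel_skl_int3472_discrete",
                              "intel_skl_int3472_tps68470")),
]
STARTS = [start for start, _end, _mods in RANGES]


def modules_at(offset):
    """Return the modules whose range covers offset, or an empty tuple."""
    # Find the last range that starts at or before the offset.
    i = bisect_right(STARTS, offset) - 1
    if i < 0:
        return ()
    start, end, mods = RANGES[i]
    return mods if start <= offset < end else ()
```

A range shared by several modules simply yields more than one name.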
Objects can be identified as belonging to a particular module (or
modules) based on defines that are passed as flags to their respective
compilation commands. The data found in modules.builtin is used to
determine what modules are built into the kernel proper. Then,
vmlinux.o.map and vmlinux.map can be parsed in a single pass to generate
a modules.builtin.ranges file with offset range information (relative to
the base address of the associated section) for built-in modules. This
file gets installed along with the other modules.builtin.* files.
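A rough sketch of how a modules.builtin line relates to a module name, mirroring
what the generator script in patch 2/4 does (strip "kernel/" and ".ko", then
take the basename and map '-' to '_'). The function names are hypothetical, and
this version strips "kernel/" only as a prefix, which is a simplification.

```python
import posixpath


def builtin_object_name(line):
    """'kernel/crypto/lzo-rle.ko' -> 'crypto/lzo-rle'."""
    name = line.strip()
    if name.startswith("kernel/"):
        name = name[len("kernel/"):]
    if name.endswith(".ko"):
        name = name[:-len(".ko")]
    return name


def module_name(object_name):
    """'crypto/lzo-rle' -> 'lzo_rle': drop directories, map '-' to '_'."""
    return posixpath.basename(object_name).replace("-", "_")
```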
The impact on the kernel build is minimal because everything is done
using a single-pass AWK script. The generated data size is minimal as
well, (depending on the exact kernel configuration) usually in the range
of 500-700 lines, with a file size of 20-40KB (if all modules are built
in, the file contains about 8000 lines, with a file size of about 285KB).
Changes since v7:
- Remove extra close(fn) in scripts.
- Make CONFIG_BUILTIN_MODULE_RANGES depend on !LTO.
Changes since v6:
- Applied Masahiro Yamada's patches for kconfig, makefile, and scripts.
Changes since v5:
- More improved commit descriptions to explain the why and how.
- Removed unnecessary compatibility info from option description.
- Added optional 6th arg to verifier to specify kernel build directory.
- Report error and exit from verifier if .*.o.cmd files cannot be read.
Changes since v4:
- Improved commit descriptions to explain the why and how.
- Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
- Improved comments in generate_builtin_ranges.awk
- Improved logic in generate_builtin_ranges.awk to handle incorrect
object size information in linker maps
- Added verify_builtin_ranges.awk
Changes since v3:
- Consolidated patches 2 through 5 into a single patch
- Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
- Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
- Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
- Support LLVM (lld) compiles in generate_builtin_ranges.awk
- Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
- Only install modules.builtin.ranges if CONFIG_BUILTIN_MODULE_RANGES=y
Changes since v2:
- Switched from using modules.builtin.objs to parsing .*.cmd files
- Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
- 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
- Parse data from .*.cmd in generate_builtin_ranges.awk
- Use $(real-prereqs) rather than $(filter-out ...)
- Include modules.builtin.ranges in modules install target
Changes since v1:
- Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
- Moved the config option to the tracers section
- 2nd arg to generate_builtin_ranges.awk should be vmlinux.map
Kris Van Hees (5):
trace: add CONFIG_BUILTIN_MODULE_RANGES option
kbuild: generate a linker map for vmlinux.o
module: script to generate offset ranges for builtin modules
kbuild: generate modules.builtin.ranges when linking the kernel
module: add install target for modules.builtin.ranges
Luis Chamberlain (1):
kbuild: add modules.builtin.objs
.gitignore | 2 +-
Documentation/dontdiff | 2 +-
Documentation/kbuild/kbuild.rst | 5 ++
Makefile | 8 +-
include/linux/module.h | 4 +-
kernel/trace/Kconfig | 17 ++++
scripts/Makefile.lib | 5 +-
scripts/Makefile.modinst | 11 ++-
scripts/Makefile.vmlinux | 17 ++++
scripts/Makefile.vmlinux_o | 18 ++++-
scripts/generate_builtin_ranges.awk | 149 ++++++++++++++++++++++++++++++++++++
11 files changed, 228 insertions(+), 10 deletions(-)
create mode 100755 scripts/generate_builtin_ranges.awk
base-commit: dd5a440a31fae6e459c0d6271dddd62825505361
--
2.42.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v8 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects
2024-08-22 18:19 ` [PATCH v8 " Kris Van Hees
@ 2024-08-22 18:19 ` Kris Van Hees
2024-08-23 17:37 ` Masahiro Yamada
2024-08-22 18:19 ` [PATCH v8 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Kris Van Hees @ 2024-08-22 18:19 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Steven Rostedt, Masahiro Yamada, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
In order to create, at build time, the file modules.builtin.ranges that
contains the range of addresses for all built-in modules, there needs to
be a way to identify what code is compiled into modules.
To identify what code is compiled into modules during a kernel build,
one can look for the presence of the -DKBUILD_MODFILE and -DKBUILD_MODNAME
options in the compile command lines. A simple grep in .*.cmd files for
those options is sufficient for this.
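For illustration, extracting the option from a compile command could look like
the sketch below. The sample command line and the names MODFILE_RE, SAMPLE_CMD
and modfiles_from_cmd() are hypothetical; only the -DKBUILD_MODFILE option
itself comes from the build.

```python
import re

# Match the (possibly doubly) quoted argument of -DKBUILD_MODFILE.
MODFILE_RE = re.compile(r"""-DKBUILD_MODFILE=['"]+([^'"]+)""")

# Hypothetical compile command, shaped like a line in a .<obj>.cmd file.
SAMPLE_CMD = (
    "gcc -DKBUILD_MODFILE='\"arch/x86/events/amd/amd-uncore\"' "
    "-DKBUILD_MODNAME='\"amd_uncore\"' -c -o uncore.o uncore.c"
)


def modfiles_from_cmd(cmd_line):
    """Return the module file names listed in KBUILD_MODFILE, if any."""
    m = MODFILE_RE.search(cmd_line)
    # An object shared by several modules lists them separated by spaces.
    return m.group(1).split() if m else []
```

An object that is not part of any module yields an empty list.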
Unfortunately, these options are only passed when compiling C source files.
Various modules also include objects built from assembler source, and these
options are not passed in that case.
Adding $(modfile_flags) to modkern_aflags (similar to modkern_cflags), and
adding $(modname_flags) to a_flags (similar to c_flags) makes it possible
to identify which objects are compiled into modules for both C and
assembler source files.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
scripts/Makefile.lib | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index fe3668dc4954..170f462537a8 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -238,7 +238,7 @@ modkern_rustflags = \
modkern_aflags = $(if $(part-of-module), \
$(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE), \
- $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL))
+ $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL) $(modfile_flags))
c_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
-include $(srctree)/include/linux/compiler_types.h \
@@ -248,7 +248,7 @@ c_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
rust_flags = $(_rust_flags) $(modkern_rustflags) @$(objtree)/include/generated/rustc_cfg
a_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
- $(_a_flags) $(modkern_aflags)
+ $(_a_flags) $(modkern_aflags) $(modname_flags)
cpp_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
$(_cpp_flags)
--
2.45.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v8 2/4] kbuild: generate offset range data for builtin modules
2024-08-22 18:19 ` [PATCH v8 " Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
@ 2024-08-22 18:19 ` Kris Van Hees
2024-08-23 16:53 ` Sami Tolvanen
2024-08-22 18:19 ` [PATCH v8 3/4] scripts: add verifier script for builtin module range data Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 4/4] module: add install target for modules.builtin.ranges Kris Van Hees
3 siblings, 1 reply; 17+ messages in thread
From: Kris Van Hees @ 2024-08-22 18:19 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Nick Alcock, Alan Maguire, Steven Rostedt,
Masahiro Yamada, Luis Chamberlain, Masami Hiramatsu,
Nick Desaulniers, Jiri Olsa, Elena Zannoni
Create file modules.builtin.ranges that can be used to find where
built-in modules are located by their addresses. This will be useful for
tracing tools to find what functions are for various built-in modules.
The offset range data for builtin modules is generated using:
- modules.builtin: associates object files with module names
- vmlinux.map: provides load order of sections and offset of first member
per section
- vmlinux.o.map: provides offset of object file content per section
- .*.cmd: build cmd file with KBUILD_MODFILE
The generated data will look like:
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
...
.text 00b9f080-00ba011a intel_skl_int3472_discrete
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
...
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
For each ELF section, it lists the offset of the first symbol. This can
be used to determine the base address of the section at runtime.
Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules. Multiple ranges
can apply to a single module, and ranges can be shared between modules.
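A consumer of this file might parse it along the lines of the sketch below. It
is not part of this series: parse_ranges() and section_base() are hypothetical
names, and the runtime address used for _text in the usage note is an assumed
value, as it would be read from /proc/kallsyms.

```python
SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
"""


def parse_ranges(lines):
    """Split range data into per-section anchors and module ranges."""
    anchors = {}  # section -> (anchor offset, anchor symbol)
    ranges = {}   # section -> list of (start, end, [module, ...])
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        sect, span, rest = fields[0], fields[1], fields[2:]
        start, end = (int(x, 16) for x in span.split("-"))
        if rest[0] == "=":
            anchors[sect] = (start, rest[1])
        else:
            ranges.setdefault(sect, []).append((start, end, rest))
    return anchors, ranges


def section_base(anchors, sect, symbol_addresses):
    """Base address = runtime address of the anchor symbol - its offset."""
    offset, symbol = anchors[sect]
    return symbol_addresses[symbol] - offset
```

With anchors, ranges = parse_ranges(SAMPLE.splitlines()) and _text assumed to be
at 0xffffffff81000000, section_base() gives the address that the .text offsets
are relative to.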
The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
is generated for kernel modules that are built into the kernel image.
How it works:
1. The modules.builtin file is parsed to obtain a list of built-in
module names and their associated object names (the .ko file that
the module would be in if it were a loadable module, hereafter
referred to as <kmodfile>). This object name can be used to
identify objects in the kernel compile because any C or assembler
code that ends up in a built-in module will have the option
-DKBUILD_MODFILE=<kmodfile> present in its build command, and those
can be found in the .<obj>.cmd file in the kernel build tree.
If an object is part of multiple modules, they will all be listed
in the KBUILD_MODFILE option argument.
This allows us to conclusively determine whether an object in the
kernel build belongs to any modules, and which.
2. The vmlinux.map is parsed next to determine the base address of each
top level section so that all addresses into the section can be
turned into offsets. This makes it possible to handle sections
getting loaded at different addresses at system boot.
We also determine an 'anchor' symbol at the beginning of each
section to make it possible to calculate the true base address of
a section at runtime (i.e. symbol address - symbol offset).
We collect start addresses of sections that are included in the top
level section. This is used when vmlinux is linked using vmlinux.o,
because in that case, we need to look at the vmlinux.o linker map to
know what object a symbol is found in.
And finally, we process each symbol that is listed in vmlinux.map
(or vmlinux.o.map) based on the following structure:
vmlinux linked from vmlinux.a:
vmlinux.map:
<top level section>
<included section> -- might be same as top level section
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
vmlinux linked from vmlinux.o:
vmlinux.map:
<top level section>
<included section> -- might be same as top level section
vmlinux.o -- need to use vmlinux.o.map
<symbol> -- ignored
...
vmlinux.o.map:
<section>
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
3. As sections, objects, and symbols are processed, offset ranges are
constructed in a straightforward way:
- If the symbol belongs to one or more built-in modules:
- If we were working on the same module(s), extend the range
to include this object
- If we were working on another module(s), close that range,
and start the new one
- If the symbol does not belong to any built-in modules:
- If we were working on a module(s) range, close that range
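The range construction in step 3 can be sketched as below. This is a simplified
illustration with a hypothetical build_ranges() function: it leaves out the
end-offset validation that the AWK script performs when the linker map reports
an incorrect object size.

```python
def build_ranges(objects):
    """Merge consecutive objects of the same module(s) into offset ranges.

    objects: iterable of (start_offset, size, module) in ascending order,
    where module is '' for objects not part of any built-in module.
    """
    ranges = []
    current = None  # [start, end, module] of the range being built
    for start, size, module in objects:
        if current and module == current[2]:
            # Same module(s): extend the range to include this object.
            current[1] = start + size
            continue
        if current:
            # Different owner (or none): close the range being built.
            ranges.append(tuple(current))
            current = None
        if module:
            # Start a new range for this module (or module list).
            current = [start, start + size, module]
    if current:
        ranges.append(tuple(current))
    return ranges
```

An object shared by two modules is passed with both names in the module string
and therefore gets its own range, as in the sample output above.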
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v7:
- Removed extra close(fn).
- Make CONFIG_BUILTIN_MODULE_RANGES depend on !LTO.
Changes since v6:
- Applied Masahiro Yamada's suggestions (Kconfig, makefile, script).
Changes since v5:
- Removed unnecessary compatibility info from option description.
Changes since v4:
- Improved commit description to explain the why and how.
- Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
- Improved comments in generate_builtin_ranges.awk
- Improved logic in generate_builtin_ranges.awk to handle incorrect
object size information in linker maps
Changes since v3:
- Consolidated patches 2 through 5 into a single patch
- Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
- Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
- Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
- Support LLVM (lld) compiles in generate_builtin_ranges.awk
- Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
Changes since v2:
- Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
- 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
- Switched from using modules.builtin.objs to parsing .*.cmd files
- Parse data from .*.cmd in generate_builtin_ranges.awk
- Use $(real-prereqs) rather than $(filter-out ...)
---
Documentation/process/changes.rst | 7 +
lib/Kconfig.debug | 15 +
scripts/Makefile.vmlinux | 18 +
scripts/Makefile.vmlinux_o | 3 +
scripts/generate_builtin_ranges.awk | 505 ++++++++++++++++++++++++++++
5 files changed, 548 insertions(+)
create mode 100755 scripts/generate_builtin_ranges.awk
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 3fc63f27c226..00f1ed7c59c3 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -64,6 +64,7 @@ GNU tar 1.28 tar --version
gtags (optional) 6.6.5 gtags --version
mkimage (optional) 2017.01 mkimage --version
Python (optional) 3.5.x python3 --version
+GNU AWK (optional) 5.1.0 gawk --version
====================== =============== ========================================
.. [#f1] Sphinx is needed only to build the Kernel documentation
@@ -192,6 +193,12 @@ platforms. The tool is available via the ``u-boot-tools`` package or can be
built from the U-Boot source code. See the instructions at
https://docs.u-boot.org/en/latest/build/tools.html#building-tools-for-linux
+GNU AWK
+-------
+
+GNU AWK is needed if you want kernel builds to generate address range data for
+builtin modules (CONFIG_BUILTIN_MODULE_RANGES).
+
System utilities
****************
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..5e2f30921cb2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -571,6 +571,21 @@ config VMLINUX_MAP
pieces of code get eliminated with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
+config BUILTIN_MODULE_RANGES
+ bool "Generate address range information for builtin modules"
+ depends on !LTO
+ depends on VMLINUX_MAP
+ help
+ When modules are built into the kernel, there will be no module name
+ associated with their symbols in /proc/kallsyms. Tracers may want to
+ identify symbols by module name and symbol name regardless of whether
+ the module is configured as loadable or not.
+
+ This option generates modules.builtin.ranges in the build tree with
+ offset ranges (per ELF section) for the module(s) they belong to.
+ It also records an anchor symbol to determine the load address of the
+ section.
+
config DEBUG_FORCE_WEAK_PER_CPU
bool "Force weak per-cpu definitions"
depends on DEBUG_KERNEL
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..7e8b703799c8 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -33,6 +33,24 @@ targets += vmlinux
vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
+# modules.builtin.ranges
+# ---------------------------------------------------------------------------
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+ cmd_modules_builtin_ranges = $(real-prereqs) > $@
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: $(srctree)/scripts/generate_builtin_ranges.awk \
+ modules.builtin vmlinux.map vmlinux.o.map FORCE
+ $(call if_changed,modules_builtin_ranges)
+
+vmlinux.map: vmlinux
+ @:
+
+endif
+
# Add FORCE to the prequisites of a target to force it to be always rebuilt.
# ---------------------------------------------------------------------------
diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
index 6de297916ce6..252505505e0e 100644
--- a/scripts/Makefile.vmlinux_o
+++ b/scripts/Makefile.vmlinux_o
@@ -45,9 +45,12 @@ objtool-args = $(vmlinux-objtool-args-y) --link
# Link of vmlinux.o used for section mismatch analysis
# ---------------------------------------------------------------------------
+vmlinux-o-ld-args-$(CONFIG_BUILTIN_MODULE_RANGES) += -Map=$@.map
+
quiet_cmd_ld_vmlinux.o = LD $@
cmd_ld_vmlinux.o = \
$(LD) ${KBUILD_LDFLAGS} -r -o $@ \
+ $(vmlinux-o-ld-args-y) \
$(addprefix -T , $(initcalls-lds)) \
--whole-archive vmlinux.a --no-whole-archive \
--start-group $(KBUILD_VMLINUX_LIBS) --end-group \
diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
new file mode 100755
index 000000000000..68df05fd3036
--- /dev/null
+++ b/scripts/generate_builtin_ranges.awk
@@ -0,0 +1,505 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+# generate_builtin_ranges.awk: Generate address range data for builtin modules
+# Written by Kris Van Hees <kris.van.hees@oracle.com>
+#
+# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
+# vmlinux.o.map > modules.builtin.ranges
+#
+
+# Return the module name(s) (if any) associated with the given object.
+#
+# If we have seen this object before, return information from the cache.
+# Otherwise, retrieve it from the corresponding .cmd file.
+#
+function get_module_info(fn, mod, obj, s) {
+ if (fn in omod)
+ return omod[fn];
+
+ if (match(fn, /\/[^/]+$/) == 0)
+ return "";
+
+ obj = fn;
+ mod = "";
+ fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
+ if (getline s <fn == 1) {
+ if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
+ mod = substr(s, RSTART + 16, RLENGTH - 16);
+ gsub(/['"]/, "", mod);
+ }
+ }
+ close(fn);
+
+ # A single module (common case) also reflects objects that are not part
+ # of a module. Some of those objects have names that are also a module
+ # name (e.g. core). We check the associated module file name, and if
+ # they do not match, the object is not part of a module.
+ if (mod !~ / /) {
+ if (!(mod in mods))
+ mod = "";
+ }
+
+ gsub(/([^/ ]*\/)+/, "", mod);
+ gsub(/-/, "_", mod);
+
+ # At this point, mod is a single (valid) module name, or a list of
+ # module names (that do not need validation).
+ omod[obj] = mod;
+
+ return mod;
+}
+
+# Update the ranges entry for the given module 'mod' in section 'osect'.
+#
+# We use a modified absolute start address (soff + base) as index because we
+# may need to insert an anchor record later that must be at the start of the
+# section data, and the first module may very well start at the same address.
+# So, we use (addr << 1) + 1 to allow a possible anchor record to be placed at
+# (addr << 1). This is safe because the index is only used to sort the entries
+# before writing them out.
+#
+function update_entry(osect, mod, soff, eoff, sect, idx) {
+ sect = sect_in[osect];
+ idx = (soff + sect_base[osect]) * 2 + 1;
+ entries[idx] = sprintf("%s %08x-%08x %s", sect, soff, eoff, mod);
+ count[sect]++;
+}
+
+# (1) Build a lookup map of built-in module names.
+#
+# The first file argument is used as input (modules.builtin).
+#
+# Lines will be like:
+# kernel/crypto/lzo-rle.ko
+# and we record the object name "crypto/lzo-rle".
+#
+ARGIND == 1 {
+ sub(/kernel\//, ""); # strip off "kernel/" prefix
+ sub(/\.ko$/, ""); # strip off .ko suffix
+
+ mods[$1] = 1;
+ next;
+}
+
+# (2) Collect address information for each section.
+#
+# The second file argument is used as input (vmlinux.map).
+#
+# We collect the base address of the section in order to convert all addresses
+# in the section into offset values.
+#
+# We collect the address of the anchor (or first symbol in the section if there
+# is no explicit anchor) to allow users of the range data to calculate address
+# ranges based on the actual load address of the section in the running kernel.
+#
+# We collect the start address of any sub-section (section included in the top
+# level section being processed). This is needed when the final linking was
+# done using vmlinux.o because then the list of objects contained in each
+# section is to be obtained from vmlinux.o.map. The offset of the sub-section
+# is recorded here, to be used as an addend when processing vmlinux.o.map
+# later.
+#
+
+# Both GNU ld and LLVM lld linker map format are supported by converting LLVM
+# lld linker map records into equivalent GNU ld linker map records.
+#
+# The first record of the vmlinux.map file provides enough information to know
+# which format we are dealing with.
+#
+ARGIND == 2 && FNR == 1 && NF == 7 && $1 == "VMA" && $7 == "Symbol" {
+ map_is_lld = 1;
+ if (dbg)
+ printf "NOTE: %s uses LLVM lld linker map format\n", FILENAME >"/dev/stderr";
+ next;
+}
+
+# (LLD) Convert a section record from lld format to ld format.
+#
+# lld: ffffffff82c00000 2c00000 2493c0 8192 .data
+# ->
+# ld: .data 0xffffffff82c00000 0x2493c0 load address 0x0000000002c00000
+#
+ARGIND == 2 && map_is_lld && NF == 5 && /[0-9] [^ ]+$/ {
+ $0 = $5 " 0x"$1 " 0x"$3 " load address 0x"$2;
+}
+
+# (LLD) Convert an anchor record from lld format to ld format.
+#
+# lld: ffffffff81000000 1000000 0 1 _text = .
+# ->
+# ld: 0xffffffff81000000 _text = .
+#
+ARGIND == 2 && map_is_lld && !anchor && NF == 7 && raw_addr == "0x"$1 && $6 == "=" && $7 == "." {
+ $0 = " 0x"$1 " " $5 " = .";
+}
+
+# (LLD) Convert an object record from lld format to ld format.
+#
+# lld: 11480 11480 1f07 16 vmlinux.a(arch/x86/events/amd/uncore.o):(.text)
+# ->
+# ld: .text 0x0000000000011480 0x1f07 arch/x86/events/amd/uncore.o
+#
+ARGIND == 2 && map_is_lld && NF == 5 && $5 ~ /:\(/ {
+ gsub(/\)/, "");
+ sub(/ vmlinux\.a\(/, " ");
+ sub(/:\(/, " ");
+ $0 = " "$6 " 0x"$1 " 0x"$3 " " $5;
+}
+
+# (LLD) Convert a symbol record from lld format to ld format.
+#
+# We only care about these while processing a section for which no anchor has
+# been determined yet.
+#
+# lld: ffffffff82a859a4 2a859a4 0 1 btf_ksym_iter_id
+# ->
+# ld: 0xffffffff82a859a4 btf_ksym_iter_id
+#
+ARGIND == 2 && map_is_lld && sect && !anchor && NF == 5 && $5 ~ /^[_A-Za-z][_A-Za-z0-9]*$/ {
+ $0 = " 0x"$1 " " $5;
+}
+
+# (LLD) We do not need any other lld linker map records.
+#
+ARGIND == 2 && map_is_lld && /^[0-9a-f]{16} / {
+ next;
+}
+
+# (LD) Section records with just the section name at the start of the line
+# need to have the next line pulled in to determine whether it is a
+# loadable section. If it is, the next line will contain a hex value
+# as first and second items.
+#
+ARGIND == 2 && !map_is_lld && NF == 1 && /^[^ ]/ {
+ s = $0;
+ getline;
+ if ($1 !~ /^0x/ || $2 !~ /^0x/)
+ next;
+
+ $0 = s " " $0;
+}
+
+# (LD) Object records with just the section name denote records with a long
+# section name for which the remainder of the record can be found on the
+# next line.
+#
+# (This is also needed for vmlinux.o.map, when used.)
+#
+ARGIND >= 2 && !map_is_lld && NF == 1 && /^ [^ \*]/ {
+ s = $0;
+ getline;
+ $0 = s " " $0;
+}
+
+# Beginning a new section - done with the previous one (if any).
+#
+ARGIND == 2 && /^[^ ]/ {
+ sect = 0;
+}
+
+# Process a loadable section (we only care about .-sections).
+#
+# Record the section name and its base address.
+# We also record the raw (non-stripped) address of the section because it can
+# be used to identify an anchor record.
+#
+# Note:
+# Since some AWK implementations cannot handle large integers, we strip off the
+# first 4 hex digits from the address. This is safe because the kernel space
+# is not large enough for addresses to extend into those digits. The portion
+# to strip off is stored in addr_prefix as a regexp, so further clauses can
+# perform a simple substitution to do the address stripping.
+#
+ARGIND == 2 && /^\./ {
+ # Explicitly ignore a few sections that are not relevant here.
+ if ($1 ~ /^\.orc_/ || $1 ~ /_sites$/ || $1 ~ /\.percpu/)
+ next;
+
+ # Sections with a 0-address can be ignored as well.
+ if ($2 ~ /^0x0+$/)
+ next;
+
+ raw_addr = $2;
+ addr_prefix = "^" substr($2, 1, 6);
+ base = $2;
+ sub(addr_prefix, "0x", base);
+ base = strtonum(base);
+ sect = $1;
+ anchor = 0;
+ sect_base[sect] = base;
+ sect_size[sect] = strtonum($3);
+
+ if (dbg)
+ printf "[%s] BASE %016x\n", sect, base >"/dev/stderr";
+
+ next;
+}
+
+# If we are not in a section we care about, we ignore the record.
+#
+ARGIND == 2 && !sect {
+ next;
+}
+
+# Record the first anchor symbol for the current section.
+#
+# An anchor record for the section bears the same raw address as the section
+# record.
+#
+ARGIND == 2 && !anchor && NF == 4 && raw_addr == $1 && $3 == "=" && $4 == "." {
+ anchor = sprintf("%s %08x-%08x = %s", sect, 0, 0, $2);
+ sect_anchor[sect] = anchor;
+
+ if (dbg)
+ printf "[%s] ANCHOR %016x = %s (.)\n", sect, 0, $2 >"/dev/stderr";
+
+ next;
+}
+
+# If no anchor record was found for the current section, use the first symbol
+# in the section as anchor.
+#
+ARGIND == 2 && !anchor && NF == 2 && $1 ~ /^0x/ && $2 !~ /^0x/ {
+ addr = $1;
+ sub(addr_prefix, "0x", addr);
+ addr = strtonum(addr) - base;
+ anchor = sprintf("%s %08x-%08x = %s", sect, addr, addr, $2);
+ sect_anchor[sect] = anchor;
+
+ if (dbg)
+ printf "[%s] ANCHOR %016x = %s\n", sect, addr, $2 >"/dev/stderr";
+
+ next;
+}
+
+# The first occurrence of a section name in an object record establishes the
+# addend (often 0) for that section. This information is needed to handle
+# sections that get combined in the final linking of vmlinux (e.g. .head.text
+# getting included at the start of .text).
+#
+# If the section does not have a base yet, use the base of the encapsulating
+# section.
+#
+ARGIND == 2 && sect && NF == 4 && /^ [^ \*]/ && !($1 in sect_addend) {
+ if (!($1 in sect_base)) {
+ sect_base[$1] = base;
+
+ if (dbg)
+ printf "[%s] BASE %016x\n", $1, base >"/dev/stderr";
+ }
+
+ addr = $2;
+ sub(addr_prefix, "0x", addr);
+ addr = strtonum(addr);
+ sect_addend[$1] = addr - sect_base[$1];
+ sect_in[$1] = sect;
+
+ if (dbg)
+ printf "[%s] ADDEND %016x - %016x = %016x\n", $1, addr, base, sect_addend[$1] >"/dev/stderr";
+
+ # If the object is vmlinux.o then we will need vmlinux.o.map to get the
+ # actual offsets of objects.
+ if ($4 == "vmlinux.o")
+ need_o_map = 1;
+}
+
+# (3) Collect offset ranges (relative to the section base address) for built-in
+# modules.
+#
+# If the final link was done using the actual objects, vmlinux.map contains all
+# the information we need (see section (3a)).
+# If linking was done using vmlinux.o as intermediary, we will need to process
+# vmlinux.o.map (see section (3b)).
+
+# (3a) Determine offset range info using vmlinux.map.
+#
+# Since we are already processing vmlinux.map, the top level section that is
+# being processed is already known. If we do not have a base address for it,
+# we do not need to process records for it.
+#
+# Given the object name, we determine the module(s) (if any) that the current
+# object is associated with.
+#
+# If we were already processing objects for a (list of) module(s):
+# - If the current object belongs to the same module(s), update the range data
+# to include the current object.
+# - Otherwise, ensure that the end offset of the range is valid.
+#
+# If the current object does not belong to a built-in module, ignore it.
+#
+# If it does, we add a new built-in module offset range record.
+#
+ARGIND == 2 && !need_o_map && /^ [^ ]/ && NF == 4 && $3 != "0x0" {
+ if (!(sect in sect_base))
+ next;
+
+ # Turn the address into an offset from the section base.
+ soff = $2;
+ sub(addr_prefix, "0x", soff);
+ soff = strtonum(soff) - sect_base[sect];
+ eoff = soff + strtonum($3);
+
+ # Determine which (if any) built-in modules the object belongs to.
+ mod = get_module_info($4);
+
+ # If we are processing a built-in module:
+ # - If the current object is within the same module, we update its
+ # entry by extending the range and move on
+ # - Otherwise:
+ # + If we are still processing within the same main section, we
+ # validate the end offset against the start offset of the
+ # current object (e.g. .rodata.str1.[18] objects are often
+ # listed with an incorrect size in the linker map)
+ # + Otherwise, we validate the end offset against the section
+ # size
+ if (mod_name) {
+ if (mod == mod_name) {
+ mod_eoff = eoff;
+ update_entry(mod_sect, mod_name, mod_soff, eoff);
+
+ next;
+ } else if (sect == sect_in[mod_sect]) {
+ if (mod_eoff > soff)
+ update_entry(mod_sect, mod_name, mod_soff, soff);
+ } else {
+ v = sect_size[sect_in[mod_sect]];
+ if (mod_eoff > v)
+ update_entry(mod_sect, mod_name, mod_soff, v);
+ }
+ }
+
+ mod_name = mod;
+
+ # If we encountered an object that is not part of a built-in module, we
+ # do not need to record any data.
+ if (!mod)
+ next;
+
+ # At this point, we encountered the start of a new built-in module.
+ mod_name = mod;
+ mod_soff = soff;
+ mod_eoff = eoff;
+ mod_sect = $1;
+ update_entry($1, mod, soff, mod_eoff);
+
+ next;
+}
+
+# If we do not need to parse the vmlinux.o.map file, we are done.
+#
+ARGIND == 3 && !need_o_map {
+ if (dbg)
+ printf "Note: %s is not needed.\n", FILENAME >"/dev/stderr";
+ exit;
+}
+
+# (3) Collect offset ranges (relative to the section base address) for built-in
+# modules.
+#
+
+# (LLD) Convert an object record from lld format to ld format.
+#
+ARGIND == 3 && map_is_lld && NF == 5 && $5 ~ /:\(/ {
+ gsub(/\)/, "");
+ sub(/:\(/, " ");
+
+ sect = $6;
+ if (!(sect in sect_addend))
+ next;
+
+ sub(/ vmlinux\.a\(/, " ");
+ $0 = " "sect " 0x"$1 " 0x"$3 " " $5;
+}
+
+# (3b) Determine offset range info using vmlinux.o.map.
+#
+# If we do not know an addend for the object's section, we are not interested
+# in anything within that section.
+#
+# Determine the top-level section that the object's section was included in
+# during the final link. This is the section name offset range data will be
+# associated with for this object.
+#
+# The remainder of the processing of the current object record follows the
+# procedure outlined in (3a).
+#
+ARGIND == 3 && /^ [^ ]/ && NF == 4 && $3 != "0x0" {
+ osect = $1;
+ if (!(osect in sect_addend))
+ next;
+
+ # We need to work with the main section.
+ sect = sect_in[osect];
+
+ # Turn the address into an offset from the section base.
+ soff = $2;
+ sub(addr_prefix, "0x", soff);
+ soff = strtonum(soff) + sect_addend[osect];
+ eoff = soff + strtonum($3);
+
+ # Determine which (if any) built-in modules the object belongs to.
+ mod = get_module_info($4);
+
+ # If we are processing a built-in module:
+ # - If the current object is within the same module, we update its
+ # entry by extending the range and move on
+ # - Otherwise:
+ # + If we are still processing within the same main section, we
+ # validate the end offset against the start offset of the
+ # current object (e.g. .rodata.str1.[18] objects are often
+ # listed with an incorrect size in the linker map)
+ # + Otherwise, we validate the end offset against the section
+ # size
+ if (mod_name) {
+ if (mod == mod_name) {
+ mod_eoff = eoff;
+ update_entry(mod_sect, mod_name, mod_soff, eoff);
+
+ next;
+ } else if (sect == sect_in[mod_sect]) {
+ if (mod_eoff > soff)
+ update_entry(mod_sect, mod_name, mod_soff, soff);
+ } else {
+ v = sect_size[sect_in[mod_sect]];
+ if (mod_eoff > v)
+ update_entry(mod_sect, mod_name, mod_soff, v);
+ }
+ }
+
+ mod_name = mod;
+
+ # If we encountered an object that is not part of a built-in module, we
+ # do not need to record any data.
+ if (!mod)
+ next;
+
+ # At this point, we encountered the start of a new built-in module.
+ mod_name = mod;
+ mod_soff = soff;
+ mod_eoff = eoff;
+ mod_sect = osect;
+ update_entry(osect, mod, soff, mod_eoff);
+
+ next;
+}
+
+# (4) Generate the output.
+#
+# Anchor records are added for each section that contains offset range data
+# records. They are added at an adjusted section base address (base << 1) to
+# ensure they come first in the second records (see update_entry() above for
+# more information).
+#
+# All entries are sorted by (adjusted) address to ensure that the output can be
+# parsed in strict ascending address order.
+#
+END {
+ for (sect in count) {
+ if (sect in sect_anchor)
+ entries[sect_base[sect] * 2] = sect_anchor[sect];
+ }
+
+ n = asorti(entries, indices);
+ for (i = 1; i <= n; i++)
+ print entries[indices[i]];
+}
--
2.45.2
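
The range-extension rule described in the comments for step (3b) can be
illustrated in isolation. What follows is a minimal sketch, not the patch's
code: merge_ranges and emit_range are hypothetical names, and the input is
an assumed, simplified format (decimal start offset, end offset, and module
name or "-" per object, already in address order). Clamping a range against
the next object's start offset or against the section size is left out. The
sketch uses only POSIX awk features, so it does not depend on gawk.

```sh
# Merge consecutive objects that belong to the same built-in module into a
# single offset range (hypothetical helper, simplified input format).
merge_ranges() {
    awk '
    # Print the range that is currently open, if any.
    function emit_range() {
        if (cur != "")
            printf "%x-%x %s\n", cur_start, cur_end, cur;
    }

    {
        mod = ($3 == "-") ? "" : $3;

        # Same module as the open range: extend it and move on.
        if (mod != "" && mod == cur) {
            cur_end = $2;
            next;
        }

        # Module changed: emit the finished range, then start a new one.
        # Objects outside any module leave cur empty, so nothing is
        # recorded for them.
        emit_range();
        cur = mod;
        cur_start = $1;
        cur_end = $2;
    }

    END { emit_range(); }'
}

printf '%s\n' '0 64 lzo' '64 128 lzo' '128 256 -' '256 512 zlib' | merge_ranges
# prints:
#   0-80 lzo
#   100-200 zlib
```

The two lzo objects collapse into one record, and the object that is not part
of a module produces no record at all, which mirrors the "update the entry by
extending the range and move on" behaviour described above.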
* [PATCH v8 3/4] scripts: add verifier script for builtin module range data
2024-08-22 18:19 ` [PATCH v8 " Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
@ 2024-08-22 18:19 ` Kris Van Hees
2024-08-22 18:19 ` [PATCH v8 4/4] module: add install target for modules.builtin.ranges Kris Van Hees
3 siblings, 0 replies; 17+ messages in thread
From: Kris Van Hees @ 2024-08-22 18:19 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Nick Alcock, Alan Maguire, Masahiro Yamada,
Steven Rostedt, Luis Chamberlain, Masami Hiramatsu,
Nick Desaulniers, Jiri Olsa, Elena Zannoni
The modules.builtin.ranges offset range data for builtin modules is
generated at compile time based on the list of built-in modules and
the vmlinux.map and vmlinux.o.map linker maps. This data can be used
to determine whether a symbol at a particular address belongs to
module code that was configured to be compiled into the kernel proper
as a built-in module (rather than as a standalone module).
This patch adds a script that uses the generated modules.builtin.ranges
data to annotate the symbols in the System.map with module names if
their address falls within a range that belongs to one or more built-in
modules.
It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
verify the annotation:
  - For each top-level section:
     - For each object in the section:
        - Determine whether the object is part of a built-in module
          (using modules.builtin and the .*.cmd file used to compile
          the object as suggested in [0])
        - For each symbol in that object, verify that the built-in
          module association (or lack thereof) matches the annotation
          given to the symbol.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
Changes since v7:
- Removed extra close(fn)
Changes since v6:
- Applied Masahiro Yamada's suggestions to the AWK script.
Changes since v5:
- Added optional 6th argument to specify kernel build directory.
- Report error and exit if .*.o.cmd files cannot be read.
Changes since v4:
- New patch in the series
---
scripts/verify_builtin_ranges.awk | 355 ++++++++++++++++++++++++++++++
1 file changed, 355 insertions(+)
create mode 100755 scripts/verify_builtin_ranges.awk
diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
new file mode 100755
index 000000000000..22717a4ab6c8
--- /dev/null
+++ b/scripts/verify_builtin_ranges.awk
@@ -0,0 +1,355 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+# verify_builtin_ranges.awk: Verify address range data for builtin modules
+# Written by Kris Van Hees <kris.van.hees@oracle.com>
+#
+# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
+# modules.builtin vmlinux.map vmlinux.o.map \
+# [ <build-dir> ]
+#
+
+# Return the module name(s) (if any) associated with the given object.
+#
+# If we have seen this object before, return information from the cache.
+# Otherwise, retrieve it from the corresponding .cmd file.
+#
+function get_module_info(fn, mod, obj, s) {
+ if (fn in omod)
+ return omod[fn];
+
+ if (match(fn, /\/[^/]+$/) == 0)
+ return "";
+
+ obj = fn;
+ mod = "";
+ fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
+ if (getline s <fn == 1) {
+ if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
+ mod = substr(s, RSTART + 16, RLENGTH - 16);
+ gsub(/['"]/, "", mod);
+ }
+ } else {
+ print "ERROR: Failed to read: " fn "\n\n" \
+ " Invalid kernel build directory (" kdir ")\n" \
+ " or its content does not match " ARGV[1] >"/dev/stderr";
+ close(fn);
+ total = 0;
+ exit(1);
+ }
+ close(fn);
+
+ # A single module (common case) also reflects objects that are not part
+ # of a module. Some of those objects have names that are also a module
+ # name (e.g. core). We check the associated module file name, and if
+ # they do not match, the object is not part of a module.
+ if (mod !~ / /) {
+ if (!(mod in mods))
+ mod = "";
+ }
+
+ gsub(/([^/ ]*\/)+/, "", mod);
+ gsub(/-/, "_", mod);
+
+ # At this point, mod is a single (valid) module name, or a list of
+ # module names (that do not need validation).
+ omod[obj] = mod;
+
+ return mod;
+}
+
+# Return a representative integer value for a given hexadecimal address.
+#
+# Since all kernel addresses fall within the same memory region, we can safely
+# strip off the first 4 hex digits before performing the hex-to-dec conversion,
+# thereby avoiding integer overflows.
+#
+function addr2val(val) {
+ sub(/^0x/, "", val);
+ if (length(val) == 16)
+ val = substr(val, 5);
+ return strtonum("0x" val);
+}
+
+# Determine the kernel build directory to use (default is .).
+#
+BEGIN {
+ if (ARGC > 6) {
+ kdir = ARGV[ARGC - 1];
+ ARGV[ARGC - 1] = "";
+ } else
+ kdir = ".";
+}
+
+# (1) Load the built-in module address range data.
+#
+ARGIND == 1 {
+ ranges[FNR] = $0;
+ rcnt++;
+ next;
+}
+
+# (2) Annotate System.map symbols with module names.
+#
+ARGIND == 2 {
+ addr = addr2val($1);
+ name = $3;
+
+ while (addr >= mod_eaddr) {
+ if (sect_symb) {
+ if (sect_symb != name)
+ next;
+
+ sect_base = addr - sect_off;
+ if (dbg)
+ printf "[%s] BASE (%s) %016x - %016x = %016x\n", sect_name, sect_symb, addr, sect_off, sect_base >"/dev/stderr";
+ sect_symb = 0;
+ }
+
+ if (++ridx > rcnt)
+ break;
+
+ $0 = ranges[ridx];
+ sub(/-/, " ");
+ if ($4 != "=") {
+ sub(/-/, " ");
+ mod_saddr = strtonum("0x" $2) + sect_base;
+ mod_eaddr = strtonum("0x" $3) + sect_base;
+ $1 = $2 = $3 = "";
+ sub(/^ +/, "");
+ mod_name = $0;
+
+ if (dbg)
+ printf "[%s] %s from %016x to %016x\n", sect_name, mod_name, mod_saddr, mod_eaddr >"/dev/stderr";
+ } else {
+ sect_name = $1;
+ sect_off = strtonum("0x" $2);
+ sect_symb = $5;
+ }
+ }
+
+ idx = addr"-"name;
+ if (addr >= mod_saddr && addr < mod_eaddr)
+ sym2mod[idx] = mod_name;
+
+ next;
+}
+
+# Once we are done annotating the System.map, we no longer need the ranges data.
+#
+FNR == 1 && ARGIND == 3 {
+ delete ranges;
+}
+
+# (3) Build a lookup map of built-in module names.
+#
+# Lines from modules.builtin will be like:
+# kernel/crypto/lzo-rle.ko
+# and we record the object name "crypto/lzo-rle".
+#
+ARGIND == 3 {
+ sub(/kernel\//, ""); # strip off "kernel/" prefix
+ sub(/\.ko$/, ""); # strip off .ko suffix
+
+ mods[$1] = 1;
+ next;
+}
+
+# (4) Get a list of symbols (per object).
+#
+# Symbols by object are read from vmlinux.map, with fallback to vmlinux.o.map
+# if vmlinux is found to have linked in vmlinux.o.
+#
+
+# If we were able to get the data we need from vmlinux.map, there is no need to
+# process vmlinux.o.map.
+#
+FNR == 1 && ARGIND == 5 && total > 0 {
+ if (dbg)
+ printf "Note: %s is not needed.\n", FILENAME >"/dev/stderr";
+ exit;
+}
+
+# First determine whether we are dealing with a GNU ld or LLVM lld linker map.
+#
+ARGIND >= 4 && FNR == 1 && NF == 7 && $1 == "VMA" && $7 == "Symbol" {
+ map_is_lld = 1;
+ next;
+}
+
+# (LLD) Convert a section record from lld format to ld format.
+#
+ARGIND >= 4 && map_is_lld && NF == 5 && /[0-9] [^ ]/ {
+ $0 = $5 " 0x"$1 " 0x"$3 " load address 0x"$2;
+}
+
+# (LLD) Convert an object record from lld format to ld format.
+#
+ARGIND >= 4 && map_is_lld && NF == 5 && $5 ~ /:\(\./ {
+ gsub(/\)/, "");
+ sub(/:\(/, " ");
+ sub(/ vmlinux\.a\(/, " ");
+ $0 = " "$6 " 0x"$1 " 0x"$3 " " $5;
+}
+
+# (LLD) Convert a symbol record from lld format to ld format.
+#
+ARGIND >= 4 && map_is_lld && NF == 5 && $5 ~ /^[A-Za-z_][A-Za-z0-9_]*$/ {
+ $0 = " 0x" $1 " " $5;
+}
+
+# (LLD) We do not need any other lld linker map records.
+#
+ARGIND >= 4 && map_is_lld && /^[0-9a-f]{16} / {
+ next;
+}
+
+# Handle section records with long section names (spilling onto a 2nd line).
+#
+ARGIND >= 4 && !map_is_lld && NF == 1 && /^[^ ]/ {
+ s = $0;
+ getline;
+ $0 = s " " $0;
+}
+
+# Next section - previous one is done.
+#
+ARGIND >= 4 && /^[^ ]/ {
+ sect = 0;
+}
+
+# Get the (top level) section name.
+#
+ARGIND >= 4 && /^[^ ]/ && $2 ~ /^0x/ && $3 ~ /^0x/ {
+ # Empty section or per-CPU section - ignore.
+ if (NF < 3 || $1 ~ /\.percpu/) {
+ sect = 0;
+ next;
+ }
+
+ sect = $1;
+
+ next;
+}
+
+# If we are not currently in a section we care about, ignore records.
+#
+!sect {
+ next;
+}
+
+# Handle object records with long section names (spilling onto a 2nd line).
+#
+ARGIND >= 4 && /^ [^ \*]/ && NF == 1 {
+ # If the section name is long, the remainder of the entry is found on
+ # the next line.
+ s = $0;
+ getline;
+ $0 = s " " $0;
+}
+
+# If the object is vmlinux.o, we need to consult vmlinux.o.map for per-object
+# symbol information
+#
+ARGIND == 4 && /^ [^ ]/ && NF == 4 {
+ idx = sect":"$1;
+ if (!(idx in sect_addend)) {
+ sect_addend[idx] = addr2val($2);
+ if (dbg)
+ printf "ADDEND %s = %016x\n", idx, sect_addend[idx] >"/dev/stderr";
+ }
+ if ($4 == "vmlinux.o") {
+ need_o_map = 1;
+ next;
+ }
+}
+
+# If data from vmlinux.o.map is needed, we only process section and object
+# records from vmlinux.map to determine which section we need to pay attention
+# to in vmlinux.o.map. So skip everything else from vmlinux.map.
+#
+ARGIND == 4 && need_o_map {
+ next;
+}
+
+# Get module information for the current object.
+#
+ARGIND >= 4 && /^ [^ ]/ && NF == 4 {
+ msect = $1;
+ mod_name = get_module_info($4);
+ mod_eaddr = addr2val($2) + addr2val($3);
+
+ next;
+}
+
+# Process a symbol record.
+#
+# Evaluate the module information obtained from vmlinux.map (or vmlinux.o.map)
+# as follows:
+# - For all symbols in a given object:
+# - If the symbol is annotated with the same module name(s) that the object
+# belongs to, count it as a match.
+# - Otherwise:
+# - If the symbol is known to have duplicates of which at least one is
+# in a built-in module, disregard it.
+# - If the symbol is not annotated with any module name(s) AND the
+# object belongs to built-in modules, count it as missing.
+# - Otherwise, count it as a mismatch.
+#
+ARGIND >= 4 && /^ / && NF == 2 && $1 ~ /^0x/ {
+ idx = sect":"msect;
+ if (!(idx in sect_addend))
+ next;
+
+ addr = addr2val($1);
+
+ # Handle the rare but annoying case where a 0-size symbol is placed at
+ # the byte *after* the module range. Based on vmlinux.map it will be
+ # considered part of the current object, but it falls just beyond the
+ # module address range. Unfortunately, its address could be at the
+ # start of another built-in module, so the only safe thing to do is to
+ # ignore it.
+ if (mod_name && addr == mod_eaddr)
+ next;
+
+ # If we are processing vmlinux.o.map, we need to apply the base address
+ # of the section to the relative address on the record.
+ #
+ if (ARGIND == 5)
+ addr += sect_addend[idx];
+
+ idx = addr"-"$2;
+ mod = "";
+ if (idx in sym2mod) {
+ mod = sym2mod[idx];
+ if (sym2mod[idx] == mod_name) {
+ mod_matches++;
+ matches++;
+ } else if (mod_name == "") {
+ print $2 " in " sym2mod[idx] " (should NOT be)";
+ mismatches++;
+ } else {
+ print $2 " in " sym2mod[idx] " (should be " mod_name ")";
+ mismatches++;
+ }
+ } else if (mod_name != "") {
+ print $2 " should be in " mod_name;
+ missing++;
+ } else
+ matches++;
+
+ total++;
+
+ next;
+}
+
+# Issue the comparison report.
+#
+END {
+ if (total) {
+ printf "Verification of %s:\n", ARGV[1];
+ printf " Correct matches: %6d (%d%% of total)\n", matches, 100 * matches / total;
+ printf " Module matches: %6d (%d%% of matches)\n", mod_matches, 100 * mod_matches / matches;
+ printf " Mismatches: %6d (%d%% of total)\n", mismatches, 100 * mismatches / total;
+ printf " Missing: %6d (%d%% of total)\n", missing, 100 * missing / total;
+ }
+}
--
2.45.2
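
The per-symbol evaluation performed by the verifier comes down to a three-way
classification. A minimal sketch under stated assumptions follows:
classify_symbols is a hypothetical name, and the input is an assumed,
simplified format with one symbol per line (symbol name, module annotated
from modules.builtin.ranges, module the object belongs to, with "-" meaning
"none"). The special cases for zero-size symbols and per-section addends are
not modelled. Only POSIX awk features are used.

```sh
# Count matches, mismatches and missing annotations (hypothetical helper).
classify_symbols() {
    awk '
    {
        have = ($2 == "-") ? "" : $2;   # annotation from the ranges data
        want = ($3 == "-") ? "" : $3;   # module the object belongs to

        if (have != "") {
            if (have == want)
                matches++;              # annotated with the right module
            else
                mismatches++;           # annotated, but wrongly
        } else if (want != "")
            missing++;                  # should have been annotated
        else
            matches++;                  # correctly left unannotated
    }

    END {
        printf "matches=%d mismatches=%d missing=%d\n",
               matches, mismatches, missing;
    }'
}

printf '%s\n' 'lzo1x_1_compress lzo lzo' 'start_kernel - -' \
              'foo_init - foo' 'bar_exit baz -' | classify_symbols
# prints: matches=2 mismatches=1 missing=1
```

A symbol that is correctly left without a module annotation counts as a match,
which is why the "Correct matches" figure in the report can be larger than the
"Module matches" figure.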
* [PATCH v8 4/4] module: add install target for modules.builtin.ranges
2024-08-22 18:19 ` [PATCH v8 " Kris Van Hees
` (2 preceding siblings ...)
2024-08-22 18:19 ` [PATCH v8 3/4] scripts: add verifier script for builtin module range data Kris Van Hees
@ 2024-08-22 18:19 ` Kris Van Hees
3 siblings, 0 replies; 17+ messages in thread
From: Kris Van Hees @ 2024-08-22 18:19 UTC (permalink / raw)
To: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel
Cc: Kris Van Hees, Nick Alcock, Masahiro Yamada, Steven Rostedt,
Luis Chamberlain, Masami Hiramatsu, Nick Desaulniers, Jiri Olsa,
Elena Zannoni
When CONFIG_BUILTIN_MODULE_RANGES is enabled, the modules.builtin.ranges
file should be installed in the module install location.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
---
Changes since v3:
- Only install modules.builtin.ranges if CONFIG_BUILTIN_MODULE_RANGES=y
---
scripts/Makefile.modinst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.modinst b/scripts/Makefile.modinst
index 0afd75472679..c38bf63a33be 100644
--- a/scripts/Makefile.modinst
+++ b/scripts/Makefile.modinst
@@ -30,10 +30,12 @@ $(MODLIB)/modules.order: modules.order FORCE
quiet_cmd_install_modorder = INSTALL $@
cmd_install_modorder = sed 's:^\(.*\)\.o$$:kernel/\1.ko:' $< > $@
-# Install modules.builtin(.modinfo) even when CONFIG_MODULES is disabled.
+# Install modules.builtin(.modinfo,.ranges) even when CONFIG_MODULES is disabled.
install-y += $(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo)
-$(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo): $(MODLIB)/%: % FORCE
+install-$(CONFIG_BUILTIN_MODULE_RANGES) += $(MODLIB)/modules.builtin.ranges
+
+$(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo modules.builtin.ranges): $(MODLIB)/%: % FORCE
$(call cmd,install)
endif
--
2.45.2
* Re: [PATCH v8 2/4] kbuild: generate offset range data for builtin modules
2024-08-22 18:19 ` [PATCH v8 2/4] kbuild: generate offset range data for builtin modules Kris Van Hees
@ 2024-08-23 16:53 ` Sami Tolvanen
2024-08-23 17:06 ` Kris Van Hees
0 siblings, 1 reply; 17+ messages in thread
From: Sami Tolvanen @ 2024-08-23 16:53 UTC (permalink / raw)
To: Kris Van Hees
Cc: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel,
Nick Alcock, Alan Maguire, Steven Rostedt, Masahiro Yamada,
Luis Chamberlain, Masami Hiramatsu, Nick Desaulniers, Jiri Olsa,
Elena Zannoni
Hi Kris,
On Thu, Aug 22, 2024 at 02:19:39PM -0400, Kris Van Hees wrote:
> diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> new file mode 100755
> index 000000000000..68df05fd3036
> --- /dev/null
> +++ b/scripts/generate_builtin_ranges.awk
> @@ -0,0 +1,505 @@
> +#!/usr/bin/gawk -f
> +# SPDX-License-Identifier: GPL-2.0
> +# generate_builtin_ranges.awk: Generate address range data for builtin modules
> +# Written by Kris Van Hees <kris.van.hees@oracle.com>
> +#
> +# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
> +# vmlinux.o.map > modules.builtin.ranges
> +#
> +
> +# Return the module name(s) (if any) associated with the given object.
> +#
> +# If we have seen this object before, return information from the cache.
> +# Otherwise, retrieve it from the corresponding .cmd file.
> +#
> +function get_module_info(fn, mod, obj, s) {
> + if (fn in omod)
> + return omod[fn];
> +
> + if (match(fn, /\/[^/]+$/) == 0)
> + return "";
> +
> + obj = fn;
> + mod = "";
> + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> + if (getline s <fn == 1) {
> + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> + mod = substr(s, RSTART + 16, RLENGTH - 16);
> + gsub(/['"]/, "", mod);
> + }
> + }
This doesn't work with built-in Rust modules because there's no
-DKBUILD_MODFILE flag passed to the compiler. The .cmd files do have
RUST_MODFILE set though, so presumably you could match that too?
Sami
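
A minimal sketch of what matching both variables might look like, for
illustration only. modfile_from_cmd is a hypothetical helper, and the Rust
form is an assumption: it presumes the .cmd line carries an unquoted
RUST_MODFILE=<path> token, which is not confirmed in this thread beyond the
variable name. The C form follows the regular expression in the patch. The
quote character is passed in with -v so that the awk program can stay inside
a single-quoted shell string.

```sh
# Print the module file recorded on a .cmd command line, or nothing if no
# module file is recorded (hypothetical helper, not part of the patch).
modfile_from_cmd() {
    printf '%s\n' "$1" | awk -v q="'" '
    {
        if (match($0, "DKBUILD_MODFILE=[" q "\"]+[^" q "\"]+") > 0) {
            # C and assembler objects: skip the 16-character
            # "DKBUILD_MODFILE=" prefix, then drop the quoting.
            mod = substr($0, RSTART + 16, RLENGTH - 16);
            gsub("[" q "\"]", "", mod);
            print mod;
        } else if (match($0, /RUST_MODFILE=[^ ]+/) > 0) {
            # Rust objects (assumed form): skip the 13-character
            # "RUST_MODFILE=" prefix; the value is taken to be unquoted.
            print substr($0, RSTART + 13, RLENGTH - 13);
        }
    }'
}

modfile_from_cmd "gcc -DKBUILD_MODFILE='\"crypto/lzo-rle\"' -c crypto/lzo-rle.c"
# prints: crypto/lzo-rle
```

Whether the script should match a second variable or the Rust build should
emit KBUILD_MODFILE as well is a separate design question.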
* Re: [PATCH v8 2/4] kbuild: generate offset range data for builtin modules
2024-08-23 16:53 ` Sami Tolvanen
@ 2024-08-23 17:06 ` Kris Van Hees
2024-08-23 17:23 ` Sami Tolvanen
0 siblings, 1 reply; 17+ messages in thread
From: Kris Van Hees @ 2024-08-23 17:06 UTC (permalink / raw)
To: Sami Tolvanen
Cc: Kris Van Hees, linux-kernel, linux-kbuild, linux-modules,
linux-trace-kernel, Nick Alcock, Alan Maguire, Steven Rostedt,
Masahiro Yamada, Luis Chamberlain, Masami Hiramatsu,
Nick Desaulniers, Jiri Olsa, Elena Zannoni
On Fri, Aug 23, 2024 at 04:53:29PM +0000, Sami Tolvanen wrote:
> Hi Kris,
>
> On Thu, Aug 22, 2024 at 02:19:39PM -0400, Kris Van Hees wrote:
> > diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> > new file mode 100755
> > index 000000000000..68df05fd3036
> > --- /dev/null
> > +++ b/scripts/generate_builtin_ranges.awk
> > @@ -0,0 +1,505 @@
> > +#!/usr/bin/gawk -f
> > +# SPDX-License-Identifier: GPL-2.0
> > +# generate_builtin_ranges.awk: Generate address range data for builtin modules
> > +# Written by Kris Van Hees <kris.van.hees@oracle.com>
> > +#
> > +# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
> > +# vmlinux.o.map > modules.builtin.ranges
> > +#
> > +
> > +# Return the module name(s) (if any) associated with the given object.
> > +#
> > +# If we have seen this object before, return information from the cache.
> > +# Otherwise, retrieve it from the corresponding .cmd file.
> > +#
> > +function get_module_info(fn, mod, obj, s) {
> > + if (fn in omod)
> > + return omod[fn];
> > +
> > + if (match(fn, /\/[^/]+$/) == 0)
> > + return "";
> > +
> > + obj = fn;
> > + mod = "";
> > + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> > + if (getline s <fn == 1) {
> > + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> > + mod = substr(s, RSTART + 16, RLENGTH - 16);
> > + gsub(/['"]/, "", mod);
> > + }
> > + }
>
> This doesn't work with built-in Rust modules because there's no
> -DKBUILD_MODFILE flag passed to the compiler. The .cmd files do have
> RUST_MODFILE set though, so presumably you could match that too?
Thanks for looking at the patch series. I'll look into this.
Is there a reason why Rust modules are using RUST_MODFILE rather than also
using KBUILD_MODFILE as the macro to pass information about what module(s)
the object belongs to?
Thanks,
Kris
* Re: [PATCH v8 2/4] kbuild: generate offset range data for builtin modules
2024-08-23 17:06 ` Kris Van Hees
@ 2024-08-23 17:23 ` Sami Tolvanen
2024-08-24 16:44 ` Miguel Ojeda
0 siblings, 1 reply; 17+ messages in thread
From: Sami Tolvanen @ 2024-08-23 17:23 UTC (permalink / raw)
To: Kris Van Hees, Miguel Ojeda
Cc: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel,
Nick Alcock, Alan Maguire, Steven Rostedt, Masahiro Yamada,
Luis Chamberlain, Masami Hiramatsu, Nick Desaulniers, Jiri Olsa,
Elena Zannoni
On Fri, Aug 23, 2024 at 10:06 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Fri, Aug 23, 2024 at 04:53:29PM +0000, Sami Tolvanen wrote:
> > Hi Kris,
> >
> > On Thu, Aug 22, 2024 at 02:19:39PM -0400, Kris Van Hees wrote:
> > > diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
> > > new file mode 100755
> > > index 000000000000..68df05fd3036
> > > --- /dev/null
> > > +++ b/scripts/generate_builtin_ranges.awk
> > > @@ -0,0 +1,505 @@
> > > +#!/usr/bin/gawk -f
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# generate_builtin_ranges.awk: Generate address range data for builtin modules
> > > +# Written by Kris Van Hees <kris.van.hees@oracle.com>
> > > +#
> > > +# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
> > > +# vmlinux.o.map > modules.builtin.ranges
> > > +#
> > > +
> > > +# Return the module name(s) (if any) associated with the given object.
> > > +#
> > > +# If we have seen this object before, return information from the cache.
> > > +# Otherwise, retrieve it from the corresponding .cmd file.
> > > +#
> > > +function get_module_info(fn, mod, obj, s) {
> > > + if (fn in omod)
> > > + return omod[fn];
> > > +
> > > + if (match(fn, /\/[^/]+$/) == 0)
> > > + return "";
> > > +
> > > + obj = fn;
> > > + mod = "";
> > > + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> > > + if (getline s <fn == 1) {
> > > + if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> > > + mod = substr(s, RSTART + 16, RLENGTH - 16);
> > > + gsub(/['"]/, "", mod);
> > > + }
> > > + }
> >
> > This doesn't work with built-in Rust modules because there's no
> > -DKBUILD_MODFILE flag passed to the compiler. The .cmd files do have
> > RUST_MODFILE set though, so presumably you could match that too?
>
> Thanks for looking at the patch series. I'll look into this.
>
> Is there a reason why Rust modules are using RUST_MODFILE rather than also
> using KBUILD_MODFILE as the macro to pass information about what module(s)
> the object belongs to?
I assume they wanted to avoid conflicts between Rust-specific
environment variables and existing Kbuild variables. Note that
KBUILD_MODFILE is also double quoted for the C preprocessor, which
isn't needed for Rust. Miguel, do you remember if there's another
reason for the different variable name?
Sami
* Re: [PATCH v8 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects
2024-08-22 18:19 ` [PATCH v8 1/4] kbuild: add mod(name,file)_flags to assembler flags for module objects Kris Van Hees
@ 2024-08-23 17:37 ` Masahiro Yamada
0 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2024-08-23 17:37 UTC (permalink / raw)
To: Kris Van Hees
Cc: linux-kernel, linux-kbuild, linux-modules, linux-trace-kernel,
Steven Rostedt, Luis Chamberlain, Masami Hiramatsu,
Nick Desaulniers, Jiri Olsa, Elena Zannoni
On Fri, Aug 23, 2024 at 3:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> In order to create the file at build time, modules.builtin.ranges, that
> contains the range of addresses for all built-in modules, there needs to
> be a way to identify what code is compiled into modules.
>
> To identify what code is compiled into modules during a kernel build,
> one can look for the presence of the -DKBUILD_MODFILE and -DKBUILD_MODNAME
> options in the compile command lines. A simple grep in .*.cmd files for
> those options is sufficient for this.
>
> Unfortunately, these options are only passed when compiling C source files.
> Various modules also include objects built from assembler source, and these
> options are not passed in that case.
>
> Adding $(modfile_flags) to modkern_aflags (similar to modkern_cflahs), and
modkern_cflahs -> modkern_cflags
> adding $(modname_flags) to a_flags (similar to c_flags) makes it possible
> to identify which objects are compiled into modules for both C and
> assembler soure files.
soure -> source
Strictly speaking, only KBUILD_MODFILE was used in 2/4 or 3/4.
KBUILD_MODNAME was unneeded, but if you want to add KBUILD_MODNAME
for consistency, it is fine too.
RUST_MODFILE exists, but RUST_MODNAME does not.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> scripts/Makefile.lib | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index fe3668dc4954..170f462537a8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -238,7 +238,7 @@ modkern_rustflags = \
>
> modkern_aflags = $(if $(part-of-module), \
> $(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE), \
> - $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL))
> + $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL) $(modfile_flags))
>
> c_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
> -include $(srctree)/include/linux/compiler_types.h \
> @@ -248,7 +248,7 @@ c_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
> rust_flags = $(_rust_flags) $(modkern_rustflags) @$(objtree)/include/generated/rustc_cfg
>
> a_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
> - $(_a_flags) $(modkern_aflags)
> + $(_a_flags) $(modkern_aflags) $(modname_flags)
>
> cpp_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
> $(_cpp_flags)
> --
> 2.45.2
>
--
Best Regards
Masahiro Yamada
* Re: [PATCH v8 2/4] kbuild: generate offset range data for builtin modules
2024-08-23 17:23 ` Sami Tolvanen
@ 2024-08-24 16:44 ` Miguel Ojeda
0 siblings, 0 replies; 17+ messages in thread
From: Miguel Ojeda @ 2024-08-24 16:44 UTC (permalink / raw)
To: Sami Tolvanen
Cc: Kris Van Hees, Miguel Ojeda, linux-kernel, linux-kbuild,
linux-modules, linux-trace-kernel, Nick Alcock, Alan Maguire,
Steven Rostedt, Masahiro Yamada, Luis Chamberlain,
Masami Hiramatsu, Nick Desaulniers, Jiri Olsa, Elena Zannoni
On Fri, Aug 23, 2024 at 7:24 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> I assume they wanted to avoid conflicts between Rust-specific
> environment variables and existing Kbuild variables. Note that
> KBUILD_MODFILE is also double quoted for the C preprocessor, which
> isn't needed for Rust. Miguel, do you remember if there's another
> reason for the different variable name?
No, I don't recall another reason -- I think you are right, they did
not carry (exactly) the same contents, and thus the different name.
So I think it can be merged/changed into something else if needed.
Cheers,
Miguel