From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 76461212FAD; Mon, 4 May 2026 15:34:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777908888; cv=none; b=Cf7S+LkGgbHIDnW1MVic2FhajNNb0G11UCyIAcvXtI8rERU+aikVDaySxz1Rf+CBCokhfr1BcIO9Amxiv0//9dBjqkCebPjKFHx3OSgTPxr/E4kFglGiZNUS1e2wI5nvT4LPBFrCFpA9ouXLivimXvfoMn7z/PnN6TxicIgZT+Q= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777908888; c=relaxed/simple; bh=RLAva6tIJGLw+qmrleQUCEm9e77PvFT0fjxvBSb0AiU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=itJiHP2mUbaE7gvdODJFdTmzuEwg4rfCFEqEk5dOkgYaDVGFLZZx2OTPjb10eJeHq9wpPJgGh8do5WWCGrEqBWZMAxTzEkumklFbBfZLgZOYdjrfWHJJnjq/KT+R7niwAEcatlOhGVHFA6DGwVD6PZDsa3lH4CPWhAGusugfkH0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=u8WSBisz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="u8WSBisz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C5860C2BCB8; Mon, 4 May 2026 15:34:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777908888; bh=RLAva6tIJGLw+qmrleQUCEm9e77PvFT0fjxvBSb0AiU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=u8WSBisz/c1p0hcm/NVToHijD7+VbjJaRLAgc0v/YB/jrvVGcXvSaGVb00LUpkepW cHmYKcWS3VokiCf/VQLxllhQsMjgDAOq2BGeth+XzcYLON2cX0pcao/J4jdRdQ8e0I hawKMeEsl9fv1MBNJf6x2Cylf/J/ASyu1838MK76R45CNWpbM39l10mLWE5qBw2Lso NNcT8d6tzGlwk9P675OE06Fb4sxNLsuBjpCXlRDxvulY/6lttQtXvcpDzoxoEJI6DX 1S3hyRiVBAyHK+TqC/tgatI4ZXWvdWLtC1MdPUU4yMvER4O3TVc5yXiXEjS/jwwSX9 4GbZsf9je16Dw== From: Sasha Levin To: Andrew Morton , Masahiro Yamada , Luis Chamberlain , Linus Torvalds , Richard Weinberger , Juergen Gross , Geert Uytterhoeven , James Bottomley Cc: Jonathan Corbet , Nathan Chancellor , Nicolas Schier , Petr Pavlu , Daniel Gomez , Greg KH , Petr Mladek , Steven Rostedt , Kees Cook , Peter Zijlstra , Thorsten Leemhuis , Vlastimil Babka , Helge Deller , Randy Dunlap , Laurent Pinchart , Vivian Wang , Zhen Lei , Sami Tolvanen , linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org, linux-modules@vger.kernel.org, linux-doc@vger.kernel.org, Sasha Levin Subject: [PATCH v5 1/4] kallsyms: embed source file:line info in kernel stack traces Date: Mon, 4 May 2026 11:33:57 -0400 Message-ID: <20260504153401.2416391-2-sashal@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260504153401.2416391-1-sashal@kernel.org> References: <20260504153401.2416391-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add CONFIG_KALLSYMS_LINEINFO, which embeds a compact address-to-line lookup table in the kernel image so stack traces directly print source file and line number information: root@localhost:~# echo c > /proc/sysrq-trigger [ 11.201987] sysrq: Trigger a crash [ 11.202831] Kernel panic - not syncing: sysrq triggered crash [ 11.206218] Call Trace: [ 11.206501] [ 11.206749] dump_stack_lvl+0x5d/0x80 (lib/dump_stack.c:94) [ 11.207403] vpanic+0x36e/0x620 (kernel/panic.c:650) [ 11.208565] ? __lock_acquire+0x465/0x2240 (kernel/locking/lockdep.c:4674) [ 11.209324] panic+0xc9/0xd0 (kernel/panic.c:787) [ 11.211873] ? find_held_lock+0x2b/0x80 (kernel/locking/lockdep.c:5350) [ 11.212597] ? lock_release+0xd3/0x300 (kernel/locking/lockdep.c:5535) [ 11.213312] sysrq_handle_crash+0x1a/0x20 (drivers/tty/sysrq.c:154) [ 11.214005] __handle_sysrq.cold+0x66/0x256 (drivers/tty/sysrq.c:611) [ 11.214712] write_sysrq_trigger+0x65/0x80 (drivers/tty/sysrq.c:1221) [ 11.215424] proc_reg_write+0x1bd/0x3c0 (fs/proc/inode.c:330) [ 11.216061] vfs_write+0x1c6/0xff0 (fs/read_write.c:686) [ 11.218848] ksys_write+0xfa/0x200 (fs/read_write.c:740) [ 11.222394] do_syscall_64+0xf3/0x690 (arch/x86/entry/syscall_64.c:63) [ 11.223942] entry_SYSCALL_64_after_hwframe+0x77/0x7f (arch/x86/entry/entry_64.S:121) At build time, a new host tool (scripts/gen_lineinfo) reads DWARF .debug_line from vmlinux using libdw (elfutils), extracts all address-to-file:line mappings, and generates an assembly file with sorted parallel arrays (offsets from _text, file IDs, and line numbers). These are linked into vmlinux as .rodata. At runtime, kallsyms_lookup_lineinfo() does a binary search on the table and __sprint_symbol() appends "(file:line)" to each stack frame. The lookup uses offsets from _text so it works with KASLR, requires no locks or allocations, and is safe in any context including panic. The feature requires CONFIG_DEBUG_INFO (for DWARF data) and elfutils (libdw-dev) on the build host. Memory footprint measured with a 1852-option x86_64 config: Table: 4,597,583 entries from 4,841 source files lineinfo_addrs[] 4,597,583 x u32 = 17.5 MiB lineinfo_file_ids[] 4,597,583 x u16 = 8.8 MiB lineinfo_lines[] 4,597,583 x u32 = 17.5 MiB file_offsets + filenames ~ 0.1 MiB Total .rodata increase: ~ 44.0 MiB vmlinux (stripped): 529 MiB -> 573 MiB (+44 MiB / +8.3%) The .config used for testing is a simple KVM guest configuration for local development and testing. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sasha Levin --- Documentation/admin-guide/index.rst | 1 + .../admin-guide/kallsyms-lineinfo.rst | 72 +++ MAINTAINERS | 6 + include/linux/kallsyms.h | 17 +- init/Kconfig | 20 + kernel/kallsyms.c | 76 ++- kernel/kallsyms_internal.h | 9 + scripts/.gitignore | 1 + scripts/Makefile | 3 + scripts/empty_lineinfo.S | 30 + scripts/gen_lineinfo.c | 557 ++++++++++++++++++ scripts/kallsyms.c | 16 + scripts/link-vmlinux.sh | 43 +- 13 files changed, 841 insertions(+), 10 deletions(-) create mode 100644 Documentation/admin-guide/kallsyms-lineinfo.rst create mode 100644 scripts/empty_lineinfo.S create mode 100644 scripts/gen_lineinfo.c diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index cd28dfe91b060..37456e08fe43c 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -73,6 +73,7 @@ problems and bugs in particular. ramoops dynamic-debug-howto init + kallsyms-lineinfo kdump/index perf/index pstore-blk diff --git a/Documentation/admin-guide/kallsyms-lineinfo.rst b/Documentation/admin-guide/kallsyms-lineinfo.rst new file mode 100644 index 0000000000000..c8ec124394354 --- /dev/null +++ b/Documentation/admin-guide/kallsyms-lineinfo.rst @@ -0,0 +1,72 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================================== +Kallsyms Source Line Info (LINEINFO) +==================================== + +Overview +======== + +``CONFIG_KALLSYMS_LINEINFO`` embeds DWARF-derived source file and line number +mappings into the kernel image so that stack traces include +``(file.c:123)`` annotations next to each symbol. This makes it significantly +easier to pinpoint the exact source location during debugging, without needing +to manually cross-reference addresses with ``addr2line``. + +Enabling the Feature +==================== + +Enable the following kernel configuration options:: + + CONFIG_KALLSYMS=y + CONFIG_DEBUG_INFO=y + CONFIG_KALLSYMS_LINEINFO=y + +Build dependency: the host tool ``scripts/gen_lineinfo`` requires ``libdw`` +from elfutils. Install the development package: + +- Debian/Ubuntu: ``apt install libdw-dev`` +- Fedora/RHEL: ``dnf install elfutils-devel`` +- Arch Linux: ``pacman -S elfutils`` + +Example Output +============== + +Without ``CONFIG_KALLSYMS_LINEINFO``:: + + Call Trace: + + dump_stack_lvl+0x5d/0x80 + do_syscall_64+0x82/0x190 + entry_SYSCALL_64_after_hwframe+0x76/0x7e + +With ``CONFIG_KALLSYMS_LINEINFO``:: + + Call Trace: + + dump_stack_lvl+0x5d/0x80 (lib/dump_stack.c:123) + do_syscall_64+0x82/0x190 (arch/x86/entry/common.c:52) + entry_SYSCALL_64_after_hwframe+0x76/0x7e + +Note that assembly routines (such as ``entry_SYSCALL_64_after_hwframe``) are +not annotated because they lack DWARF debug information. + +Memory Overhead +=============== + +The lineinfo tables are stored in ``.rodata`` and typically add approximately +44 MiB to the kernel image for a standard configuration (~4.6 million DWARF +line entries, ~10 bytes per entry after deduplication). + +Known Limitations +================= + +- **vmlinux only**: Only symbols in the core kernel image are annotated. + Module symbols are not covered. +- **4 GiB offset limit**: Address offsets from ``_text`` are stored as 32-bit + values. Entries beyond 4 GiB from ``_text`` are skipped at build time with + a warning. +- **65535 file limit**: Source file IDs are stored as 16-bit values. Builds + with more than 65535 unique source files will fail with an error. +- **No assembly annotations**: Functions implemented in assembly that lack + DWARF ``.debug_line`` data are not annotated. diff --git a/MAINTAINERS b/MAINTAINERS index 2fb1c75afd163..a683a7023f6e9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13793,6 +13793,12 @@ S: Maintained F: Documentation/hwmon/k8temp.rst F: drivers/hwmon/k8temp.c +KALLSYMS LINEINFO +M: Sasha Levin +S: Maintained +F: Documentation/admin-guide/kallsyms-lineinfo.rst +F: scripts/gen_lineinfo.c + KASAN M: Andrey Ryabinin R: Alexander Potapenko diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index d5dd54c53ace6..7d4c9dca06c87 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -16,10 +16,15 @@ #include #define KSYM_NAME_LEN 512 + +/* Extra space for " (path/to/file.c:12345)" suffix when lineinfo is enabled */ +#define KSYM_LINEINFO_LEN (IS_ENABLED(CONFIG_KALLSYMS_LINEINFO) ? 128 : 0) + #define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s %s]") + \ (KSYM_NAME_LEN - 1) + \ 2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + \ - (BUILD_ID_SIZE_MAX * 2) + 1) + (BUILD_ID_SIZE_MAX * 2) + 1 + \ + KSYM_LINEINFO_LEN) struct cred; struct module; @@ -96,6 +101,9 @@ extern int sprint_backtrace_build_id(char *buffer, unsigned long address); int lookup_symbol_name(unsigned long addr, char *symname); +bool kallsyms_lookup_lineinfo(unsigned long addr, + const char **file, unsigned int *line); + #else /* !CONFIG_KALLSYMS */ static inline unsigned long kallsyms_lookup_name(const char *name) @@ -164,6 +172,13 @@ static inline int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long) { return -EOPNOTSUPP; } + +static inline bool kallsyms_lookup_lineinfo(unsigned long addr, + const char **file, + unsigned int *line) +{ + return false; +} #endif /*CONFIG_KALLSYMS*/ static inline void print_ip_sym(const char *loglvl, unsigned long ip) diff --git a/init/Kconfig b/init/Kconfig index 2937c4d308aec..99e78c253056a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2070,6 +2070,26 @@ config KALLSYMS_ALL Say N unless you really need all symbols, or kernel live patching. +config KALLSYMS_LINEINFO + bool "Embed source file:line information in stack traces" + depends on KALLSYMS && DEBUG_INFO + help + Embeds an address-to-source-line mapping table in the kernel + image so that stack traces directly include file:line information, + similar to what scripts/decode_stacktrace.sh provides but without + needing external tools or a vmlinux with debug info at runtime. + + When enabled, stack traces will look like: + + kmem_cache_alloc_noprof+0x60/0x630 (mm/slub.c:3456) + anon_vma_clone+0x2ed/0xcf0 (mm/rmap.c:412) + + This requires elfutils (libdw-dev/elfutils-devel) on the build host. + Adds approximately 44MB to a typical kernel image (10 bytes per + DWARF line-table entry, ~4.6M entries for a typical config). + + If unsure, say N. + # end of the "standard kernel features (expert users)" menu config ARCH_HAS_MEMBARRIER_CALLBACKS diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index aec2f06858afd..1e3f527b13988 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -467,9 +467,56 @@ static int append_buildid(char *buffer, const char *modname, #endif /* CONFIG_STACKTRACE_BUILD_ID */ +bool kallsyms_lookup_lineinfo(unsigned long addr, + const char **file, unsigned int *line) +{ + unsigned long long raw_offset; + unsigned int offset, low, high, mid, file_id; + + if (!IS_ENABLED(CONFIG_KALLSYMS_LINEINFO) || !lineinfo_num_entries) + return false; + + /* Compute offset from _text */ + if (addr < (unsigned long)_text) + return false; + + raw_offset = addr - (unsigned long)_text; + if (raw_offset > UINT_MAX) + return false; + offset = (unsigned int)raw_offset; + + /* Binary search for largest entry <= offset */ + low = 0; + high = lineinfo_num_entries; + while (low < high) { + mid = low + (high - low) / 2; + if (lineinfo_addrs[mid] <= offset) + low = mid + 1; + else + high = mid; + } + + if (low == 0) + return false; + low--; + + file_id = lineinfo_file_ids[low]; + *line = lineinfo_lines[low]; + + if (file_id >= lineinfo_num_files) + return false; + + if (lineinfo_file_offsets[file_id] >= lineinfo_filenames_size) + return false; + + *file = &lineinfo_filenames[lineinfo_file_offsets[file_id]]; + return true; +} + /* Look up a kernel symbol and return it in a text buffer. */ static int __sprint_symbol(char *buffer, unsigned long address, - int symbol_offset, int add_offset, int add_buildid) + int symbol_offset, int add_offset, int add_buildid, + int add_lineinfo) { char *modname; const unsigned char *buildid; @@ -497,6 +544,23 @@ static int __sprint_symbol(char *buffer, unsigned long address, len += sprintf(buffer + len, "]"); } + /* + * Append "(file:line)" only for stack-backtrace consumers. Plain + * sprint_symbol() backs %ps, and many existing format strings tack + * literal "()" after %ps to indicate a function call ("foo() + * replaced with bar()"); appending lineinfo there would produce a + * confusing "foo (file:line)()". + */ + if (add_lineinfo && IS_ENABLED(CONFIG_KALLSYMS_LINEINFO) && !modname) { + const char *li_file; + unsigned int li_line; + + if (kallsyms_lookup_lineinfo(address, + &li_file, &li_line)) + len += snprintf(buffer + len, KSYM_SYMBOL_LEN - len, + " (%s:%u)", li_file, li_line); + } + return len; } @@ -513,7 +577,7 @@ static int __sprint_symbol(char *buffer, unsigned long address, */ int sprint_symbol(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, 0, 1, 0); + return __sprint_symbol(buffer, address, 0, 1, 0, 0); } EXPORT_SYMBOL_GPL(sprint_symbol); @@ -530,7 +594,7 @@ EXPORT_SYMBOL_GPL(sprint_symbol); */ int sprint_symbol_build_id(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, 0, 1, 1); + return __sprint_symbol(buffer, address, 0, 1, 1, 0); } EXPORT_SYMBOL_GPL(sprint_symbol_build_id); @@ -547,7 +611,7 @@ EXPORT_SYMBOL_GPL(sprint_symbol_build_id); */ int sprint_symbol_no_offset(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, 0, 0, 0); + return __sprint_symbol(buffer, address, 0, 0, 0, 0); } EXPORT_SYMBOL_GPL(sprint_symbol_no_offset); @@ -567,7 +631,7 @@ EXPORT_SYMBOL_GPL(sprint_symbol_no_offset); */ int sprint_backtrace(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, -1, 1, 0); + return __sprint_symbol(buffer, address, -1, 1, 0, 1); } /** @@ -587,7 +651,7 @@ int sprint_backtrace(char *buffer, unsigned long address) */ int sprint_backtrace_build_id(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, -1, 1, 1); + return __sprint_symbol(buffer, address, -1, 1, 1, 1); } /* To avoid using get_symbol_offset for every symbol, we carry prefix along. */ diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h index 81a867dbe57d4..d7374ce444d81 100644 --- a/kernel/kallsyms_internal.h +++ b/kernel/kallsyms_internal.h @@ -15,4 +15,13 @@ extern const u16 kallsyms_token_index[]; extern const unsigned int kallsyms_markers[]; extern const u8 kallsyms_seqs_of_names[]; +extern const u32 lineinfo_num_entries; +extern const u32 lineinfo_addrs[]; +extern const u16 lineinfo_file_ids[]; +extern const u32 lineinfo_lines[]; +extern const u32 lineinfo_num_files; +extern const u32 lineinfo_file_offsets[]; +extern const u32 lineinfo_filenames_size; +extern const char lineinfo_filenames[]; + #endif // LINUX_KALLSYMS_INTERNAL_H_ diff --git a/scripts/.gitignore b/scripts/.gitignore index 4215c2208f7e4..e175714c18b61 100644 --- a/scripts/.gitignore +++ b/scripts/.gitignore @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only /asn1_compiler +/gen_lineinfo /gen_packed_field_checks /generate_rust_target /insert-sys-cert diff --git a/scripts/Makefile b/scripts/Makefile index 3434a82a119f0..55244ce955781 100644 --- a/scripts/Makefile +++ b/scripts/Makefile @@ -4,6 +4,7 @@ # the kernel for the build process. hostprogs-always-$(CONFIG_KALLSYMS) += kallsyms +hostprogs-always-$(CONFIG_KALLSYMS_LINEINFO) += gen_lineinfo hostprogs-always-$(BUILD_C_RECORDMCOUNT) += recordmcount hostprogs-always-$(CONFIG_BUILDTIME_TABLE_SORT) += sorttable hostprogs-always-$(CONFIG_ASN1) += asn1_compiler @@ -37,6 +38,8 @@ HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include HOSTCFLAGS_sign-file.o = $(shell $(HOSTPKG_CONFIG) --cflags libcrypto 2> /dev/null) HOSTCFLAGS_sign-file.o += -I$(srctree)/tools/include/uapi/ HOSTLDLIBS_sign-file = $(shell $(HOSTPKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto) +HOSTCFLAGS_gen_lineinfo.o = $(shell $(HOSTPKG_CONFIG) --cflags libdw 2> /dev/null) +HOSTLDLIBS_gen_lineinfo = $(shell $(HOSTPKG_CONFIG) --libs libdw 2> /dev/null || echo -ldw -lelf -lz) ifdef CONFIG_UNWINDER_ORC ifeq ($(ARCH),x86_64) diff --git a/scripts/empty_lineinfo.S b/scripts/empty_lineinfo.S new file mode 100644 index 0000000000000..e058c41137123 --- /dev/null +++ b/scripts/empty_lineinfo.S @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2026 Sasha Levin + * + * Empty lineinfo stub for the initial vmlinux link. + * The real lineinfo is generated from .tmp_vmlinux1 by gen_lineinfo. + */ + .section .rodata, "a" + .globl lineinfo_num_entries + .balign 4 +lineinfo_num_entries: + .long 0 + .globl lineinfo_num_files + .balign 4 +lineinfo_num_files: + .long 0 + .globl lineinfo_addrs +lineinfo_addrs: + .globl lineinfo_file_ids +lineinfo_file_ids: + .globl lineinfo_lines +lineinfo_lines: + .globl lineinfo_file_offsets +lineinfo_file_offsets: + .globl lineinfo_filenames_size + .balign 4 +lineinfo_filenames_size: + .long 0 + .globl lineinfo_filenames +lineinfo_filenames: diff --git a/scripts/gen_lineinfo.c b/scripts/gen_lineinfo.c new file mode 100644 index 0000000000000..699e760178f09 --- /dev/null +++ b/scripts/gen_lineinfo.c @@ -0,0 +1,557 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * gen_lineinfo.c - Generate address-to-source-line lookup tables from DWARF + * + * Copyright (C) 2026 Sasha Levin + * + * Reads DWARF .debug_line from a vmlinux ELF file and outputs an assembly + * file containing sorted lookup tables that the kernel uses to annotate + * stack traces with source file:line information. + * + * Requires libdw from elfutils. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static unsigned int skipped_overflow; + +/* + * vmlinux mode: end of the invariant .text region. Zero means "no cap" + * (graceful fallback when _etext is absent on some build). + */ +static unsigned long long text_end_addr; + +struct line_entry { + unsigned int offset; /* offset from _text */ + unsigned int file_id; + unsigned int line; +}; + +struct file_entry { + char *name; + unsigned int id; + unsigned int str_offset; +}; + +static struct line_entry *entries; +static unsigned int num_entries; +static unsigned int entries_capacity; + +static struct file_entry *files; +static unsigned int num_files; +static unsigned int files_capacity; + +#define FILE_HASH_BITS 13 +#define FILE_HASH_SIZE (1 << FILE_HASH_BITS) + +struct file_hash_entry { + const char *name; + unsigned int id; +}; + +static struct file_hash_entry file_hash[FILE_HASH_SIZE]; + +static unsigned int hash_str(const char *s) +{ + unsigned int h = 5381; + + for (; *s; s++) + h = h * 33 + (unsigned char)*s; + return h & (FILE_HASH_SIZE - 1); +} + +static void add_entry(unsigned int offset, unsigned int file_id, + unsigned int line) +{ + if (num_entries >= entries_capacity) { + entries_capacity = entries_capacity ? entries_capacity * 2 : 65536; + entries = realloc(entries, entries_capacity * sizeof(*entries)); + if (!entries) { + fprintf(stderr, "out of memory\n"); + exit(1); + } + } + entries[num_entries].offset = offset; + entries[num_entries].file_id = file_id; + entries[num_entries].line = line; + num_entries++; +} + +static unsigned int find_or_add_file(const char *name) +{ + unsigned int h = hash_str(name); + + /* Open-addressing lookup with linear probing */ + while (file_hash[h].name) { + if (!strcmp(file_hash[h].name, name)) + return file_hash[h].id; + h = (h + 1) & (FILE_HASH_SIZE - 1); + } + + if (num_files >= 65535) { + fprintf(stderr, + "gen_lineinfo: too many source files (%u > 65535)\n", + num_files); + exit(1); + } + + if (num_files >= files_capacity) { + files_capacity = files_capacity ? files_capacity * 2 : 4096; + files = realloc(files, files_capacity * sizeof(*files)); + if (!files) { + fprintf(stderr, "out of memory\n"); + exit(1); + } + } + files[num_files].name = strdup(name); + files[num_files].id = num_files; + + /* Insert into hash table (points to files[] entry) */ + file_hash[h].name = files[num_files].name; + file_hash[h].id = num_files; + + num_files++; + return num_files - 1; +} + +/* + * Well-known top-level directories in the kernel source tree. + * Used as a fallback to recover relative paths from absolute DWARF paths + * when comp_dir doesn't match (e.g. O= out-of-tree builds where comp_dir + * is the build directory but source paths point into the source tree). + */ +static const char * const kernel_dirs[] = { + "arch/", "block/", "certs/", "crypto/", "drivers/", "fs/", + "include/", "init/", "io_uring/", "ipc/", "kernel/", "lib/", + "mm/", "net/", "rust/", "samples/", "scripts/", "security/", + "sound/", "tools/", "usr/", "virt/", +}; + +/* + * Strip a filename to a kernel-relative path. + * + * For absolute paths, strip the comp_dir prefix (from DWARF) to get + * a kernel-tree-relative path. When that fails (e.g. O= builds where + * comp_dir is the build directory), scan for a well-known kernel + * top-level directory name in the path to recover the relative path. + * Fall back to the basename as a last resort. + * + * For relative paths (common in modules), libdw may produce a bogus + * doubled path like "net/foo/bar.c/net/foo/bar.c" due to ET_REL DWARF + * quirks. Detect and strip such duplicates. + */ +static const char *make_relative(const char *path, const char *comp_dir) +{ + const char *p; + + /* If already relative, use as-is */ + if (path[0] != '/') + return path; + + /* comp_dir from DWARF is the most reliable method */ + if (comp_dir) { + size_t len = strlen(comp_dir); + + if (!strncmp(path, comp_dir, len) && path[len] == '/') { + const char *rel = path + len + 1; + + /* + * If comp_dir pointed to a subdirectory + * (e.g. arch/parisc/kernel) rather than + * the tree root, stripping it leaves a + * bare filename. Fall through to the + * kernel_dirs scan so we recover the full + * relative path instead. + */ + if (strchr(rel, '/')) + return rel; + } + + /* + * comp_dir prefix didn't help — either it didn't match + * or it was too specific and left a bare filename. + * Scan for a known kernel top-level directory component + * to find where the relative path starts. This handles + * O= builds and arches where comp_dir is a subdirectory. + */ + for (p = path + 1; *p; p++) { + if (*(p - 1) == '/') { + for (unsigned int i = 0; i < sizeof(kernel_dirs) / + sizeof(kernel_dirs[0]); i++) { + if (!strncmp(p, kernel_dirs[i], + strlen(kernel_dirs[i]))) + return p; + } + } + } + + /* Fall back to basename */ + p = strrchr(path, '/'); + return p ? p + 1 : path; + } + + /* Fall back to basename */ + p = strrchr(path, '/'); + return p ? p + 1 : path; +} + +static int compare_entries(const void *a, const void *b) +{ + const struct line_entry *ea = a; + const struct line_entry *eb = b; + + if (ea->offset != eb->offset) + return ea->offset < eb->offset ? -1 : 1; + if (ea->file_id != eb->file_id) + return ea->file_id < eb->file_id ? -1 : 1; + if (ea->line != eb->line) + return ea->line < eb->line ? -1 : 1; + return 0; +} + +/* + * Look up a vmlinux symbol by exact name and return its st_value, or + * @fallback if absent. Aborts when @required and the symbol is missing. + */ +static unsigned long long find_vmlinux_sym(Elf *elf, const char *name, + unsigned long long fallback, + bool required) +{ + size_t nsyms, i; + Elf_Scn *scn = NULL; + GElf_Shdr shdr; + + while ((scn = elf_nextscn(elf, scn)) != NULL) { + Elf_Data *data; + + if (!gelf_getshdr(scn, &shdr)) + continue; + if (shdr.sh_type != SHT_SYMTAB) + continue; + + data = elf_getdata(scn, NULL); + if (!data) + continue; + + nsyms = shdr.sh_size / shdr.sh_entsize; + for (i = 0; i < nsyms; i++) { + GElf_Sym sym; + const char *sname; + + if (!gelf_getsym(data, i, &sym)) + continue; + sname = elf_strptr(elf, shdr.sh_link, sym.st_name); + if (sname && !strcmp(sname, name)) + return sym.st_value; + } + } + + if (required) { + fprintf(stderr, "Cannot find %s symbol\n", name); + exit(1); + } + return fallback; +} + +static unsigned long long find_text_addr(Elf *elf) +{ + return find_vmlinux_sym(elf, "_text", 0, true); +} + +/* + * vmlinux is linked in multiple passes: gen_lineinfo runs against + * .tmp_vmlinux1 (which carries an empty lineinfo stub), then real tables + * are linked in for the final image. Sections placed AFTER .rodata + * (.init.text, .exit.text, ...) shift forward as .rodata grows to hold + * the real lineinfo blob, so DWARF addresses we'd capture for them in + * pass 1 would be stale in the final kernel. Cap captured addresses at + * _etext, the symbol that marks the end of .text — placed before .rodata + * in every architecture's vmlinux.lds.S, so its addresses are invariant + * across the relink. Returns 0 if _etext is absent (no cap; v3 behavior). + */ +static unsigned long long find_text_end_addr(Elf *elf) +{ + return find_vmlinux_sym(elf, "_etext", 0, false); +} + +static void process_dwarf(Dwarf *dwarf, unsigned long long text_addr) +{ + Dwarf_Off off = 0, next_off; + size_t hdr_size; + + while (dwarf_nextcu(dwarf, off, &next_off, &hdr_size, + NULL, NULL, NULL) == 0) { + Dwarf_Die cudie; + Dwarf_Lines *lines; + size_t nlines; + Dwarf_Attribute attr; + const char *comp_dir = NULL; + + if (!dwarf_offdie(dwarf, off + hdr_size, &cudie)) + goto next; + + if (dwarf_attr(&cudie, DW_AT_comp_dir, &attr)) + comp_dir = dwarf_formstring(&attr); + + if (dwarf_getsrclines(&cudie, &lines, &nlines) != 0) + goto next; + + for (size_t i = 0; i < nlines; i++) { + Dwarf_Line *line = dwarf_onesrcline(lines, i); + Dwarf_Addr addr; + const char *src; + const char *rel; + unsigned int file_id, loffset; + int lineno; + + if (!line) + continue; + + if (dwarf_lineaddr(line, &addr) != 0) + continue; + if (dwarf_lineno(line, &lineno) != 0) + continue; + if (lineno == 0) + continue; + + src = dwarf_linesrc(line, NULL, NULL); + if (!src) + continue; + + if (addr < text_addr) + continue; + /* + * Skip addresses past _etext. Sections after .rodata + * shift when the real lineinfo replaces the empty stub + * during the multi-pass vmlinux link, so any address + * we'd capture there would be stale by the time the + * final kernel runs. + */ + if (text_end_addr && addr >= text_end_addr) + continue; + + { + unsigned long long raw_offset = addr - text_addr; + + if (raw_offset > UINT_MAX) { + skipped_overflow++; + continue; + } + loffset = (unsigned int)raw_offset; + } + + rel = make_relative(src, comp_dir); + file_id = find_or_add_file(rel); + + add_entry(loffset, file_id, (unsigned int)lineno); + } +next: + off = next_off; + } +} + +static void deduplicate(void) +{ + unsigned int i, j; + + if (num_entries < 2) + return; + + /* Sort by offset, then file_id, then line for stability */ + qsort(entries, num_entries, sizeof(*entries), compare_entries); + + /* + * Remove duplicate entries: + * - Same offset: keep first (deterministic from stable sort keys) + * - Same file:line as previous kept entry: redundant for binary + * search -- any address between them resolves to the earlier one + */ + j = 0; + for (i = 1; i < num_entries; i++) { + if (entries[i].offset == entries[j].offset) + continue; + if (entries[i].file_id == entries[j].file_id && + entries[i].line == entries[j].line) + continue; + j++; + if (j != i) + entries[j] = entries[i]; + } + num_entries = j + 1; +} + +static void compute_file_offsets(void) +{ + unsigned int offset = 0; + + for (unsigned int i = 0; i < num_files; i++) { + files[i].str_offset = offset; + offset += strlen(files[i].name) + 1; + } +} + +static void print_escaped_asciz(const char *s) +{ + printf("\t.asciz \""); + for (; *s; s++) { + if (*s == '"' || *s == '\\') + putchar('\\'); + putchar(*s); + } + printf("\"\n"); +} + +static void output_assembly(void) +{ + printf("/* SPDX-License-Identifier: GPL-2.0 */\n"); + printf("/*\n"); + printf(" * Automatically generated by scripts/gen_lineinfo\n"); + printf(" * Do not edit.\n"); + printf(" */\n\n"); + + printf("\t.section .rodata, \"a\"\n\n"); + + /* Number of entries */ + printf("\t.globl lineinfo_num_entries\n"); + printf("\t.balign 4\n"); + printf("lineinfo_num_entries:\n"); + printf("\t.long %u\n\n", num_entries); + + /* Number of files */ + printf("\t.globl lineinfo_num_files\n"); + printf("\t.balign 4\n"); + printf("lineinfo_num_files:\n"); + printf("\t.long %u\n\n", num_files); + + /* Sorted address offsets from _text */ + printf("\t.globl lineinfo_addrs\n"); + printf("\t.balign 4\n"); + printf("lineinfo_addrs:\n"); + for (unsigned int i = 0; i < num_entries; i++) + printf("\t.long 0x%x\n", entries[i].offset); + printf("\n"); + + /* File IDs, parallel to addrs (u16 -- supports up to 65535 files) */ + printf("\t.globl lineinfo_file_ids\n"); + printf("\t.balign 2\n"); + printf("lineinfo_file_ids:\n"); + for (unsigned int i = 0; i < num_entries; i++) + printf("\t.short %u\n", entries[i].file_id); + printf("\n"); + + /* Line numbers, parallel to addrs */ + printf("\t.globl lineinfo_lines\n"); + printf("\t.balign 4\n"); + printf("lineinfo_lines:\n"); + for (unsigned int i = 0; i < num_entries; i++) + printf("\t.long %u\n", entries[i].line); + printf("\n"); + + /* File string offset table */ + printf("\t.globl lineinfo_file_offsets\n"); + printf("\t.balign 4\n"); + printf("lineinfo_file_offsets:\n"); + for (unsigned int i = 0; i < num_files; i++) + printf("\t.long %u\n", files[i].str_offset); + printf("\n"); + + /* Filenames size */ + { + unsigned int fsize = 0; + + for (unsigned int i = 0; i < num_files; i++) + fsize += strlen(files[i].name) + 1; + printf("\t.globl lineinfo_filenames_size\n"); + printf("\t.balign 4\n"); + printf("lineinfo_filenames_size:\n"); + printf("\t.long %u\n\n", fsize); + } + + /* Concatenated NUL-terminated filenames */ + printf("\t.globl lineinfo_filenames\n"); + printf("lineinfo_filenames:\n"); + for (unsigned int i = 0; i < num_files; i++) + print_escaped_asciz(files[i].name); + printf("\n"); +} + +int main(int argc, char *argv[]) +{ + int fd; + Elf *elf; + Dwarf *dwarf; + unsigned long long text_addr; + + if (argc != 2) { + fprintf(stderr, "Usage: %s \n", argv[0]); + return 1; + } + + fd = open(argv[1], O_RDONLY); + if (fd < 0) { + fprintf(stderr, "Cannot open %s: %s\n", argv[1], + strerror(errno)); + return 1; + } + + elf_version(EV_CURRENT); + elf = elf_begin(fd, ELF_C_READ, NULL); + if (!elf) { + fprintf(stderr, "elf_begin failed: %s\n", + elf_errmsg(elf_errno())); + close(fd); + return 1; + } + + text_addr = find_text_addr(elf); + text_end_addr = find_text_end_addr(elf); + + dwarf = dwarf_begin_elf(elf, DWARF_C_READ, NULL); + if (!dwarf) { + fprintf(stderr, "dwarf_begin_elf failed: %s\n", + dwarf_errmsg(dwarf_errno())); + fprintf(stderr, "Is %s built with CONFIG_DEBUG_INFO?\n", + argv[1]); + elf_end(elf); + close(fd); + return 1; + } + + process_dwarf(dwarf, text_addr); + + if (skipped_overflow) + fprintf(stderr, + "lineinfo: warning: %u entries skipped (offset > 4 GiB from _text)\n", + skipped_overflow); + + deduplicate(); + compute_file_offsets(); + + fprintf(stderr, "lineinfo: %u entries, %u files\n", + num_entries, num_files); + + output_assembly(); + + dwarf_end(dwarf); + elf_end(elf); + close(fd); + + /* Cleanup */ + free(entries); + for (unsigned int i = 0; i < num_files; i++) + free(files[i].name); + free(files); + + return 0; +} diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index 37d5c095ad22a..42662c4fbc6c9 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c @@ -78,6 +78,17 @@ static char *sym_name(const struct sym_entry *s) static bool is_ignored_symbol(const char *name, char type) { + /* Ignore lineinfo symbols for kallsyms pass stability */ + static const char * const lineinfo_syms[] = { + "lineinfo_addrs", + "lineinfo_file_ids", + "lineinfo_file_offsets", + "lineinfo_filenames", + "lineinfo_lines", + "lineinfo_num_entries", + "lineinfo_num_files", + }; + if (type == 'u' || type == 'n') return true; @@ -90,6 +101,11 @@ static bool is_ignored_symbol(const char *name, char type) return true; } + for (size_t i = 0; i < ARRAY_SIZE(lineinfo_syms); i++) { + if (!strcmp(name, lineinfo_syms[i])) + return true; + } + return false; } diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index f99e196abeea4..39ca44fbb259b 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -103,7 +103,7 @@ vmlinux_link() ${ld} ${ldflags} -o ${output} \ ${wl}--whole-archive ${objs} ${wl}--no-whole-archive \ ${wl}--start-group ${libs} ${wl}--end-group \ - ${kallsymso} ${btf_vmlinux_bin_o} ${arch_vmlinux_o} ${ldlibs} + ${kallsymso} ${lineinfo_o} ${btf_vmlinux_bin_o} ${arch_vmlinux_o} ${ldlibs} } # Create ${2}.o file with all symbols from the ${1} object file @@ -129,6 +129,26 @@ kallsyms() kallsymso=${2}.o } +# Generate lineinfo tables from DWARF debug info in a temporary vmlinux. +# ${1} - temporary vmlinux with debug info +# Output: sets lineinfo_o to the generated .o file +gen_lineinfo() +{ + info LINEINFO .tmp_lineinfo.S + if ! scripts/gen_lineinfo "${1}" > .tmp_lineinfo.S; then + echo >&2 "Failed to generate lineinfo from ${1}" + echo >&2 "Try to disable CONFIG_KALLSYMS_LINEINFO" + exit 1 + fi + + info AS .tmp_lineinfo.o + ${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \ + ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \ + -c -o .tmp_lineinfo.o .tmp_lineinfo.S + + lineinfo_o=.tmp_lineinfo.o +} + # Perform kallsyms for the given temporary vmlinux. sysmap_and_kallsyms() { @@ -155,6 +175,7 @@ sorttable() cleanup() { rm -f .btf.* + rm -f .tmp_lineinfo.* rm -f .tmp_vmlinux.nm-sort rm -f System.map rm -f vmlinux @@ -183,6 +204,7 @@ fi btf_vmlinux_bin_o= btfids_vmlinux= kallsymso= +lineinfo_o= strip_debug= generate_map= @@ -198,10 +220,21 @@ if is_enabled CONFIG_KALLSYMS; then kallsyms .tmp_vmlinux0.syms .tmp_vmlinux0.kallsyms fi +if is_enabled CONFIG_KALLSYMS_LINEINFO; then + # Assemble an empty lineinfo stub for the initial link. + # The real lineinfo is generated from .tmp_vmlinux1 by gen_lineinfo. + ${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \ + ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \ + -c -o .tmp_lineinfo.o "${srctree}/scripts/empty_lineinfo.S" + lineinfo_o=.tmp_lineinfo.o +fi + if is_enabled CONFIG_KALLSYMS || is_enabled CONFIG_DEBUG_INFO_BTF; then - # The kallsyms linking does not need debug symbols, but the BTF does. - if ! is_enabled CONFIG_DEBUG_INFO_BTF; then + # The kallsyms linking does not need debug symbols, but BTF and + # lineinfo generation do. + if ! is_enabled CONFIG_DEBUG_INFO_BTF && + ! is_enabled CONFIG_KALLSYMS_LINEINFO; then strip_debug=1 fi @@ -219,6 +252,10 @@ if is_enabled CONFIG_DEBUG_INFO_BTF; then btfids_vmlinux=.tmp_vmlinux1.BTF_ids fi +if is_enabled CONFIG_KALLSYMS_LINEINFO; then + gen_lineinfo .tmp_vmlinux1 +fi + if is_enabled CONFIG_KALLSYMS; then # kallsyms support -- 2.53.0