* [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue
@ 2024-06-13 13:37 Zheng Yejian
2024-06-13 13:37 ` [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc() Zheng Yejian
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
ftrace_location() was changed by commit aebfd12521d9 ("x86/ibt,ftrace:
Search for __fentry__ location") to return the __fentry__ location not
only when called for the __fentry__ location itself, but also when
called for the sym+0 location. That is, if the sym+0 location is not
__fentry__, ftrace_location() searches for one over the entire size of
the sym.
However, there are cases where more than one __fentry__ exists in the
sym range (described below), and the binary search in ftrace_location()
can then find the wrong __fentry__ location, which causes its users such
as livepatch, kprobes and BPF to misbehave on this sym!
The case is that, given current compiler behavior, suppose:
- function A is followed by weak function B1 in the same binary file;
- weak function B1 is overridden by function B2;
Then in the final binary file:
- symbol B1 is removed from the symbol table while its instructions are
  not removed;
- the __fentry__ of B1 is still in the __mcount_loc table;
- the size of function A is computed by subtracting the symbol address
  of A from the next symbol address (see kallsyms_lookup_size_offset()),
  but because the symbol info of B1 was removed, the next symbol of A is
  what was originally the next symbol of B1. In the following example,
  the size of function A will be (symbol_address_C - symbol_address_A):
      symbol_address_A
      symbol_address_B1 (Not in symbol table)
      symbol_address_C
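The size miscomputation above can be sketched in a few lines of plain C. This is a minimal userspace illustration, not kernel code; the symbol names and addresses are made up, and sym_size() only mimics the next-symbol subtraction that kallsyms_lookup_size_offset() performs:

```c
#include <assert.h>
#include <stddef.h>

struct sym {
	const char *name;
	unsigned long long addr;
};

/* Size of table[i] = address of the NEXT symbol minus its own address. */
static unsigned long long sym_size(const struct sym *table, size_t n, size_t i)
{
	if (i + 1 < n)
		return table[i + 1].addr - table[i].addr;
	return 0;
}

static unsigned long long demo_weak_size(void)
{
	/* With B1 present, A's size is B1 - A = 0x40. */
	const struct sym with_b1[] = {
		{ "A",  0x1000 }, { "B1", 0x1040 }, { "C", 0x1100 },
	};
	/*
	 * B1 overridden: its symbol is gone but its code (and its
	 * __fentry__) remains between A and C, so A's computed size
	 * balloons to C - A = 0x100.
	 */
	const struct sym without_b1[] = {
		{ "A",  0x1000 }, { "C", 0x1100 },
	};

	assert(sym_size(with_b1, 3, 0) == 0x40);
	return sym_size(without_b1, 2, 0);	/* too large: covers B1's code */
}
```

Any __fentry__ of B1 in __mcount_loc now falls inside the inflated range of A, which is exactly what confuses the binary search in ftrace_location().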
The weak function issue was discovered in commit b39181f7c690
("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function"),
but that commit did not resolve the issue in ftrace_location().
Peter suggested using the entry size of FUNC type objects to find holes
in the text and fill them with a symbol, then checking the mcount
locations against the symbol table and skipping every one that falls in
a hole [1] [2].
What the patch set does is as follows:
- Patch 1: Optimize memory allocation in scripts/kallsyms.c when reading
  symbols from a file. This patch has little to do with the above issue,
  but since it changes the same script, it is included here for review;
- Patch 2: Change scripts/kallsyms.c to emit a symbol wherever there is
  a hole in the text; the symbol is temporarily named
  "__hole_symbol_XXXXX";
- Patch 3: When looking up symbols in a module, use the entry size info
  to determine the exact boundaries of a function symbol;
- Patch 4: With the holes in the text found by the previous patches,
  check the __fentry__ entries in the mcount table and skip those
  located in the holes;
- Patch 5: Fix an out-of-bounds issue, found incidentally, that occurs
  when all __fentry__ entries are skipped;
- Patch 6: Revert Steve's FTRACE_MCOUNT_MAX_OFFSET workaround, along
  with two related definitions for powerpc.
[1] https://lore.kernel.org/all/20240607150228.GR8774@noisy.programming.kicks-ass.net/
[2] https://lore.kernel.org/all/20240611092157.GU40213@noisy.programming.kicks-ass.net/
Zheng Yejian (6):
kallsyms: Optimize multiple times of realloc() to one time of malloc()
kallsyms: Emit symbol at the holes in the text
module: kallsyms: Determine exact function size
ftrace: Skip invalid __fentry__ in ftrace_process_locs()
ftrace: Fix possible out-of-bound issue in ftrace_process_locs()
ftrace: Revert the FTRACE_MCOUNT_MAX_OFFSET workaround
arch/powerpc/include/asm/ftrace.h | 7 --
arch/x86/include/asm/ftrace.h | 7 --
include/linux/kallsyms.h | 13 +++
include/linux/module.h | 14 +++
kernel/module/kallsyms.c | 42 ++++++--
kernel/trace/ftrace.c | 174 ++++++------------------------
scripts/kallsyms.c | 134 ++++++++++++++++++++---
scripts/link-vmlinux.sh | 4 +-
scripts/mksysmap | 2 +-
9 files changed, 216 insertions(+), 181 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc()
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
@ 2024-06-13 13:37 ` Zheng Yejian
2024-07-15 20:12 ` Masahiro Yamada
2024-06-13 13:37 ` [PATCH 2/6] kallsyms: Emit symbol at the holes in the text Zheng Yejian
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
The array 'table' is used to store pointers to the symbols read from the
in.map file, and its size depends on the number of symbols. Currently
'table' is expanded by calling realloc() for every 10000 symbols read.
However, there are typically 100000+ symbols, which means the array is
generally expanded 10+ times.
As an optimization, introduce the linked list 'sym_list' to collect and
count all symbols, then store them into 'table' with a single
allocation.
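The allocation strategy can be sketched as below. This is a simplified userspace model, not the patch itself: add_symbol() and build_table() are illustrative names, and struct sym_entry is stripped down to the fields the scheme needs. Each parsed symbol is prepended to a singly linked list while counting; one malloc() then sizes the pointer array exactly, which is filled back-to-front to restore read order:

```c
#include <assert.h>
#include <stdlib.h>

struct sym_entry {
	struct sym_entry *next;
	unsigned long long addr;
};

static struct sym_entry *sym_list;
static unsigned int table_cnt;

static void add_symbol(unsigned long long addr)
{
	struct sym_entry *sym = malloc(sizeof(*sym));

	if (!sym)
		exit(EXIT_FAILURE);
	sym->addr = addr;
	/* Prepend: the list holds symbols in reverse read order. */
	sym->next = sym_list;
	sym_list = sym;
	table_cnt++;
}

static struct sym_entry **build_table(void)
{
	struct sym_entry **table = malloc(sizeof(*table) * table_cnt);
	struct sym_entry *sym = sym_list;
	unsigned int i = table_cnt - 1;

	if (!table)
		exit(EXIT_FAILURE);
	/* Fill back-to-front so the array ends up in original read order. */
	while (sym) {
		table[i--] = sym;
		sym = sym->next;
	}
	return table;
}
```

Nothing is copied twice: the list nodes are the sym_entry allocations themselves, so the only extra cost over the realloc() scheme is one pointer per symbol.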
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
scripts/kallsyms.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 47978efe4797..6559a9802f6e 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -33,6 +33,7 @@
#define KSYM_NAME_LEN 512
struct sym_entry {
+ struct sym_entry *next;
unsigned long long addr;
unsigned int len;
unsigned int seq;
@@ -60,7 +61,8 @@ static struct addr_range percpu_range = {
};
static struct sym_entry **table;
-static unsigned int table_size, table_cnt;
+static struct sym_entry *sym_list;
+static unsigned int table_cnt;
static int all_symbols;
static int absolute_percpu;
static int base_relative;
@@ -273,6 +275,7 @@ static void read_map(const char *in)
struct sym_entry *sym;
char *buf = NULL;
size_t buflen = 0;
+ int i;
fp = fopen(in, "r");
if (!fp) {
@@ -286,18 +289,22 @@ static void read_map(const char *in)
continue;
sym->start_pos = table_cnt;
-
- if (table_cnt >= table_size) {
- table_size += 10000;
- table = realloc(table, sizeof(*table) * table_size);
- if (!table) {
- fprintf(stderr, "out of memory\n");
- fclose(fp);
- exit (1);
- }
- }
-
- table[table_cnt++] = sym;
+ table_cnt++;
+ sym->next = sym_list;
+ sym_list = sym;
+ }
+ table = malloc(sizeof(*table) * table_cnt);
+ if (!table) {
+ fprintf(stderr, "unable to allocate memory for table\n");
+ free(buf);
+ fclose(fp);
+ exit(EXIT_FAILURE);
+ }
+ sym = sym_list;
+ i = table_cnt - 1;
+ while (sym) {
+ table[i--] = sym;
+ sym = sym->next;
}
free(buf);
--
2.25.1
* [PATCH 2/6] kallsyms: Emit symbol at the holes in the text
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
2024-06-13 13:37 ` [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc() Zheng Yejian
@ 2024-06-13 13:37 ` Zheng Yejian
2024-07-16 8:33 ` Masahiro Yamada
2024-06-13 13:37 ` [PATCH 3/6] module: kallsyms: Determine exact function size Zheng Yejian
` (3 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
When a weak function is overridden, its symbol is removed from the
symbol table, but its code is not removed. Besides, since kallsyms lacks
size information, the kernel computes a function's size by subtracting
its symbol address from the next symbol address (see
kallsyms_lookup_size_offset()). Together these cause the size of some
functions to be computed larger than it actually is, merely because the
symbol of the following weak function was removed.
This issue also causes multiple __fentry__ locations to be counted
within a single function's scope, and eventually causes
ftrace_location() to find the wrong __fentry__ location. It was
reported in
Link: https://lore.kernel.org/all/20240607115211.734845-1-zhengyejian1@huawei.com/
Peter suggested changing scripts/kallsyms.c to emit readily
identifiable symbol names for all the weak junk, e.g.:
__weak_junk_NNNNN
The name of this kind of symbol needs some discussion, but it is
temporarily called "__hole_symbol_XXXXX" in this patch:
1. Pass size info to scripts/kallsyms (see mksysmap());
2. Traverse the sorted function symbols; if a function's address plus
its size is less than the next function's address, there is a hole,
so emit a symbol "__hole_symbol_XXXXX" of type 't' there.
After this patch, the effect is as follows:
$ cat /proc/kallsyms | grep -A 3 do_one_initcall
ffffffff810021e0 T do_one_initcall
ffffffff8100245e t __hole_symbol_XXXXX
ffffffff810024a0 t __pfx_rootfs_init_fs_context
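The hole check in step 2 reduces to a single comparison per adjacent pair of sorted symbols. The sketch below is a standalone illustration (not the patch code); the table in demo_count_holes() mirrors the do_one_initcall example above, where the function's size 0x27e ends at ...245e but the next symbol starts at ...24a0:

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	unsigned long long addr;
	unsigned long long size;	/* 0 when nm printed no size */
	char type;
};

static int hole_after(const struct entry *e, const struct entry *next)
{
	/* Only sized text/weak symbols can reveal a hole. */
	if (e->type != 't' && e->type != 'T' &&
	    e->type != 'w' && e->type != 'W')
		return 0;
	if (!e->size)
		return 0;
	return e->addr + e->size < next->addr;
}

static unsigned int count_holes(const struct entry *tab, size_t n)
{
	unsigned int holes = 0;

	for (size_t i = 0; i + 1 < n; i++)
		holes += hole_after(&tab[i], &tab[i + 1]);
	return holes;
}

static unsigned int demo_count_holes(void)
{
	const struct entry tab[] = {
		/* do_one_initcall: ends at 0xffffffff8100245e */
		{ 0xffffffff810021e0ULL, 0x27eULL, 'T' },
		/* __pfx_rootfs_init_fs_context starts later: a hole */
		{ 0xffffffff810024a0ULL, 0x10ULL,  't' },
	};

	return count_holes(tab, 2);
}
```

A first pass counting holes this way also tells emit_hole_symbols() exactly how large the new table must be before any insertion happens.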
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
scripts/kallsyms.c | 101 +++++++++++++++++++++++++++++++++++++++-
scripts/link-vmlinux.sh | 4 +-
scripts/mksysmap | 2 +-
3 files changed, 102 insertions(+), 5 deletions(-)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 6559a9802f6e..5c4cde864a04 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -35,6 +35,7 @@
struct sym_entry {
struct sym_entry *next;
unsigned long long addr;
+ unsigned long long size;
unsigned int len;
unsigned int seq;
unsigned int start_pos;
@@ -74,6 +75,7 @@ static int token_profit[0x10000];
static unsigned char best_table[256][2];
static unsigned char best_table_len[256];
+static const char hole_symbol[] = "__hole_symbol_XXXXX";
static void usage(void)
{
@@ -130,8 +132,16 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
size_t len;
ssize_t readlen;
struct sym_entry *sym;
+ unsigned long long size = 0;
errno = 0;
+ /*
+ * Example of expected symbol format:
+ * 1. symbol with size info:
+ * ffffffff81000070 00000000000001d7 T __startup_64
+ * 2. symbol without size info:
+ * 0000000002a00000 A text_size
+ */
readlen = getline(buf, buf_len, in);
if (readlen < 0) {
if (errno) {
@@ -145,9 +155,24 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
(*buf)[readlen - 1] = 0;
addr = strtoull(*buf, &p, 16);
+ if (*buf == p || *p++ != ' ') {
+ fprintf(stderr, "line format error: unable to parse address\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (*p == '0') {
+ char *str = p;
- if (*buf == p || *p++ != ' ' || !isascii((type = *p++)) || *p++ != ' ') {
- fprintf(stderr, "line format error\n");
+ size = strtoull(str, &p, 16);
+ if (str == p || *p++ != ' ') {
+ fprintf(stderr, "line format error: unable to parse size\n");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ type = *p++;
+ if (!isascii(type) || *p++ != ' ') {
+ fprintf(stderr, "line format error: unable to parse type\n");
exit(EXIT_FAILURE);
}
@@ -182,6 +207,7 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
exit(EXIT_FAILURE);
}
sym->addr = addr;
+ sym->size = size;
sym->len = len;
sym->sym[0] = type;
strcpy(sym_name(sym), name);
@@ -795,6 +821,76 @@ static void sort_symbols(void)
qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
}
+static int may_exist_hole_after_symbol(const struct sym_entry *se)
+{
+ char type = se->sym[0];
+
+ /* Only check text symbol or weak symbol */
+ if (type != 't' && type != 'T' &&
+ type != 'w' && type != 'W')
+ return 0;
+ /* Symbol without size has no hole */
+ return se->size != 0;
+}
+
+static struct sym_entry *gen_hole_symbol(unsigned long long addr)
+{
+ struct sym_entry *sym;
+ static size_t len = sizeof(hole_symbol);
+
+ /* include type field */
+ sym = malloc(sizeof(*sym) + len + 1);
+ if (!sym) {
+ fprintf(stderr, "unable to allocate memory for hole symbol\n");
+ exit(EXIT_FAILURE);
+ }
+ sym->addr = addr;
+ sym->size = 0;
+ sym->len = len;
+ sym->sym[0] = 't';
+ strcpy(sym_name(sym), hole_symbol);
+ sym->percpu_absolute = 0;
+ return sym;
+}
+
+static void emit_hole_symbols(void)
+{
+ unsigned int i, pos, nr_emit;
+ struct sym_entry **new_table;
+ unsigned int new_cnt;
+
+ nr_emit = 0;
+ for (i = 0; i < table_cnt - 1; i++) {
+ if (may_exist_hole_after_symbol(table[i]) &&
+ table[i]->addr + table[i]->size < table[i+1]->addr)
+ nr_emit++;
+ }
+ if (!nr_emit)
+ return;
+
+ new_cnt = table_cnt + nr_emit;
+ new_table = malloc(sizeof(*new_table) * new_cnt);
+ if (!new_table) {
+ fprintf(stderr, "unable to allocate memory for new table\n");
+ exit(EXIT_FAILURE);
+ }
+
+ pos = 0;
+ for (i = 0; i < table_cnt; i++) {
+ unsigned long long addr;
+
+ new_table[pos++] = table[i];
+ if ((i == table_cnt - 1) || !may_exist_hole_after_symbol(table[i]))
+ continue;
+ addr = table[i]->addr + table[i]->size;
+ if (addr < table[i+1]->addr)
+ new_table[pos++] = gen_hole_symbol(addr);
+ }
+ free(table);
+ table = new_table;
+ table_cnt = new_cnt;
+}
+
static void make_percpus_absolute(void)
{
unsigned int i;
@@ -854,6 +950,7 @@ int main(int argc, char **argv)
if (absolute_percpu)
make_percpus_absolute();
sort_symbols();
+ emit_hole_symbols();
if (base_relative)
record_relative_base();
optimize_token_table();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 518c70b8db50..8e1373902bfe 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -189,11 +189,11 @@ kallsyms_step()
}
# Create map file with all symbols from ${1}
-# See mksymap for additional details
+# See mksysmap for additional details
mksysmap()
{
info NM ${2}
- ${NM} -n "${1}" | sed -f "${srctree}/scripts/mksysmap" > "${2}"
+ ${NM} -nS "${1}" | sed -f "${srctree}/scripts/mksysmap" > "${2}"
}
sorttable()
diff --git a/scripts/mksysmap b/scripts/mksysmap
index c12723a04655..7a4415f21143 100755
--- a/scripts/mksysmap
+++ b/scripts/mksysmap
@@ -2,7 +2,7 @@
# SPDX-License-Identifier: GPL-2.0-only
#
# sed script to filter out symbols that are not needed for System.map,
-# or not suitable for kallsyms. The input should be 'nm -n <file>'.
+# or not suitable for kallsyms. The input should be 'nm -nS <file>'.
#
# System.map is used by module-init tools and some debugging
# tools to retrieve the actual addresses of symbols in the kernel.
--
2.25.1
* [PATCH 3/6] module: kallsyms: Determine exact function size
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
2024-06-13 13:37 ` [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc() Zheng Yejian
2024-06-13 13:37 ` [PATCH 2/6] kallsyms: Emit symbol at the holes in the text Zheng Yejian
@ 2024-06-13 13:37 ` Zheng Yejian
2024-06-13 13:37 ` [PATCH 4/6] ftrace: Skip invalid __fentry__ in ftrace_process_locs() Zheng Yejian
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
When a weak function is overridden, its symbol is removed from the
symbol table, but its code is not removed. This causes
find_kallsyms_symbol() to compute a larger function size than the
actual one, merely because the symbol of the following weak function
was removed.
To fix this issue, check that a given address is within the size of
the function found.
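The effect of the bounds check can be sketched as follows. This is an illustrative userspace model, not the module kallsyms code: find_symbol() stands in for find_kallsyms_symbol(), with the symbol table flattened to value/size pairs. An address that lands in the gap left by an overridden weak function must not resolve to the preceding function:

```c
#include <assert.h>
#include <stddef.h>

struct msym {
	const char *name;
	unsigned long value;	/* st_value */
	unsigned long size;	/* st_size; 0 when unknown */
};

/* Return the symbol containing addr, honoring size when available. */
static const char *find_symbol(const struct msym *tab, size_t n,
			       unsigned long addr)
{
	const struct msym *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tab[i].value <= addr && (!best || tab[i].value > best->value))
			best = &tab[i];
	}
	if (!best)
		return NULL;
	/* The added check: reject addresses beyond the function's real end. */
	if (best->size && addr >= best->value + best->size)
		return NULL;
	return best->name;
}

static int demo_gap_rejected(void)
{
	/* A is 0x40 bytes; the gap up to C held an overridden weak B1. */
	const struct msym tab[] = {
		{ "A", 0x1000, 0x40 },
		{ "C", 0x1100, 0x80 },
	};

	return find_symbol(tab, 2, 0x1050) == NULL &&	/* in the gap */
	       find_symbol(tab, 2, 0x1010) != NULL;	/* inside A */
}
```

Without the size check, 0x1050 would wrongly resolve to "A" with an offset beyond A's real end, which is the module-side analogue of the vmlinux problem fixed by the hole symbols.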
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
include/linux/module.h | 7 +++++++
kernel/module/kallsyms.c | 19 +++++++++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index ffa1c603163c..13518f464d3f 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -597,6 +597,13 @@ static inline unsigned long kallsyms_symbol_value(const Elf_Sym *sym)
}
#endif
+#ifndef HAVE_ARCH_KALLSYMS_SYMBOL_TYPE
+static inline unsigned int kallsyms_symbol_type(const Elf_Sym *sym)
+{
+ return ELF_ST_TYPE(sym->st_info);
+}
+#endif
+
/* FIXME: It'd be nice to isolate modules during init, too, so they
aren't used before they (may) fail. But presently too much code
(IDE & SCSI) require entry into the module during init.*/
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index 62fb57bb9f16..092ae6f43dad 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -262,6 +262,7 @@ static const char *find_kallsyms_symbol(struct module *mod,
unsigned long nextval, bestval;
struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
struct module_memory *mod_mem;
+ const Elf_Sym *sym;
/* At worse, next value is at end of module */
if (within_module_init(addr, mod))
@@ -278,9 +279,10 @@ static const char *find_kallsyms_symbol(struct module *mod,
* starts real symbols at 1).
*/
for (i = 1; i < kallsyms->num_symtab; i++) {
- const Elf_Sym *sym = &kallsyms->symtab[i];
- unsigned long thisval = kallsyms_symbol_value(sym);
+ unsigned long thisval;
+ sym = &kallsyms->symtab[i];
+ thisval = kallsyms_symbol_value(sym);
if (sym->st_shndx == SHN_UNDEF)
continue;
@@ -292,6 +294,13 @@ static const char *find_kallsyms_symbol(struct module *mod,
is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
continue;
+ if (kallsyms_symbol_type(sym) == STT_FUNC &&
+ addr >= thisval && addr < thisval + sym->st_size) {
+ best = i;
+ bestval = thisval;
+ nextval = thisval + sym->st_size;
+ goto find;
+ }
if (thisval <= addr && thisval > bestval) {
best = i;
bestval = thisval;
@@ -303,6 +312,12 @@ static const char *find_kallsyms_symbol(struct module *mod,
if (!best)
return NULL;
+ sym = &kallsyms->symtab[best];
+ if (kallsyms_symbol_type(sym) == STT_FUNC && sym->st_size &&
+ addr >= kallsyms_symbol_value(sym) + sym->st_size)
+ return NULL;
+
+find:
if (size)
*size = nextval - bestval;
if (offset)
--
2.25.1
* [PATCH 4/6] ftrace: Skip invalid __fentry__ in ftrace_process_locs()
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
` (2 preceding siblings ...)
2024-06-13 13:37 ` [PATCH 3/6] module: kallsyms: Determine exact function size Zheng Yejian
@ 2024-06-13 13:37 ` Zheng Yejian
2024-06-13 13:37 ` [PATCH 5/6] ftrace: Fix possible out-of-bound issue " Zheng Yejian
2024-06-13 13:37 ` [PATCH 6/6] ftrace: Revert the FTRACE_MCOUNT_MAX_OFFSET workaround Zheng Yejian
5 siblings, 0 replies; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
ftrace_location() was changed by commit aebfd12521d9 ("x86/ibt,ftrace:
Search for __fentry__ location") to return the __fentry__ location not
only when called for the __fentry__ location itself, but also when
called for the sym+0 location. That is, if the sym+0 location is not
__fentry__, ftrace_location() searches for one over the entire size of
the sym.
However, there are cases where more than one __fentry__ exists in the
sym range (described below), and the binary search in ftrace_location()
can then find the wrong __fentry__ location, which causes its users such
as livepatch, kprobes and BPF to misbehave on this sym!
The case is that, given current compiler behavior, suppose:
- function A is followed by weak function B1 in the same binary file;
- weak function B1 is overridden by function B2;
Then in the final binary file:
- symbol B1 is removed from the symbol table while its instructions are
  not removed;
- the __fentry__ of B1 is still in the __mcount_loc table;
- the size of function A is computed by subtracting the symbol address
  of A from the next symbol address (see kallsyms_lookup_size_offset()),
  but because the symbol info of B1 was removed, the next symbol of A is
  what was originally the next symbol of B1. In the following example,
  the size of function A will be (symbol_address_C - symbol_address_A):
      symbol_address_A
      symbol_address_B1 (Not in symbol table)
      symbol_address_C
The weak function issue was discovered in commit b39181f7c690
("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function"),
but that commit did not resolve the issue in ftrace_location().
To solve the issue, following Peter's suggestions, the previous patches
found all holes in the text and filled them with special symbols, and
handled the same case for module weak functions. Now check and skip any
__fentry__ that is located in a hole.
Also in this patch, introduce the following helper functions:
1. kallsyms_is_hole_symbol() checks whether a function name is that of
a hole symbol;
2. module_kallsyms_find_symbol() checks whether a __fentry__ lies
within a valid function of the given module. It is needed because other
symbol lookup functions like module_address_lookup() look up the module
of the passed address, but when ftrace_process_locs() is called the
module has not been fully loaded yet, so those lookup functions cannot
work.
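The resulting record filter can be sketched in isolation. This is a userspace toy, not the kernel code: lookup_name() is a stand-in for kallsyms_lookup() over a made-up two-symbol text layout, and only the classification logic mirrors the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy symbol map: one real function followed by a hole (stand-in for
 * kallsyms_lookup(); addresses are invented for illustration). */
static const char *lookup_name(unsigned long addr)
{
	if (addr >= 0x1000 && addr < 0x1040)
		return "do_one_initcall";
	if (addr >= 0x1040 && addr < 0x1100)
		return "__hole_symbol_XXXXX";
	return NULL;	/* not in text at all */
}

static int is_invalid_rec(unsigned long addr)
{
	const char *name = lookup_name(addr);

	if (!name)
		return 1;
	/*
	 * A record that lands in a hole is the stale __fentry__ of an
	 * overridden weak function: skip it before a dyn_ftrace record
	 * is ever created for it.
	 */
	return strcmp(name, "__hole_symbol_XXXXX") == 0;
}
```

Doing the check in ftrace_process_locs(), before records are allocated, means later binary searches over ftrace_pages never see a stale __fentry__ at all.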
Fixes: aebfd12521d9 ("x86/ibt,ftrace: Search for __fentry__ location")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
include/linux/kallsyms.h | 13 +++++++++++++
include/linux/module.h | 7 +++++++
kernel/module/kallsyms.c | 23 +++++++++++++++++------
kernel/trace/ftrace.c | 15 ++++++++++++++-
4 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index c3f075e8f60c..0bf0d595f244 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -88,6 +88,14 @@ const char *kallsyms_lookup(unsigned long addr,
unsigned long *offset,
char **modname, char *namebuf);
+/*
+ * Check if the name is of a hole symbol
+ */
+static inline int kallsyms_is_hole_symbol(const char *name)
+{
+ return !strcmp(name, "__hole_symbol_XXXXX");
+}
+
/* Look up a kernel symbol and return it in a text buffer. */
extern int sprint_symbol(char *buffer, unsigned long address);
extern int sprint_symbol_build_id(char *buffer, unsigned long address);
@@ -119,6 +127,11 @@ static inline const char *kallsyms_lookup(unsigned long addr,
return NULL;
}
+static inline int kallsyms_is_hole_symbol(const char *name)
+{
+ return 0;
+}
+
static inline int sprint_symbol(char *buffer, unsigned long addr)
{
*buffer = '\0';
diff --git a/include/linux/module.h b/include/linux/module.h
index 13518f464d3f..a3fd077ef2a8 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -960,6 +960,8 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
unsigned long module_kallsyms_lookup_name(const char *name);
unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name);
+int module_kallsyms_find_symbol(struct module *mod, unsigned long addr,
+ unsigned long *size, unsigned long *offset);
#else /* CONFIG_MODULES && CONFIG_KALLSYMS */
@@ -1004,6 +1006,11 @@ static inline unsigned long find_kallsyms_symbol_value(struct module *mod,
return 0;
}
+static inline int module_kallsyms_find_symbol(struct module *mod, unsigned long addr,
+ unsigned long *size, unsigned long *offset)
+{
+ return 0;
+}
#endif /* CONFIG_MODULES && CONFIG_KALLSYMS */
#endif /* _LINUX_MODULE_H */
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index 092ae6f43dad..e9c439d81708 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -253,10 +253,10 @@ static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned
* Given a module and address, find the corresponding symbol and return its name
* while providing its size and offset if needed.
*/
-static const char *find_kallsyms_symbol(struct module *mod,
- unsigned long addr,
- unsigned long *size,
- unsigned long *offset)
+static const char *__find_kallsyms_symbol(struct module *mod,
+ unsigned long addr,
+ unsigned long *size,
+ unsigned long *offset)
{
unsigned int i, best = 0;
unsigned long nextval, bestval;
@@ -326,6 +326,17 @@ static const char *find_kallsyms_symbol(struct module *mod,
return kallsyms_symbol_name(kallsyms, best);
}
+int module_kallsyms_find_symbol(struct module *mod, unsigned long addr,
+ unsigned long *size, unsigned long *offset)
+{
+ const char *ret;
+
+ preempt_disable();
+ ret = __find_kallsyms_symbol(mod, addr, size, offset);
+ preempt_enable();
+ return !!ret;
+}
+
void * __weak dereference_module_function_descriptor(struct module *mod,
void *ptr)
{
@@ -359,7 +370,7 @@ const char *module_address_lookup(unsigned long addr,
#endif
}
- ret = find_kallsyms_symbol(mod, addr, size, offset);
+ ret = __find_kallsyms_symbol(mod, addr, size, offset);
}
/* Make a copy in here where it's safe */
if (ret) {
@@ -382,7 +393,7 @@ int lookup_module_symbol_name(unsigned long addr, char *symname)
if (within_module(addr, mod)) {
const char *sym;
- sym = find_kallsyms_symbol(mod, addr, NULL, NULL);
+ sym = __find_kallsyms_symbol(mod, addr, NULL, NULL);
if (!sym)
goto out;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 65208d3b5ed9..0e8628e4d296 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6474,6 +6474,19 @@ static void test_is_sorted(unsigned long *start, unsigned long count)
}
#endif
+static int is_invalid_rec(struct module *mod, unsigned long addr)
+{
+ char str[KSYM_SYMBOL_LEN];
+
+ if (mod)
+ return !module_kallsyms_find_symbol(mod, addr, NULL, NULL);
+
+ if (!kallsyms_lookup(addr, NULL, NULL, NULL, str))
+ return 1;
+ /* record locates in hole is invalid */
+ return kallsyms_is_hole_symbol(str);
+}
+
static int ftrace_process_locs(struct module *mod,
unsigned long *start,
unsigned long *end)
@@ -6545,7 +6558,7 @@ static int ftrace_process_locs(struct module *mod,
* object files to satisfy alignments.
* Skip any NULL pointers.
*/
- if (!addr) {
+ if (!addr || is_invalid_rec(mod, addr)) {
skipped++;
continue;
}
--
2.25.1
* [PATCH 5/6] ftrace: Fix possible out-of-bound issue in ftrace_process_locs()
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
` (3 preceding siblings ...)
2024-06-13 13:37 ` [PATCH 4/6] ftrace: Skip invalid __fentry__ in ftrace_process_locs() Zheng Yejian
@ 2024-06-13 13:37 ` Zheng Yejian
2024-06-13 13:37 ` [PATCH 6/6] ftrace: Revert the FTRACE_MCOUNT_MAX_OFFSET workaround Zheng Yejian
5 siblings, 0 replies; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
In ftrace_process_locs(), a series of pages is prepared and linked in
start_pg, then fentry records are skipped or added, and finally unused
pages are freed.
However, if all records are skipped, start_pg currently remains in the
ftrace_pages_start list without holding any record. Then, in
ftrace_free_mem(), indexing a record by (pg->index - 1) will be out of
bounds.
To fix this issue, properly handle the unused start_pg and add
WARN_ON_ONCE() where records need to be indexed.
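The out-of-bounds access pattern is easy to see in miniature. The sketch below is illustrative only (struct page_sketch is a toy, not struct ftrace_page): a page carries a count 'index' of stored records, and code like pg->records[pg->index - 1] silently reads past the array when index is 0, which is why the empty start_pg must be unlinked and the indexing sites guarded:

```c
#include <assert.h>

struct page_sketch {
	unsigned long records[4];
	int index;	/* number of records actually stored */
};

/* Address of the page's last record, guarded the way the patch does
 * with WARN_ON_ONCE(!pg->index): an empty page must not be indexed,
 * since records[index - 1] would read records[-1]. */
static unsigned long last_ip(const struct page_sketch *pg)
{
	if (!pg->index)
		return 0;
	return pg->records[pg->index - 1];
}

static int demo_empty_page_guard(void)
{
	struct page_sketch empty = { .index = 0 };
	struct page_sketch two = { .records = { 0x10, 0x20 }, .index = 2 };

	return last_ip(&empty) == 0 && last_ip(&two) == 0x20;
}
```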
Fixes: 26efd79c4624 ("ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
kernel/trace/ftrace.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 0e8628e4d296..c46c35ac9b42 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6575,10 +6575,22 @@ static int ftrace_process_locs(struct module *mod,
rec->ip = addr;
}
- if (pg->next) {
+ if (pg->index == 0) {
+ /* No record is added on the current page, so it's unused */
+ pg_unuse = pg;
+ } else if (pg->next) {
+ /* Current page has records, so its next page is unused */
pg_unuse = pg->next;
pg->next = NULL;
}
+ /*
+ * If even the start_pg hasn't been used, it means no record has
+ * been added, so restore the state of ftrace_pages and just go out.
+ */
+ if (pg_unuse == start_pg) {
+ ftrace_pages->next = NULL;
+ goto out;
+ }
/* Assign the last page to ftrace_pages */
ftrace_pages = pg;
@@ -6794,6 +6806,8 @@ void ftrace_release_mod(struct module *mod)
*/
last_pg = &ftrace_pages_start;
for (pg = ftrace_pages_start; pg; pg = *last_pg) {
+ /* The page should have at least one record */
+ WARN_ON_ONCE(!pg->index);
rec = &pg->records[0];
if (within_module(rec->ip, mod)) {
/*
@@ -7176,6 +7190,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr)
mod_map = allocate_ftrace_mod_map(mod, start, end);
for (pg = ftrace_pages_start; pg; last_pg = &pg->next, pg = *last_pg) {
+ /* The page should have at least one record */
+ WARN_ON_ONCE(!pg->index);
if (end < pg->records[0].ip ||
start >= (pg->records[pg->index - 1].ip + MCOUNT_INSN_SIZE))
continue;
--
2.25.1
* [PATCH 6/6] ftrace: Revert the FTRACE_MCOUNT_MAX_OFFSET workaround
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
` (4 preceding siblings ...)
2024-06-13 13:37 ` [PATCH 5/6] ftrace: Fix possible out-of-bound issue " Zheng Yejian
@ 2024-06-13 13:37 ` Zheng Yejian
5 siblings, 0 replies; 12+ messages in thread
From: Zheng Yejian @ 2024-06-13 13:37 UTC (permalink / raw)
To: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, masahiroy, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz
Cc: zhengyejian1, linux-kernel, linux-trace-kernel, linuxppc-dev,
linux-modules, linux-kbuild, bpf
After the patch titled "ftrace: Skip invalid __fentry__ in
ftrace_process_locs()", __fentry__ locations in overridden weak
functions are checked and skipped, so all records in ftrace_pages are
valid and the FTRACE_MCOUNT_MAX_OFFSET workaround can be reverted,
including:
1. commit b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid
adding weak function")
2. commit 7af82ff90a2b ("powerpc/ftrace: Ignore weak functions")
3. commit f6834c8c59a8 ("powerpc/ftrace: Fix dropping weak symbols with
older toolchains")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
arch/powerpc/include/asm/ftrace.h | 7 --
arch/x86/include/asm/ftrace.h | 7 --
kernel/trace/ftrace.c | 141 +-----------------------------
3 files changed, 2 insertions(+), 153 deletions(-)
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..d6ed058c8041 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -10,13 +10,6 @@
#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-/* Ignore unused weak functions which will have larger offsets */
-#if defined(CONFIG_MPROFILE_KERNEL) || defined(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY)
-#define FTRACE_MCOUNT_MAX_OFFSET 16
-#elif defined(CONFIG_PPC32)
-#define FTRACE_MCOUNT_MAX_OFFSET 8
-#endif
-
#ifndef __ASSEMBLY__
extern void _mcount(void);
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..7a147c9da08d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -9,13 +9,6 @@
# define MCOUNT_ADDR ((unsigned long)(__fentry__))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */
-/* Ignore unused weak functions which will have non zero offsets */
-#ifdef CONFIG_HAVE_FENTRY
-# include <asm/ibt.h>
-/* Add offset for endbr64 if IBT enabled */
-# define FTRACE_MCOUNT_MAX_OFFSET ENDBR_INSN_SIZE
-#endif
-
#ifdef CONFIG_DYNAMIC_FTRACE
#define ARCH_SUPPORTS_FTRACE_OPS 1
#endif
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c46c35ac9b42..1d60dc9a850b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -49,8 +49,6 @@
#define FTRACE_NOCLEAR_FLAGS (FTRACE_FL_DISABLED | FTRACE_FL_TOUCHED | \
FTRACE_FL_MODIFIED)
-#define FTRACE_INVALID_FUNCTION "__ftrace_invalid_address__"
-
#define FTRACE_WARN_ON(cond) \
({ \
int ___r = cond; \
@@ -3709,105 +3707,6 @@ static void add_trampoline_func(struct seq_file *m, struct ftrace_ops *ops,
seq_printf(m, " ->%pS", ptr);
}
-#ifdef FTRACE_MCOUNT_MAX_OFFSET
-/*
- * Weak functions can still have an mcount/fentry that is saved in
- * the __mcount_loc section. These can be detected by having a
- * symbol offset of greater than FTRACE_MCOUNT_MAX_OFFSET, as the
- * symbol found by kallsyms is not the function that the mcount/fentry
- * is part of. The offset is much greater in these cases.
- *
- * Test the record to make sure that the ip points to a valid kallsyms
- * and if not, mark it disabled.
- */
-static int test_for_valid_rec(struct dyn_ftrace *rec)
-{
- char str[KSYM_SYMBOL_LEN];
- unsigned long offset;
- const char *ret;
-
- ret = kallsyms_lookup(rec->ip, NULL, &offset, NULL, str);
-
- /* Weak functions can cause invalid addresses */
- if (!ret || offset > FTRACE_MCOUNT_MAX_OFFSET) {
- rec->flags |= FTRACE_FL_DISABLED;
- return 0;
- }
- return 1;
-}
-
-static struct workqueue_struct *ftrace_check_wq __initdata;
-static struct work_struct ftrace_check_work __initdata;
-
-/*
- * Scan all the mcount/fentry entries to make sure they are valid.
- */
-static __init void ftrace_check_work_func(struct work_struct *work)
-{
- struct ftrace_page *pg;
- struct dyn_ftrace *rec;
-
- mutex_lock(&ftrace_lock);
- do_for_each_ftrace_rec(pg, rec) {
- test_for_valid_rec(rec);
- } while_for_each_ftrace_rec();
- mutex_unlock(&ftrace_lock);
-}
-
-static int __init ftrace_check_for_weak_functions(void)
-{
- INIT_WORK(&ftrace_check_work, ftrace_check_work_func);
-
- ftrace_check_wq = alloc_workqueue("ftrace_check_wq", WQ_UNBOUND, 0);
-
- queue_work(ftrace_check_wq, &ftrace_check_work);
- return 0;
-}
-
-static int __init ftrace_check_sync(void)
-{
- /* Make sure the ftrace_check updates are finished */
- if (ftrace_check_wq)
- destroy_workqueue(ftrace_check_wq);
- return 0;
-}
-
-late_initcall_sync(ftrace_check_sync);
-subsys_initcall(ftrace_check_for_weak_functions);
-
-static int print_rec(struct seq_file *m, unsigned long ip)
-{
- unsigned long offset;
- char str[KSYM_SYMBOL_LEN];
- char *modname;
- const char *ret;
-
- ret = kallsyms_lookup(ip, NULL, &offset, &modname, str);
- /* Weak functions can cause invalid addresses */
- if (!ret || offset > FTRACE_MCOUNT_MAX_OFFSET) {
- snprintf(str, KSYM_SYMBOL_LEN, "%s_%ld",
- FTRACE_INVALID_FUNCTION, offset);
- ret = NULL;
- }
-
- seq_puts(m, str);
- if (modname)
- seq_printf(m, " [%s]", modname);
- return ret == NULL ? -1 : 0;
-}
-#else
-static inline int test_for_valid_rec(struct dyn_ftrace *rec)
-{
- return 1;
-}
-
-static inline int print_rec(struct seq_file *m, unsigned long ip)
-{
- seq_printf(m, "%ps", (void *)ip);
- return 0;
-}
-#endif
-
static int t_show(struct seq_file *m, void *v)
{
struct ftrace_iterator *iter = m->private;
@@ -3835,13 +3734,7 @@ static int t_show(struct seq_file *m, void *v)
if (iter->flags & FTRACE_ITER_ADDRS)
seq_printf(m, "%lx ", rec->ip);
- if (print_rec(m, rec->ip)) {
- /* This should only happen when a rec is disabled */
- WARN_ON_ONCE(!(rec->flags & FTRACE_FL_DISABLED));
- seq_putc(m, '\n');
- return 0;
- }
-
+ seq_printf(m, "%ps", (void *)rec->ip);
if (iter->flags & (FTRACE_ITER_ENABLED | FTRACE_ITER_TOUCHED)) {
struct ftrace_ops *ops;
@@ -4221,24 +4114,6 @@ add_rec_by_index(struct ftrace_hash *hash, struct ftrace_glob *func_g,
return 0;
}
-#ifdef FTRACE_MCOUNT_MAX_OFFSET
-static int lookup_ip(unsigned long ip, char **modname, char *str)
-{
- unsigned long offset;
-
- kallsyms_lookup(ip, NULL, &offset, modname, str);
- if (offset > FTRACE_MCOUNT_MAX_OFFSET)
- return -1;
- return 0;
-}
-#else
-static int lookup_ip(unsigned long ip, char **modname, char *str)
-{
- kallsyms_lookup(ip, NULL, NULL, modname, str);
- return 0;
-}
-#endif
-
static int
ftrace_match_record(struct dyn_ftrace *rec, struct ftrace_glob *func_g,
struct ftrace_glob *mod_g, int exclude_mod)
@@ -4246,12 +4121,7 @@ ftrace_match_record(struct dyn_ftrace *rec, struct ftrace_glob *func_g,
char str[KSYM_SYMBOL_LEN];
char *modname;
- if (lookup_ip(rec->ip, &modname, str)) {
- /* This should only happen when a rec is disabled */
- WARN_ON_ONCE(system_state == SYSTEM_RUNNING &&
- !(rec->flags & FTRACE_FL_DISABLED));
- return 0;
- }
+ kallsyms_lookup(rec->ip, NULL, NULL, &modname, str);
if (mod_g) {
int mod_matches = (modname) ? ftrace_match(modname, mod_g) : 0;
@@ -6887,13 +6757,6 @@ void ftrace_module_enable(struct module *mod)
if (!within_module(rec->ip, mod))
break;
- /* Weak functions should still be ignored */
- if (!test_for_valid_rec(rec)) {
- /* Clear all other flags. Should not be enabled anyway */
- rec->flags = FTRACE_FL_DISABLED;
- continue;
- }
-
cnt = 0;
/*
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc()
2024-06-13 13:37 ` [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc() Zheng Yejian
@ 2024-07-15 20:12 ` Masahiro Yamada
0 siblings, 0 replies; 12+ messages in thread
From: Masahiro Yamada @ 2024-07-15 20:12 UTC (permalink / raw)
To: Zheng Yejian
Cc: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz, linux-kernel,
linux-trace-kernel, linuxppc-dev, linux-modules, linux-kbuild,
bpf
On Thu, Jun 13, 2024 at 10:36 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
>
> Array 'table' is used to store pointers to symbols read from the in.map
> file, and its size depends on the number of symbols. Currently 'table'
> is expanded by calling realloc() every 10000 symbols read.
>
> However, there are generally around 100000+ symbols, which means the
> table is typically expanded 10+ times.
>
> As an optimization, introduce a linked list 'sym_list' to collect and
> count all symbols, then store them into 'table' in one pass.
>
> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
I do not think this is worthwhile.
realloc() is simple.
If this is a problem, you can increase the
"+= 10000" to "+= 65536" or something.
--
Best Regards
Masahiro Yamada
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/6] kallsyms: Emit symbol at the holes in the text
2024-06-13 13:37 ` [PATCH 2/6] kallsyms: Emit symbol at the holes in the text Zheng Yejian
@ 2024-07-16 8:33 ` Masahiro Yamada
2024-07-18 3:45 ` Zheng Yejian
0 siblings, 1 reply; 12+ messages in thread
From: Masahiro Yamada @ 2024-07-16 8:33 UTC (permalink / raw)
To: Zheng Yejian
Cc: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz, linux-kernel,
linux-trace-kernel, linuxppc-dev, linux-modules, linux-kbuild,
bpf
On Thu, Jun 13, 2024 at 10:36 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
>
> When a weak type function is overridden, its symbol will be removed
> from the symbol table, but its code will not be removed. Besides,
> due to the lack of size info in kallsyms, the kernel computes function
> size by subtracting its symbol address from its next symbol address (see
> kallsyms_lookup_size_offset()). These cause the size of some
> functions to be computed larger than they actually are, just because
> the symbol of the following weak function is removed.
>
> This issue also causes multiple __fentry__ locations to be counted in
> the same function scope, and eventually causes ftrace_location() to find
> the wrong __fentry__ location. It was reported in
> Link: https://lore.kernel.org/all/20240607115211.734845-1-zhengyejian1@huawei.com/
>
> Peter suggested changing scripts/kallsyms.c to emit readily
> identifiable symbol names for all the weak junk, eg:
>
> __weak_junk_NNNNN
>
> The name of this kind of symbol needs some discussion, but it's temporarily
> called "__hole_symbol_XXXXX" in this patch:
> 1. Pass size info to scripts/kallsyms (see mksysmap());
> 2. Traverse sorted function symbols, if one function address plus its
> size less than next function address, it means there's a hole, then
> emit a symbol "__hole_symbol_XXXXX" there which type is 't'.
>
> After this patch, the effect is as follows:
>
> $ cat /proc/kallsyms | grep -A 3 do_one_initcall
> ffffffff810021e0 T do_one_initcall
> ffffffff8100245e t __hole_symbol_XXXXX
> ffffffff810024a0 t __pfx_rootfs_init_fs_context
>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
With my quick test, "t__hole_symbol_XXXXX" was encoded
into the following 10-byte stream.
.byte 0x09, 0x94, 0xbf, 0x18, 0xf3, 0x3d, 0xce, 0xd1, 0xd1, 0x58
Now "t__hole_symbol_XXXXX" is the most common symbol name.
However, 10 byte is consumed for every instance of
"t__hole_symbol_XXXXX".
This is much less efficient thanI had expected,
although I did not analyze the logic of this inefficiency.
> ---
> scripts/kallsyms.c | 101 +++++++++++++++++++++++++++++++++++++++-
> scripts/link-vmlinux.sh | 4 +-
> scripts/mksysmap | 2 +-
> 3 files changed, 102 insertions(+), 5 deletions(-)
>
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 6559a9802f6e..5c4cde864a04 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -35,6 +35,7 @@
> struct sym_entry {
> struct sym_entry *next;
> unsigned long long addr;
> + unsigned long long size;
> unsigned int len;
> unsigned int seq;
> unsigned int start_pos;
> @@ -74,6 +75,7 @@ static int token_profit[0x10000];
> static unsigned char best_table[256][2];
> static unsigned char best_table_len[256];
>
> +static const char hole_symbol[] = "__hole_symbol_XXXXX";
>
> static void usage(void)
> {
> @@ -130,8 +132,16 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
> size_t len;
> ssize_t readlen;
> struct sym_entry *sym;
> + unsigned long long size = 0;
>
> errno = 0;
> + /*
> + * Example of expected symbol format:
> + * 1. symbol with size info:
> + * ffffffff81000070 00000000000001d7 T __startup_64
> + * 2. symbol without size info:
> + * 0000000002a00000 A text_size
> + */
> readlen = getline(buf, buf_len, in);
> if (readlen < 0) {
> if (errno) {
> @@ -145,9 +155,24 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
> (*buf)[readlen - 1] = 0;
>
> addr = strtoull(*buf, &p, 16);
> + if (*buf == p || *p++ != ' ') {
> + fprintf(stderr, "line format error: unable to parse address\n");
> + exit(EXIT_FAILURE);
> + }
> +
> + if (*p == '0') {
> + char *str = p;
>
> - if (*buf == p || *p++ != ' ' || !isascii((type = *p++)) || *p++ != ' ') {
> - fprintf(stderr, "line format error\n");
> + size = strtoull(str, &p, 16);
> + if (str == p || *p++ != ' ') {
> + fprintf(stderr, "line format error: unable to parse size\n");
> + exit(EXIT_FAILURE);
> + }
> + }
> +
> + type = *p++;
> + if (!isascii(type) || *p++ != ' ') {
> + fprintf(stderr, "line format error: unable to parse type\n");
> exit(EXIT_FAILURE);
> }
>
> @@ -182,6 +207,7 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
> exit(EXIT_FAILURE);
> }
> sym->addr = addr;
> + sym->size = size;
> sym->len = len;
> sym->sym[0] = type;
> strcpy(sym_name(sym), name);
> @@ -795,6 +821,76 @@ static void sort_symbols(void)
> qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
> }
>
> +static int may_exist_hole_after_symbol(const struct sym_entry *se)
The return type should be bool.
> +{
> + char type = se->sym[0];
> +
> + /* Only check text symbol or weak symbol */
> + if (type != 't' && type != 'T' &&
> + type != 'w' && type != 'W')
> + return 0;
> + /* Symbol without size has no hole */
> + return se->size != 0;
> +}
> +
> +static struct sym_entry *gen_hole_symbol(unsigned long long addr)
> +{
> + struct sym_entry *sym;
> + static size_t len = sizeof(hole_symbol);
> +
> + /* include type field */
> + sym = malloc(sizeof(*sym) + len + 1);
> + if (!sym) {
> + fprintf(stderr, "unable to allocate memory for hole symbol\n");
> + exit(EXIT_FAILURE);
> + }
> + sym->addr = addr;
> + sym->size = 0;
> + sym->len = len;
> + sym->sym[0] = 't';
> + strcpy(sym_name(sym), hole_symbol);
> + sym->percpu_absolute = 0;
> + return sym;
> +}
> +
> +static void emit_hole_symbols(void)
> +{
> + unsigned int i, pos, nr_emit;
> + struct sym_entry **new_table;
> + unsigned int new_cnt;
> +
> + nr_emit = 0;
> + for (i = 0; i < table_cnt - 1; i++) {
> + if (may_exist_hole_after_symbol(table[i]) &&
> + table[i]->addr + table[i]->size < table[i+1]->addr)
> + nr_emit++;
> + }
> + if (!nr_emit)
> + return;
> +
> + new_cnt = table_cnt + nr_emit;
> + new_table = malloc(sizeof(*new_table) * new_cnt);
Do you need to allocate another huge table?
You can use realloc() to append room for the nr_emit new entries
if you iterate the table in reverse order.
> + if (!new_table) {
> + fprintf(stderr, "unable to allocate memory for new table\n");
> + exit(EXIT_FAILURE);
> + }
> +
> + pos = 0;
> + for (i = 0; i < table_cnt; i++) {
> + unsigned long long addr;
> +
> + new_table[pos++] = table[i];
> + if ((i == table_cnt - 1) || !may_exist_hole_after_symbol(table[i]))
> + continue;
> + addr = table[i]->addr + table[i]->size;
> + if (addr < table[i+1]->addr)
> + new_table[pos++] = gen_hole_symbol(addr);
> + }
> + free(table);
> + table = new_table;
> + table_cnt = new_cnt;
> +}
> +
> static void make_percpus_absolute(void)
> {
> unsigned int i;
> @@ -854,6 +950,7 @@ int main(int argc, char **argv)
> if (absolute_percpu)
> make_percpus_absolute();
> sort_symbols();
> + emit_hole_symbols();
> if (base_relative)
> record_relative_base();
> optimize_token_table();
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 518c70b8db50..8e1373902bfe 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -189,11 +189,11 @@ kallsyms_step()
> }
>
> # Create map file with all symbols from ${1}
> -# See mksymap for additional details
> +# See mksysmap for additional details
> mksysmap()
> {
> info NM ${2}
> - ${NM} -n "${1}" | sed -f "${srctree}/scripts/mksysmap" > "${2}"
> + ${NM} -nS "${1}" | sed -f "${srctree}/scripts/mksysmap" > "${2}"
> }
>
> sorttable()
> diff --git a/scripts/mksysmap b/scripts/mksysmap
> index c12723a04655..7a4415f21143 100755
> --- a/scripts/mksysmap
> +++ b/scripts/mksysmap
> @@ -2,7 +2,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> #
> # sed script to filter out symbols that are not needed for System.map,
> -# or not suitable for kallsyms. The input should be 'nm -n <file>'.
> +# or not suitable for kallsyms. The input should be 'nm -nS <file>'.
> #
> # System.map is used by module-init tools and some debugging
> # tools to retrieve the actual addresses of symbols in the kernel.
> --
> 2.25.1
>
>
--
Best Regards
Masahiro Yamada
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/6] kallsyms: Emit symbol at the holes in the text
2024-07-16 8:33 ` Masahiro Yamada
@ 2024-07-18 3:45 ` Zheng Yejian
2024-07-20 14:14 ` Masahiro Yamada
0 siblings, 1 reply; 12+ messages in thread
From: Zheng Yejian @ 2024-07-18 3:45 UTC (permalink / raw)
To: Masahiro Yamada
Cc: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz, linux-kernel,
linux-trace-kernel, linuxppc-dev, linux-modules, linux-kbuild,
bpf
On 2024/7/16 16:33, Masahiro Yamada wrote:
> On Thu, Jun 13, 2024 at 10:36 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
>>
>> When a weak type function is overridden, its symbol will be removed
>> from the symbol table, but its code will not be removed. Besides,
>> due to the lack of size info in kallsyms, the kernel computes function
>> size by subtracting its symbol address from its next symbol address (see
>> kallsyms_lookup_size_offset()). These cause the size of some
>> functions to be computed larger than they actually are, just because
>> the symbol of the following weak function is removed.
>>
>> This issue also causes multiple __fentry__ locations to be counted in
>> the same function scope, and eventually causes ftrace_location() to find
>> the wrong __fentry__ location. It was reported in
>> Link: https://lore.kernel.org/all/20240607115211.734845-1-zhengyejian1@huawei.com/
>>
>> Peter suggested changing scripts/kallsyms.c to emit readily
>> identifiable symbol names for all the weak junk, eg:
>>
>> __weak_junk_NNNNN
>>
>> The name of this kind of symbol needs some discussion, but it's temporarily
>> called "__hole_symbol_XXXXX" in this patch:
>> 1. Pass size info to scripts/kallsyms (see mksysmap());
>> 2. Traverse sorted function symbols, if one function address plus its
>> size less than next function address, it means there's a hole, then
>> emit a symbol "__hole_symbol_XXXXX" there which type is 't'.
>>
>> After this patch, the effect is as follows:
>>
>> $ cat /proc/kallsyms | grep -A 3 do_one_initcall
>> ffffffff810021e0 T do_one_initcall
>> ffffffff8100245e t __hole_symbol_XXXXX
>> ffffffff810024a0 t __pfx_rootfs_init_fs_context
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
>
>
>
> With my quick test, "t__hole_symbol_XXXXX" was encoded
> into the following 10-byte stream.
>
> .byte 0x09, 0x94, 0xbf, 0x18, 0xf3, 0x3d, 0xce, 0xd1, 0xd1, 0x58
>
>
>
> Now "t__hole_symbol_XXXXX" is the most common symbol name.
> However, 10 byte is consumed for every instance of
> "t__hole_symbol_XXXXX".
>
> This is much less efficient than I had expected,
> although I did not analyze the logic of this inefficiency.
>
Hi, Masahiro!
In my local test, "t__hole_symbol_XXXXX" was finally encoded
into just one byte. See "kallsyms_token_table" in the .tmp_vmlinux.kallsyms2.S:
kallsyms_token_table:
[...]
.asciz "t__hole_symbol_XXXXX"
.asciz "hole_symbol_XXXXX"
.asciz "e_symbol_XXXXX"
.asciz "XXXXX"
.asciz "XXX"
.asciz "e_symbol_"
.asciz "ymbol_"
.asciz "ymb"
.asciz "hol"
.asciz "ol_"
.asciz "pfx"
.asciz "pf"
.asciz "e_s"
.asciz "ym"
.asciz "t__"
.asciz "_s"
.asciz "ol"
.asciz "__"
.asciz "XX"
But it would still take up several tokens, because substrings of
"t__hole_symbol_XXXXX" would also become some of the most common ones.
After this patch, instances of "t__hole_symbol_XXXXX" will be ~30% of the total.
>
>
>
>
>
>
>> ---
>> scripts/kallsyms.c | 101 +++++++++++++++++++++++++++++++++++++++-
>> scripts/link-vmlinux.sh | 4 +-
>> scripts/mksysmap | 2 +-
>> 3 files changed, 102 insertions(+), 5 deletions(-)
>>
>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>> index 6559a9802f6e..5c4cde864a04 100644
>> --- a/scripts/kallsyms.c
>> +++ b/scripts/kallsyms.c
>> @@ -35,6 +35,7 @@
>> struct sym_entry {
>> struct sym_entry *next;
>> unsigned long long addr;
>> + unsigned long long size;
>> unsigned int len;
>> unsigned int seq;
>> unsigned int start_pos;
>> @@ -74,6 +75,7 @@ static int token_profit[0x10000];
>> static unsigned char best_table[256][2];
>> static unsigned char best_table_len[256];
>>
>> +static const char hole_symbol[] = "__hole_symbol_XXXXX";
>>
>> static void usage(void)
>> {
>> @@ -130,8 +132,16 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
>> size_t len;
>> ssize_t readlen;
>> struct sym_entry *sym;
>> + unsigned long long size = 0;
>>
>> errno = 0;
>> + /*
>> + * Example of expected symbol format:
>> + * 1. symbol with size info:
>> + * ffffffff81000070 00000000000001d7 T __startup_64
>> + * 2. symbol without size info:
>> + * 0000000002a00000 A text_size
>> + */
>> readlen = getline(buf, buf_len, in);
>> if (readlen < 0) {
>> if (errno) {
>> @@ -145,9 +155,24 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
>> (*buf)[readlen - 1] = 0;
>>
>> addr = strtoull(*buf, &p, 16);
>> + if (*buf == p || *p++ != ' ') {
>> + fprintf(stderr, "line format error: unable to parse address\n");
>> + exit(EXIT_FAILURE);
>> + }
>> +
>> + if (*p == '0') {
>> + char *str = p;
>>
>> - if (*buf == p || *p++ != ' ' || !isascii((type = *p++)) || *p++ != ' ') {
>> - fprintf(stderr, "line format error\n");
>> + size = strtoull(str, &p, 16);
>> + if (str == p || *p++ != ' ') {
>> + fprintf(stderr, "line format error: unable to parse size\n");
>> + exit(EXIT_FAILURE);
>> + }
>> + }
>> +
>> + type = *p++;
>> + if (!isascii(type) || *p++ != ' ') {
>> + fprintf(stderr, "line format error: unable to parse type\n");
>> exit(EXIT_FAILURE);
>> }
>>
>> @@ -182,6 +207,7 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
>> exit(EXIT_FAILURE);
>> }
>> sym->addr = addr;
>> + sym->size = size;
>> sym->len = len;
>> sym->sym[0] = type;
>> strcpy(sym_name(sym), name);
>> @@ -795,6 +821,76 @@ static void sort_symbols(void)
>> qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
>> }
>>
>> +static int may_exist_hole_after_symbol(const struct sym_entry *se)
>
>
> The return type should be bool.
>
Yes!
>
>
>> +{
>> + char type = se->sym[0];
>> +
>> + /* Only check text symbol or weak symbol */
>> + if (type != 't' && type != 'T' &&
>> + type != 'w' && type != 'W')
>> + return 0;
>> + /* Symbol without size has no hole */
>> + return se->size != 0;
>> +}
>> +
>> +static struct sym_entry *gen_hole_symbol(unsigned long long addr)
>> +{
>> + struct sym_entry *sym;
>> + static size_t len = sizeof(hole_symbol);
>> +
>> + /* include type field */
>> + sym = malloc(sizeof(*sym) + len + 1);
>> + if (!sym) {
>> + fprintf(stderr, "unable to allocate memory for hole symbol\n");
>> + exit(EXIT_FAILURE);
>> + }
>> + sym->addr = addr;
>> + sym->size = 0;
>> + sym->len = len;
>> + sym->sym[0] = 't';
>> + strcpy(sym_name(sym), hole_symbol);
>> + sym->percpu_absolute = 0;
>> + return sym;
>> +}
>> +
>> +static void emit_hole_symbols(void)
>> +{
>> + unsigned int i, pos, nr_emit;
>> + struct sym_entry **new_table;
>> + unsigned int new_cnt;
>> +
>> + nr_emit = 0;
>> + for (i = 0; i < table_cnt - 1; i++) {
>> + if (may_exist_hole_after_symbol(table[i]) &&
>> + table[i]->addr + table[i]->size < table[i+1]->addr)
>> + nr_emit++;
>> + }
>> + if (!nr_emit)
>> + return;
>> +
>> + new_cnt = table_cnt + nr_emit;
>> + new_table = malloc(sizeof(*new_table) * new_cnt);
>
>
> Do you need to allocate another huge table?
>
> You can use realloc() to append room for the nr_emit new entries
> if you iterate the table in reverse order.
>
Yes, it would be much better. If it turns out to be the
"emit hole symbol" solution, I'll change it to that in the next version.
Actually, I forgot to mark this series as "RFC".
>
>
>
>
>
>
>> + if (!new_table) {
>> + fprintf(stderr, "unable to allocate memory for new table\n");
>> + exit(EXIT_FAILURE);
>> + }
>> +
>> + pos = 0;
>> + for (i = 0; i < table_cnt; i++) {
>> + unsigned long long addr;
>> +
>> + new_table[pos++] = table[i];
>> + if ((i == table_cnt - 1) || !may_exist_hole_after_symbol(table[i]))
>> + continue;
>> + addr = table[i]->addr + table[i]->size;
>> + if (addr < table[i+1]->addr)
>> + new_table[pos++] = gen_hole_symbol(addr);
>> + }
>> + free(table);
>> + table = new_table;
>> + table_cnt = new_cnt;
>> +}
>> +
>> static void make_percpus_absolute(void)
>> {
>> unsigned int i;
>> @@ -854,6 +950,7 @@ int main(int argc, char **argv)
>> if (absolute_percpu)
>> make_percpus_absolute();
>> sort_symbols();
>> + emit_hole_symbols();
>> if (base_relative)
>> record_relative_base();
>> optimize_token_table();
>> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
>> index 518c70b8db50..8e1373902bfe 100755
>> --- a/scripts/link-vmlinux.sh
>> +++ b/scripts/link-vmlinux.sh
>> @@ -189,11 +189,11 @@ kallsyms_step()
>> }
>>
>> # Create map file with all symbols from ${1}
>> -# See mksymap for additional details
>> +# See mksysmap for additional details
>> mksysmap()
>> {
>> info NM ${2}
>> - ${NM} -n "${1}" | sed -f "${srctree}/scripts/mksysmap" > "${2}"
>> + ${NM} -nS "${1}" | sed -f "${srctree}/scripts/mksysmap" > "${2}"
>> }
>>
>> sorttable()
>> diff --git a/scripts/mksysmap b/scripts/mksysmap
>> index c12723a04655..7a4415f21143 100755
>> --- a/scripts/mksysmap
>> +++ b/scripts/mksysmap
>> @@ -2,7 +2,7 @@
>> # SPDX-License-Identifier: GPL-2.0-only
>> #
>> # sed script to filter out symbols that are not needed for System.map,
>> -# or not suitable for kallsyms. The input should be 'nm -n <file>'.
>> +# or not suitable for kallsyms. The input should be 'nm -nS <file>'.
>> #
>> # System.map is used by module-init tools and some debugging
>> # tools to retrieve the actual addresses of symbols in the kernel.
>> --
>> 2.25.1
>>
>>
>
>
--
Thanks,
Zheng Yejian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/6] kallsyms: Emit symbol at the holes in the text
2024-07-18 3:45 ` Zheng Yejian
@ 2024-07-20 14:14 ` Masahiro Yamada
2024-07-22 1:29 ` Zheng Yejian
0 siblings, 1 reply; 12+ messages in thread
From: Masahiro Yamada @ 2024-07-20 14:14 UTC (permalink / raw)
To: Zheng Yejian
Cc: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz, linux-kernel,
linux-trace-kernel, linuxppc-dev, linux-modules, linux-kbuild,
bpf
On Thu, Jul 18, 2024 at 12:45 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
>
> On 2024/7/16 16:33, Masahiro Yamada wrote:
> > On Thu, Jun 13, 2024 at 10:36 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
> >>
> >> When a weak type function is overridden, its symbol will be removed
> >> from the symbol table, but its code will not be removed. Besides,
> >> due to the lack of size info in kallsyms, the kernel computes function
> >> size by subtracting its symbol address from its next symbol address (see
> >> kallsyms_lookup_size_offset()). These cause the size of some
> >> functions to be computed larger than they actually are, just because
> >> the symbol of the following weak function is removed.
> >>
> >> This issue also causes multiple __fentry__ locations to be counted in
> >> the same function scope, and eventually causes ftrace_location() to find
> >> the wrong __fentry__ location. It was reported in
> >> Link: https://lore.kernel.org/all/20240607115211.734845-1-zhengyejian1@huawei.com/
> >>
> >> Peter suggested changing scripts/kallsyms.c to emit readily
> >> identifiable symbol names for all the weak junk, eg:
> >>
> >> __weak_junk_NNNNN
> >>
> >> The name of this kind of symbol needs some discussion, but it's temporarily
> >> called "__hole_symbol_XXXXX" in this patch:
> >> 1. Pass size info to scripts/kallsyms (see mksysmap());
> >> 2. Traverse sorted function symbols, if one function address plus its
> >> size less than next function address, it means there's a hole, then
> >> emit a symbol "__hole_symbol_XXXXX" there which type is 't'.
> >>
> >> After this patch, the effect is as follows:
> >>
> >> $ cat /proc/kallsyms | grep -A 3 do_one_initcall
> >> ffffffff810021e0 T do_one_initcall
> >> ffffffff8100245e t __hole_symbol_XXXXX
> >> ffffffff810024a0 t __pfx_rootfs_init_fs_context
> >>
> >> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> >> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
> >
> >
> >
> > With my quick test, "t__hole_symbol_XXXXX" was encoded
> > into the following 10-byte stream.
> >
> > .byte 0x09, 0x94, 0xbf, 0x18, 0xf3, 0x3d, 0xce, 0xd1, 0xd1, 0x58
> >
> >
> >
> > Now "t__hole_symbol_XXXXX" is the most common symbol name.
> > However, 10 byte is consumed for every instance of
> > "t__hole_symbol_XXXXX".
> >
> > This is much less efficient than I had expected,
> > although I did not analyze the logic of this inefficiency.
> >
> Hi, Masahiro!
>
> In my local test, "t__hole_symbol_XXXXX" was finally encoded
> into just one byte. See "kallsyms_token_table" in the .tmp_vmlinux.kallsyms2.S:
>
> kallsyms_token_table:
> [...]
> .asciz "t__hole_symbol_XXXXX"
> .asciz "hole_symbol_XXXXX"
> .asciz "e_symbol_XXXXX"
> .asciz "XXXXX"
> .asciz "XXX"
> .asciz "e_symbol_"
> .asciz "ymbol_"
> .asciz "ymb"
> .asciz "hol"
> .asciz "ol_"
> .asciz "pfx"
> .asciz "pf"
> .asciz "e_s"
> .asciz "ym"
> .asciz "t__"
> .asciz "_s"
> .asciz "ol"
> .asciz "__"
> .asciz "XX"
>
> But it would still take up several tokens, because substrings of
> "t__hole_symbol_XXXXX" would also become the most common ones.
> After this patch, the number of "t__hole_symbol_XXXXX" will be ~30% of the total.
>
> >
> >
> >
> >
> >
> >
> >> ---
> >> scripts/kallsyms.c | 101 +++++++++++++++++++++++++++++++++++++++-
> >> scripts/link-vmlinux.sh | 4 +-
> >> scripts/mksysmap | 2 +-
> >> 3 files changed, 102 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> >> index 6559a9802f6e..5c4cde864a04 100644
> >> --- a/scripts/kallsyms.c
> >> +++ b/scripts/kallsyms.c
> >> @@ -35,6 +35,7 @@
> >> struct sym_entry {
> >> struct sym_entry *next;
> >> unsigned long long addr;
> >> + unsigned long long size;
> >> unsigned int len;
> >> unsigned int seq;
> >> unsigned int start_pos;
> >> @@ -74,6 +75,7 @@ static int token_profit[0x10000];
> >> static unsigned char best_table[256][2];
> >> static unsigned char best_table_len[256];
> >>
> >> +static const char hole_symbol[] = "__hole_symbol_XXXXX";
> >>
> >> static void usage(void)
> >> {
> >> @@ -130,8 +132,16 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
> >> size_t len;
> >> ssize_t readlen;
> >> struct sym_entry *sym;
> >> + unsigned long long size = 0;
> >>
> >> errno = 0;
> >> + /*
> >> + * Example of expected symbol format:
> >> + * 1. symbol with size info:
> >> + * ffffffff81000070 00000000000001d7 T __startup_64
> >> + * 2. symbol without size info:
> >> + * 0000000002a00000 A text_size
> >> + */
> >> readlen = getline(buf, buf_len, in);
> >> if (readlen < 0) {
> >> if (errno) {
> >> @@ -145,9 +155,24 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
> >> (*buf)[readlen - 1] = 0;
> >>
> >> addr = strtoull(*buf, &p, 16);
> >> + if (*buf == p || *p++ != ' ') {
> >> + fprintf(stderr, "line format error: unable to parse address\n");
> >> + exit(EXIT_FAILURE);
> >> + }
> >> +
> >> + if (*p == '0') {
> >> + char *str = p;
> >>
> >> - if (*buf == p || *p++ != ' ' || !isascii((type = *p++)) || *p++ != ' ') {
> >> - fprintf(stderr, "line format error\n");
> >> + size = strtoull(str, &p, 16);
> >> + if (str == p || *p++ != ' ') {
> >> + fprintf(stderr, "line format error: unable to parse size\n");
> >> + exit(EXIT_FAILURE);
> >> + }
> >> + }
> >> +
> >> + type = *p++;
> >> + if (!isascii(type) || *p++ != ' ') {
> >> + fprintf(stderr, "line format error: unable to parse type\n");
> >> exit(EXIT_FAILURE);
> >> }
> >>
> >> @@ -182,6 +207,7 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
> >> exit(EXIT_FAILURE);
> >> }
> >> sym->addr = addr;
> >> + sym->size = size;
> >> sym->len = len;
> >> sym->sym[0] = type;
> >> strcpy(sym_name(sym), name);
> >> @@ -795,6 +821,76 @@ static void sort_symbols(void)
> >> qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
> >> }
> >>
> >> +static int may_exist_hole_after_symbol(const struct sym_entry *se)
> >
> >
> > The return type should be bool.
> >
>
> Yes!
>
> >
> >
> >> +{
> >> + char type = se->sym[0];
> >> +
> >> + /* Only check text symbol or weak symbol */
> >> + if (type != 't' && type != 'T' &&
> >> + type != 'w' && type != 'W')
> >> + return 0;
> >> + /* Symbol without size has no hole */
> >> + return se->size != 0;
> >> +}
> >> +
> >> +static struct sym_entry *gen_hole_symbol(unsigned long long addr)
> >> +{
> >> + struct sym_entry *sym;
> >> + static size_t len = sizeof(hole_symbol);
> >> +
> >> + /* include type field */
> >> + sym = malloc(sizeof(*sym) + len + 1);
> >> + if (!sym) {
> >> + fprintf(stderr, "unable to allocate memory for hole symbol\n");
> >> + exit(EXIT_FAILURE);
> >> + }
> >> + sym->addr = addr;
> >> + sym->size = 0;
> >> + sym->len = len;
> >> + sym->sym[0] = 't';
> >> + strcpy(sym_name(sym), hole_symbol);
> >> + sym->percpu_absolute = 0;
> >> + return sym;
> >> +}
> >> +
> >> +static void emit_hole_symbols(void)
> >> +{
> >> + unsigned int i, pos, nr_emit;
> >> + struct sym_entry **new_table;
> >> + unsigned int new_cnt;
> >> +
> >> + nr_emit = 0;
> >> + for (i = 0; i < table_cnt - 1; i++) {
> >> + if (may_exist_hole_after_symbol(table[i]) &&
> >> + table[i]->addr + table[i]->size < table[i+1]->addr)
> >> + nr_emit++;
> >> + }
> >> + if (!nr_emit)
> >> + return;
> >> +
> >> + new_cnt = table_cnt + nr_emit;
> >> + new_table = malloc(sizeof(*new_table) * new_cnt);
> >
> >
> > Do you need to allocate another huge table?
> >
> > You can use realloc() to append the room for nr_emit
> > if you iterate the table in the reverse order.
> >
>
> Yes, it would be much better. If it turns out to be the
> "emit hole symbol" solution, I'll change it to that in the next version,
> actually, I forgot to mark this series as "RFC".
"__hole_symbol_XXXXX" is too much.
You can use the empty symbol type/name as a special case
to represent the hole.
--
Best Regards
Masahiro Yamada
* Re: [PATCH 2/6] kallsyms: Emit symbol at the holes in the text
2024-07-20 14:14 ` Masahiro Yamada
@ 2024-07-22 1:29 ` Zheng Yejian
0 siblings, 0 replies; 12+ messages in thread
From: Zheng Yejian @ 2024-07-22 1:29 UTC (permalink / raw)
To: Masahiro Yamada
Cc: rostedt, mhiramat, mark.rutland, mpe, npiggin, christophe.leroy,
naveen.n.rao, tglx, mingo, bp, dave.hansen, x86, hpa, mcgrof,
mathieu.desnoyers, nathan, nicolas, kees, james.clark,
kent.overstreet, yhs, jpoimboe, peterz, linux-kernel,
linux-trace-kernel, linuxppc-dev, linux-modules, linux-kbuild,
bpf
On 2024/7/20 22:14, Masahiro Yamada wrote:
> On Thu, Jul 18, 2024 at 12:45 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
>>
>> On 2024/7/16 16:33, Masahiro Yamada wrote:
>>> On Thu, Jun 13, 2024 at 10:36 PM Zheng Yejian <zhengyejian1@huawei.com> wrote:
>>>>
>>>> When a weak type function is overridden, its symbol will be removed
>>>> from the symbol table, but its code will not be removed. Besides,
>>>> because kallsyms lacks size info, the kernel computes a function's
>>>> size by subtracting its symbol address from the next symbol's address
>>>> (see kallsyms_lookup_size_offset()). As a result, the size of some
>>>> functions is computed to be larger than it actually is, merely because
>>>> the symbol of the following weak function has been removed.
>>>>
>>>> This issue also causes multiple __fentry__ locations to be counted
>>>> within one function's scope, which eventually causes ftrace_location()
>>>> to find the wrong __fentry__ location. It was reported in
>>>> Link: https://lore.kernel.org/all/20240607115211.734845-1-zhengyejian1@huawei.com/
>>>>
>>>> Peter suggested changing scripts/kallsyms.c to emit readily
>>>> identifiable symbol names for all the weak junk, e.g.:
>>>>
>>>> __weak_junk_NNNNN
>>>>
>>>> The name of this kind of symbol needs some discussion, but it is
>>>> temporarily called "__hole_symbol_XXXXX" in this patch:
>>>> 1. Pass size info to scripts/kallsyms (see mksysmap());
>>>> 2. Traverse the sorted function symbols; if a function's address plus
>>>>    its size is less than the next function's address, there is a hole,
>>>>    so emit a symbol "__hole_symbol_XXXXX" of type 't' there.
>>>>
>>>> After this patch, the effect is as follows:
>>>>
>>>> $ cat /proc/kallsyms | grep -A 3 do_one_initcall
>>>> ffffffff810021e0 T do_one_initcall
>>>> ffffffff8100245e t __hole_symbol_XXXXX
>>>> ffffffff810024a0 t __pfx_rootfs_init_fs_context
>>>>
>>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
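[Editorial note: the hole-detection step described in the commit message above can be sketched in plain C. The struct layout and names here are illustrative stand-ins, not the actual scripts/kallsyms.c code under review.]

```c
#include <assert.h>

struct ksym {
	unsigned long long addr;
	unsigned long long size;	/* 0 when nm printed no size column */
	char type;
};

static int is_sized_text(const struct ksym *s)
{
	char t = s->type;

	/* only text or weak symbols can leave holes behind */
	if (t != 't' && t != 'T' && t != 'w' && t != 'W')
		return 0;
	/* a symbol without size info cannot be checked for a hole */
	return s->size != 0;
}

static unsigned int count_holes(const struct ksym *tab, unsigned int cnt)
{
	unsigned int i, holes = 0;

	/* table is assumed sorted by address */
	for (i = 0; i + 1 < cnt; i++) {
		if (is_sized_text(&tab[i]) &&
		    tab[i].addr + tab[i].size < tab[i + 1].addr)
			holes++;	/* a placeholder symbol would be emitted at addr + size */
	}
	return holes;
}
```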
>>>
>>>
>>>
>>> With my quick test, "t__hole_symbol_XXXXX" was encoded
>>> into the following 10-byte stream.
>>>
>>> .byte 0x09, 0x94, 0xbf, 0x18, 0xf3, 0x3d, 0xce, 0xd1, 0xd1, 0x58
>>>
>>>
>>>
>>> Now "t__hole_symbol_XXXXX" is the most common symbol name.
>>> However, 10 byte is consumed for every instance of
>>> "t__hole_symbol_XXXXX".
>>>
>>> This is much less efficient than I had expected,
>>> although I did not analyze the cause of this inefficiency.
>>>
>> Hi, Masahiro!
>>
>> In my local test, "t__hole_symbol_XXXXX" was finally encoded
>> into just one byte. See "kallsyms_token_table" in the .tmp_vmlinux.kallsyms2.S:
>>
>> kallsyms_token_table:
>> [...]
>> .asciz "t__hole_symbol_XXXXX"
>> .asciz "hole_symbol_XXXXX"
>> .asciz "e_symbol_XXXXX"
>> .asciz "XXXXX"
>> .asciz "XXX"
>> .asciz "e_symbol_"
>> .asciz "ymbol_"
>> .asciz "ymb"
>> .asciz "hol"
>> .asciz "ol_"
>> .asciz "pfx"
>> .asciz "pf"
>> .asciz "e_s"
>> .asciz "ym"
>> .asciz "t__"
>> .asciz "_s"
>> .asciz "ol"
>> .asciz "__"
>> .asciz "XX"
>>
>> But it still takes up several tokens, because substrings of
>> "t__hole_symbol_XXXXX" also become the most common ones.
>> After this patch, "t__hole_symbol_XXXXX" will account for ~30% of all symbols.
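[Editorial note: the kallsyms compressor's token table works by repeatedly merging the most frequent adjacent byte pair into a single token byte, which is why a name that dominates the input (like "t__hole_symbol_XXXXX") collapses to one byte after enough rounds, while also consuming many token slots for its substrings. A toy model of the pair-counting step, not the real scripts/kallsyms.c implementation:]

```c
#include <assert.h>
#include <string.h>

/* frequency of every adjacent byte pair, indexed by (first << 8) | second */
static unsigned int profit[0x10000];

static void count_pairs(const char *const *names, unsigned int n)
{
	unsigned int i;
	const char *p;

	memset(profit, 0, sizeof(profit));
	for (i = 0; i < n; i++)
		for (p = names[i]; p[0] && p[1]; p++)
			profit[((unsigned char)p[0] << 8) | (unsigned char)p[1]]++;
}

/* return the most frequent pair; the compressor would merge this one next */
static unsigned int best_pair(void)
{
	unsigned int i, best = 0;

	for (i = 1; i < 0x10000; i++)
		if (profit[i] > profit[best])
			best = i;
	return best;
}
```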
>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> ---
>>>> scripts/kallsyms.c | 101 +++++++++++++++++++++++++++++++++++++++-
>>>> scripts/link-vmlinux.sh | 4 +-
>>>> scripts/mksysmap | 2 +-
>>>> 3 files changed, 102 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>>>> index 6559a9802f6e..5c4cde864a04 100644
>>>> --- a/scripts/kallsyms.c
>>>> +++ b/scripts/kallsyms.c
>>>> @@ -35,6 +35,7 @@
>>>> struct sym_entry {
>>>> struct sym_entry *next;
>>>> unsigned long long addr;
>>>> + unsigned long long size;
>>>> unsigned int len;
>>>> unsigned int seq;
>>>> unsigned int start_pos;
>>>> @@ -74,6 +75,7 @@ static int token_profit[0x10000];
>>>> static unsigned char best_table[256][2];
>>>> static unsigned char best_table_len[256];
>>>>
>>>> +static const char hole_symbol[] = "__hole_symbol_XXXXX";
>>>>
>>>> static void usage(void)
>>>> {
>>>> @@ -130,8 +132,16 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
>>>> size_t len;
>>>> ssize_t readlen;
>>>> struct sym_entry *sym;
>>>> + unsigned long long size = 0;
>>>>
>>>> errno = 0;
>>>> + /*
>>>> + * Example of expected symbol format:
>>>> + * 1. symbol with size info:
>>>> + * ffffffff81000070 00000000000001d7 T __startup_64
>>>> + * 2. symbol without size info:
>>>> + * 0000000002a00000 A text_size
>>>> + */
>>>> readlen = getline(buf, buf_len, in);
>>>> if (readlen < 0) {
>>>> if (errno) {
>>>> @@ -145,9 +155,24 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
>>>> (*buf)[readlen - 1] = 0;
>>>>
>>>> addr = strtoull(*buf, &p, 16);
>>>> + if (*buf == p || *p++ != ' ') {
>>>> + fprintf(stderr, "line format error: unable to parse address\n");
>>>> + exit(EXIT_FAILURE);
>>>> + }
>>>> +
>>>> + if (*p == '0') {
>>>> + char *str = p;
>>>>
>>>> - if (*buf == p || *p++ != ' ' || !isascii((type = *p++)) || *p++ != ' ') {
>>>> - fprintf(stderr, "line format error\n");
>>>> + size = strtoull(str, &p, 16);
>>>> + if (str == p || *p++ != ' ') {
>>>> + fprintf(stderr, "line format error: unable to parse size\n");
>>>> + exit(EXIT_FAILURE);
>>>> + }
>>>> + }
>>>> +
>>>> + type = *p++;
>>>> + if (!isascii(type) || *p++ != ' ') {
>>>> + fprintf(stderr, "line format error: unable to parse type\n");
>>>> exit(EXIT_FAILURE);
>>>> }
>>>>
>>>> @@ -182,6 +207,7 @@ static struct sym_entry *read_symbol(FILE *in, char **buf, size_t *buf_len)
>>>> exit(EXIT_FAILURE);
>>>> }
>>>> sym->addr = addr;
>>>> + sym->size = size;
>>>> sym->len = len;
>>>> sym->sym[0] = type;
>>>> strcpy(sym_name(sym), name);
>>>> @@ -795,6 +821,76 @@ static void sort_symbols(void)
>>>> qsort(table, table_cnt, sizeof(table[0]), compare_symbols);
>>>> }
>>>>
>>>> +static int may_exist_hole_after_symbol(const struct sym_entry *se)
>>>
>>>
>>> The return type should be bool.
>>>
>>
>> Yes!
>>
>>>
>>>
>>>> +{
>>>> + char type = se->sym[0];
>>>> +
>>>> + /* Only check text symbol or weak symbol */
>>>> + if (type != 't' && type != 'T' &&
>>>> + type != 'w' && type != 'W')
>>>> + return 0;
>>>> + /* Symbol without size has no hole */
>>>> + return se->size != 0;
>>>> +}
>>>> +
>>>> +static struct sym_entry *gen_hole_symbol(unsigned long long addr)
>>>> +{
>>>> + struct sym_entry *sym;
>>>> + static size_t len = sizeof(hole_symbol);
>>>> +
>>>> + /* include type field */
>>>> + sym = malloc(sizeof(*sym) + len + 1);
>>>> + if (!sym) {
>>>> + fprintf(stderr, "unable to allocate memory for hole symbol\n");
>>>> + exit(EXIT_FAILURE);
>>>> + }
>>>> + sym->addr = addr;
>>>> + sym->size = 0;
>>>> + sym->len = len;
>>>> + sym->sym[0] = 't';
>>>> + strcpy(sym_name(sym), hole_symbol);
>>>> + sym->percpu_absolute = 0;
>>>> + return sym;
>>>> +}
>>>> +
>>>> +static void emit_hole_symbols(void)
>>>> +{
>>>> + unsigned int i, pos, nr_emit;
>>>> + struct sym_entry **new_table;
>>>> + unsigned int new_cnt;
>>>> +
>>>> + nr_emit = 0;
>>>> + for (i = 0; i < table_cnt - 1; i++) {
>>>> + if (may_exist_hole_after_symbol(table[i]) &&
>>>> + table[i]->addr + table[i]->size < table[i+1]->addr)
>>>> + nr_emit++;
>>>> + }
>>>> + if (!nr_emit)
>>>> + return;
>>>> +
>>>> + new_cnt = table_cnt + nr_emit;
>>>> + new_table = malloc(sizeof(*new_table) * new_cnt);
>>>
>>>
>>> Do you need to allocate another huge table?
>>>
>>> You can use realloc() to append the room for nr_emit
>>> if you iterate the table in the reverse order.
>>>
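[Editorial note: the reverse-iteration realloc() approach suggested above can be sketched on a plain int array so the mechanics are visible; needs_hole_after() is a toy stand-in for may_exist_hole_after_symbol(), and -1 stands in for an allocated hole entry.]

```c
#include <assert.h>
#include <stdlib.h>

static int needs_hole_after(int v)
{
	return v % 2 == 0;	/* toy predicate standing in for the real check */
}

static int *insert_holes(int *tab, unsigned int cnt, unsigned int *new_cnt)
{
	unsigned int i, nr = 0, dst;

	for (i = 0; i + 1 < cnt; i++)	/* first pass: count needed placeholders */
		if (needs_hole_after(tab[i]))
			nr++;
	*new_cnt = cnt + nr;
	if (!nr)
		return tab;

	/* grow the existing table in place instead of allocating a second one */
	tab = realloc(tab, sizeof(*tab) * (cnt + nr));
	if (!tab)
		exit(EXIT_FAILURE);

	/*
	 * Second pass, in reverse: every element is written to an index
	 * >= its current one, so nothing is clobbered before it is read,
	 * and each element moves exactly once.
	 */
	dst = cnt + nr;
	for (i = cnt; i-- > 0; ) {
		tab[--dst] = tab[i];
		if (i > 0 && needs_hole_after(tab[i - 1]))
			tab[--dst] = -1;	/* -1 marks an inserted hole entry */
	}
	return tab;
}
```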
>>
>> Yes, it would be much better. If it turns out to be the
>> "emit hole symbol" solution, I'll change it to that in the next version,
>> actually, I forgot to mark this series as "RFC".
>
>
> "__hole_symbol_XXXXX" is too much.
>
> You can use the empty symbol type/name as a special case
> to represent the hole.
>
Ok, I'll try it in v2.
>
--
Thanks,
Zheng Yejian
Thread overview: 12+ messages
2024-06-13 13:37 [PATCH 0/6] kallsyms: Emit symbol for holes in text and fix weak function issue Zheng Yejian
2024-06-13 13:37 ` [PATCH 1/6] kallsyms: Optimize multiple times of realloc() to one time of malloc() Zheng Yejian
2024-07-15 20:12 ` Masahiro Yamada
2024-06-13 13:37 ` [PATCH 2/6] kallsyms: Emit symbol at the holes in the text Zheng Yejian
2024-07-16 8:33 ` Masahiro Yamada
2024-07-18 3:45 ` Zheng Yejian
2024-07-20 14:14 ` Masahiro Yamada
2024-07-22 1:29 ` Zheng Yejian
2024-06-13 13:37 ` [PATCH 3/6] module: kallsyms: Determine exact function size Zheng Yejian
2024-06-13 13:37 ` [PATCH 4/6] ftrace: Skip invalid __fentry__ in ftrace_process_locs() Zheng Yejian
2024-06-13 13:37 ` [PATCH 5/6] ftrace: Fix possible out-of-bound issue " Zheng Yejian
2024-06-13 13:37 ` [PATCH 6/6] ftrace: Revert the FTRACE_MCOUNT_MAX_OFFSET workaround Zheng Yejian