public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Pavlu <petr.pavlu@suse.com>
To: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: linux-modules@vger.kernel.org,
	Sami Tolvanen <samitolvanen@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	live-patching@vger.kernel.org, Daniel Gomez <da.gomez@kernel.org>,
	Aaron Tomlin <atomlin@atomlin.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Jordan Rome <linux@jordanrome.com>,
	Viktor Malik <vmalik@redhat.com>
Subject: Re: [PATCH v2 2/2] module/kallsyms: sort function symbols and use binary search
Date: Thu, 23 Apr 2026 16:00:04 +0200	[thread overview]
Message-ID: <11c8e139-f9f3-4b22-863a-4e021a3947e7@suse.com> (raw)
In-Reply-To: <20260327110005.16499-2-stf_xl@wp.pl>

On 3/27/26 12:00 PM, Stanislaw Gruszka wrote:
> Module symbol lookup via find_kallsyms_symbol() performs a linear scan
> over the entire symtab when resolving an address. The number of symbols
> in module symtabs has grown over the years, largely due to additional
> metadata in non-standard sections, making this lookup very slow.
> 
> Improve this by separating function symbols during module load, placing
> them at the beginning of the symtab, sorting them by address, and using
> binary search when resolving addresses in module text.
> 
> This also should improve times for linear symbol name lookups, as valid
> function symbols are now located at the beginning of the symtab.
> 
> The cost of sorting is small relative to module load time. In repeated
> module load tests [1], depending on .config options, this change
> increases load time between 2% and 4%. With cold caches, the difference
> is not measurable, as memory access latency dominates.
> 
> The sorting theoretically could be done in compile time, but much more
> complicated as we would have to simulate kernel addresses resolution
> for symbols, and then correct relocation entries. That would be risky
> if get out of sync.
> 
> The improvement can be observed when listing ftrace filter functions.
> 
> Before:
> 
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74908
> 
> real	0m1.315s
> user	0m0.000s
> sys	0m1.312s
> 
> After:
> 
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74911
> 
> real	0m0.167s
> user	0m0.004s
> sys	0m0.175s
> 
> (there are three more symbols introduced by the patch)
> 
> For livepatch modules, the symtab layout is preserved and the existing
> linear search is used. For this case, it should be possible to keep
> the original ELF symtab instead of copying it 1:1, but that is outside
> the scope of this patch.
> 
> Link: https://gist.github.com/sgruszka/09f3fb1dad53a97b1aad96e1927ab117 [1]
> Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>

Sorry for the delay reviewing this patch.

> ---
> v1 -> v2: 
>  - fix searching data symbols for CONFIG_KALLSYMS_ALL
>  - use kallsyms_symbol_value() in elf_sym_cmp()
> 
>  include/linux/module.h   |   1 +
>  kernel/module/internal.h |   1 +
>  kernel/module/kallsyms.c | 171 +++++++++++++++++++++++++++++----------
>  3 files changed, 130 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index ac254525014c..67c053afa882 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -379,6 +379,7 @@ struct module_memory {
>  struct mod_kallsyms {
>  	Elf_Sym *symtab;
>  	unsigned int num_symtab;
> +	unsigned int num_func_syms;
>  	char *strtab;
>  	char *typetab;
>  };
> diff --git a/kernel/module/internal.h b/kernel/module/internal.h
> index 618202578b42..6a4d498619b1 100644
> --- a/kernel/module/internal.h
> +++ b/kernel/module/internal.h
> @@ -73,6 +73,7 @@ struct load_info {
>  	bool sig_ok;
>  #ifdef CONFIG_KALLSYMS
>  	unsigned long mod_kallsyms_init_off;
> +	unsigned long num_func_syms;
>  #endif
>  #ifdef CONFIG_MODULE_DECOMPRESS
>  #ifdef CONFIG_MODULE_STATS
> diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
> index f23126d804b2..d69e99e67707 100644
> --- a/kernel/module/kallsyms.c
> +++ b/kernel/module/kallsyms.c
> @@ -10,6 +10,7 @@
>  #include <linux/kallsyms.h>
>  #include <linux/buildid.h>
>  #include <linux/bsearch.h>
> +#include <linux/sort.h>
>  #include "internal.h"
>  
>  /* Lookup exported symbol in given range of kernel_symbols */
> @@ -103,6 +104,95 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
>  	return true;
>  }
>  
> +static inline bool is_func_symbol(const Elf_Sym *sym)
> +{
> +	return sym->st_shndx != SHN_UNDEF && sym->st_size != 0 &&
> +	       ELF_ST_TYPE(sym->st_info) == STT_FUNC;
> +}
> +
> +static unsigned int bsearch_func_symbol(struct mod_kallsyms *kallsyms,
> +					unsigned long addr,
> +					unsigned long *bestval,
> +					unsigned long *nextval)
> +
> +{
> +	unsigned int mid, low = 1, high = kallsyms->num_func_syms + 1;
> +	unsigned int best = 0;
> +	unsigned long thisval;
> +
> +	while (low < high) {
> +		mid = low + (high - low) / 2;
> +		thisval = kallsyms_symbol_value(&kallsyms->symtab[mid]);
> +
> +		if (thisval <= addr) {
> +			*bestval = thisval;
> +			best = mid;
> +			low = mid + 1;

If thisval == addr, the search moves to the right and finds the last
symbol with the same address. I believe it should do the opposite and
return the first symbol to match the behavior of
search_kallsyms_symbol().

> +		} else {
> +			*nextval = thisval;
> +			high = mid;
> +		}
> +	}
> +
> +	return best;
> +}
> +
> +static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms,
> +					unsigned int symnum)
> +{
> +	return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> +}
> +
> +static unsigned int search_kallsyms_symbol(struct mod_kallsyms *kallsyms,
> +					   unsigned long addr,
> +					   unsigned long *bestval,
> +					   unsigned long *nextval)
> +{
> +	unsigned int i, best = 0;
> +
> +	/*
> +	 * Scan for closest preceding symbol and next symbol. (ELF starts
> +	 * real symbols at 1). Skip the initial function symbols range
> +	 * if num_func_syms is non-zero, those are handled separately for
> +	 * the core TEXT segment lookup.
> +	 */
> +	for (i = 1 + kallsyms->num_func_syms; i < kallsyms->num_symtab; i++) {
> +		const Elf_Sym *sym = &kallsyms->symtab[i];
> +		unsigned long thisval = kallsyms_symbol_value(sym);
> +
> +		if (sym->st_shndx == SHN_UNDEF)
> +			continue;
> +
> +		/*
> +		 * We ignore unnamed symbols: they're uninformative
> +		 * and inserted at a whim.
> +		 */
> +		if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
> +		    is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
> +			continue;
> +
> +		if (thisval <= addr && thisval > *bestval) {
> +			best = i;
> +			*bestval = thisval;
> +		}
> +		if (thisval > addr && thisval < *nextval)
> +			*nextval = thisval;
> +	}
> +
> +	return best;
> +}
> +
> +static int elf_sym_cmp(const void *a, const void *b)
> +{
> +	unsigned long val_a = kallsyms_symbol_value((const Elf_Sym *)a);
> +	unsigned long val_b = kallsyms_symbol_value((const Elf_Sym *)b);
> +
> +	if (val_a < val_b)
> +		return -1;
> +
> +	return val_a > val_b;

Does this comparison function and the sort() call result in stable
sorting? If val_a and val_b are the same, the sorting should preserve
the original order.

> +}
> +
>  /*
>   * We only allocate and copy the strings needed by the parts of symtab
>   * we keep.  This is simple, but has the effect of making multiple
> @@ -115,9 +205,10 @@ void layout_symtab(struct module *mod, struct load_info *info)
>  	Elf_Shdr *symsect = info->sechdrs + info->index.sym;
>  	Elf_Shdr *strsect = info->sechdrs + info->index.str;
>  	const Elf_Sym *src;
> -	unsigned int i, nsrc, ndst, strtab_size = 0;
> +	unsigned int i, nsrc, ndst, nfunc, strtab_size = 0;
>  	struct module_memory *mod_mem_data = &mod->mem[MOD_DATA];
>  	struct module_memory *mod_mem_init_data = &mod->mem[MOD_INIT_DATA];
> +	bool is_lp_mod = is_livepatch_module(mod);
>  
>  	/* Put symbol section at end of init part of module. */
>  	symsect->sh_flags |= SHF_ALLOC;
> @@ -129,12 +220,14 @@ void layout_symtab(struct module *mod, struct load_info *info)
>  	nsrc = symsect->sh_size / sizeof(*src);
>  
>  	/* Compute total space required for the core symbols' strtab. */
> -	for (ndst = i = 0; i < nsrc; i++) {
> -		if (i == 0 || is_livepatch_module(mod) ||
> +	for (ndst = nfunc = i = 0; i < nsrc; i++) {
> +		if (i == 0 || is_lp_mod ||
>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>  				   info->index.pcpu)) {
>  			strtab_size += strlen(&info->strtab[src[i].st_name]) + 1;
>  			ndst++;
> +			if (!is_lp_mod && is_func_symbol(src + i))
> +				nfunc++;
>  		}
>  	}
>  
> @@ -156,6 +249,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
>  	mod_mem_init_data->size = ALIGN(mod_mem_init_data->size,
>  					__alignof__(struct mod_kallsyms));
>  	info->mod_kallsyms_init_off = mod_mem_init_data->size;
> +	info->num_func_syms = nfunc;
>  
>  	mod_mem_init_data->size += sizeof(struct mod_kallsyms);
>  	info->init_typeoffs = mod_mem_init_data->size;
> @@ -169,7 +263,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
>   */
>  void add_kallsyms(struct module *mod, const struct load_info *info)
>  {
> -	unsigned int i, ndst;
> +	unsigned int i, di, nfunc, ndst;
>  	const Elf_Sym *src;
>  	Elf_Sym *dst;
>  	char *s;
> @@ -178,6 +272,7 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  	void *data_base = mod->mem[MOD_DATA].base;
>  	void *init_data_base = mod->mem[MOD_INIT_DATA].base;
>  	struct mod_kallsyms *kallsyms;
> +	bool is_lp_mod = is_livepatch_module(mod);
>  
>  	kallsyms = init_data_base + info->mod_kallsyms_init_off;

This code is followed by the initialization of kallsyms:

	kallsyms->symtab = (void *)symsec->sh_addr;
	kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym);
	/* Make sure we get permanent strtab: don't use info->strtab. */
	kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
	kallsyms->typetab = init_data_base + info->init_typeoffs;

I suggest adding 'kallsyms->num_func_syms = 0;' after the initialization
of kallsyms->num_symtab.

>  
> @@ -194,19 +289,28 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  	mod->core_kallsyms.symtab = dst = data_base + info->symoffs;
>  	mod->core_kallsyms.strtab = s = data_base + info->stroffs;
>  	mod->core_kallsyms.typetab = data_base + info->core_typeoffs;
> +
>  	strtab_size = info->core_typeoffs - info->stroffs;
>  	src = kallsyms->symtab;
> -	for (ndst = i = 0; i < kallsyms->num_symtab; i++) {
> +	ndst = info->num_func_syms + 1;
> +
> +	for (nfunc = i = 0; i < kallsyms->num_symtab; i++) {
>  		kallsyms->typetab[i] = elf_type(src + i, info);
> -		if (i == 0 || is_livepatch_module(mod) ||
> +		if (i == 0 || is_lp_mod ||
>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>  				   info->index.pcpu)) {
>  			ssize_t ret;
>  
> -			mod->core_kallsyms.typetab[ndst] =
> -				kallsyms->typetab[i];
> -			dst[ndst] = src[i];
> -			dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
> +			if (i == 0)
> +				di = 0;
> +			else if (!is_lp_mod && is_func_symbol(src + i))
> +				di = 1 + nfunc++;
> +			else
> +				di = ndst++;
> +
> +			mod->core_kallsyms.typetab[di] = kallsyms->typetab[i];
> +			dst[di] = src[i];
> +			dst[di].st_name = s - mod->core_kallsyms.strtab;
>  			ret = strscpy(s, &kallsyms->strtab[src[i].st_name],
>  				      strtab_size);
>  			if (ret < 0)
> @@ -216,9 +320,13 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  		}
>  	}
>  
> +	WARN_ON_ONCE(nfunc != info->num_func_syms);
> +	sort(dst + 1, nfunc, sizeof(Elf_Sym), elf_sym_cmp, NULL);
> +

The code sorts mod->core_kallsyms.symtab but mod->core_kallsyms.typetab
is not reordered accordingly.

>  	/* Set up to point into init section. */
>  	rcu_assign_pointer(mod->kallsyms, kallsyms);
>  	mod->core_kallsyms.num_symtab = ndst;
> +	mod->core_kallsyms.num_func_syms = nfunc;
>  }
>  
>  #if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
> @@ -241,11 +349,6 @@ void init_build_id(struct module *mod, const struct load_info *info)
>  }
>  #endif
>  
> -static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned int symnum)
> -{
> -	return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> -}
> -
>  /*
>   * Given a module and address, find the corresponding symbol and return its name
>   * while providing its size and offset if needed.
> @@ -255,7 +358,10 @@ static const char *find_kallsyms_symbol(struct module *mod,
>  					unsigned long *size,
>  					unsigned long *offset)
>  {
> -	unsigned int i, best = 0;
> +	unsigned int (*search)(struct mod_kallsyms *kallsyms,
> +			       unsigned long addr, unsigned long *bestval,
> +			       unsigned long *nextval);
> +	unsigned int best;
>  	unsigned long nextval, bestval;
>  	struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms);
>  	struct module_memory *mod_mem = NULL;
> @@ -266,6 +372,11 @@ static const char *find_kallsyms_symbol(struct module *mod,
>  			continue;
>  #endif
>  		if (within_module_mem_type(addr, mod, type)) {
> +			if (type == MOD_TEXT && kallsyms->num_func_syms > 0)
> +				search = bsearch_func_symbol;

I'm not sure if it is ok to limit the search only to function symbols
when the address lies in MOD_TEXT. The text can theoretically contain
non-function symbols. Could this optimization be adjusted to sort all
MOD_TEXT symbols (excluding anonymous and mapping symbols) and move them
to the front of the symbol table?

> +			else
> +				search = search_kallsyms_symbol;
> +
>  			mod_mem = &mod->mem[type];
>  			break;
>  		}
> @@ -278,33 +389,7 @@ static const char *find_kallsyms_symbol(struct module *mod,
>  	nextval = (unsigned long)mod_mem->base + mod_mem->size;
>  	bestval = (unsigned long)mod_mem->base - 1;
>  
> -	/*
> -	 * Scan for closest preceding symbol, and next symbol. (ELF
> -	 * starts real symbols at 1).
> -	 */
> -	for (i = 1; i < kallsyms->num_symtab; i++) {
> -		const Elf_Sym *sym = &kallsyms->symtab[i];
> -		unsigned long thisval = kallsyms_symbol_value(sym);
> -
> -		if (sym->st_shndx == SHN_UNDEF)
> -			continue;
> -
> -		/*
> -		 * We ignore unnamed symbols: they're uninformative
> -		 * and inserted at a whim.
> -		 */
> -		if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
> -		    is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
> -			continue;
> -
> -		if (thisval <= addr && thisval > bestval) {
> -			best = i;
> -			bestval = thisval;
> -		}
> -		if (thisval > addr && thisval < nextval)
> -			nextval = thisval;
> -	}
> -
> +	best = search(kallsyms, addr, &bestval, &nextval);
>  	if (!best)
>  		return NULL;
>  

-- 
Thanks,
Petr

  reply	other threads:[~2026-04-23 14:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-27 11:00 [PATCH v2 1/2] module/kallsyms: fix nextval for data symbol lookup Stanislaw Gruszka
2026-03-27 11:00 ` [PATCH v2 2/2] module/kallsyms: sort function symbols and use binary search Stanislaw Gruszka
2026-04-23 14:00   ` Petr Pavlu [this message]
2026-04-24  9:13     ` Stanislaw Gruszka
2026-04-27 13:51       ` Petr Pavlu
2026-04-28  8:23         ` Stanislaw Gruszka
2026-05-05 12:24   ` Petr Mladek
2026-05-05 14:37     ` Petr Pavlu
2026-04-08 15:24 ` [PATCH v2 1/2] module/kallsyms: fix nextval for data symbol lookup Petr Pavlu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=11c8e139-f9f3-4b22-863a-4e021a3947e7@suse.com \
    --to=petr.pavlu@suse.com \
    --cc=atomlin@atomlin.com \
    --cc=da.gomez@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux@jordanrome.com \
    --cc=live-patching@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=samitolvanen@google.com \
    --cc=stf_xl@wp.pl \
    --cc=vmalik@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox