public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Pavlu <petr.pavlu@suse.com>
To: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: linux-modules@vger.kernel.org,
	Sami Tolvanen <samitolvanen@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	live-patching@vger.kernel.org, Daniel Gomez <da.gomez@kernel.org>,
	Aaron Tomlin <atomlin@atomlin.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Jordan Rome <linux@jordanrome.com>,
	Viktor Malik <vmalik@redhat.com>
Subject: Re: [PATCH v2 2/2] module/kallsyms: sort function symbols and use binary search
Date: Mon, 27 Apr 2026 15:51:31 +0200	[thread overview]
Message-ID: <88ae41dc-e5e0-442a-9b95-5125adf31e75@suse.com> (raw)
In-Reply-To: <20260424091330.GA31168@wp.pl>

On 4/24/26 11:13 AM, Stanislaw Gruszka wrote:
> On Thu, Apr 23, 2026 at 04:00:04PM +0200, Petr Pavlu wrote:
>> On 3/27/26 12:00 PM, Stanislaw Gruszka wrote:
[...]
>>> diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
>>> index f23126d804b2..d69e99e67707 100644
>>> --- a/kernel/module/kallsyms.c
>>> +++ b/kernel/module/kallsyms.c
>>> @@ -10,6 +10,7 @@
>>>  #include <linux/kallsyms.h>
>>>  #include <linux/buildid.h>
>>>  #include <linux/bsearch.h>
>>> +#include <linux/sort.h>
>>>  #include "internal.h"
>>>  
>>>  /* Lookup exported symbol in given range of kernel_symbols */
>>> @@ -103,6 +104,95 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
>>>  	return true;
>>>  }
>>>  
>>> +static inline bool is_func_symbol(const Elf_Sym *sym)
>>> +{
>>> +	return sym->st_shndx != SHN_UNDEF && sym->st_size != 0 &&
>>> +	       ELF_ST_TYPE(sym->st_info) == STT_FUNC;
>>> +}
>>> +
>>> +static unsigned int bsearch_func_symbol(struct mod_kallsyms *kallsyms,
>>> +					unsigned long addr,
>>> +					unsigned long *bestval,
>>> +					unsigned long *nextval)
>>> +
>>> +{
>>> +	unsigned int mid, low = 1, high = kallsyms->num_func_syms + 1;
>>> +	unsigned int best = 0;
>>> +	unsigned long thisval;
>>> +
>>> +	while (low < high) {
>>> +		mid = low + (high - low) / 2;
>>> +		thisval = kallsyms_symbol_value(&kallsyms->symtab[mid]);
>>> +
>>> +		if (thisval <= addr) {
>>> +			*bestval = thisval;
>>> +			best = mid;
>>> +			low = mid + 1;
>>
>> If thisval == addr, the search moves to the right and finds the last
>> symbol with the same address. I believe it should do the opposite and
>> return the first symbol to match the behavior of
>> search_kallsyms_symbol().
> 
> In the case of multiple symbols sharing the same address, we have
> to pick one and ignore the others. I don’t think it matters much which
> one is chosen in practice. Also, I expect function symbol addresses
> to be unique, so this shouldn’t be a real issue.

I think that the code should consistently pick the same answer. If
someone uses aliases for their functions, the logic shouldn't
arbitrarily return one of them, but preferably the first one, which
should normally be the actual implementation.

> 
>>> +		} else {
>>> +			*nextval = thisval;
>>> +			high = mid;
>>> +		}
>>> +	}
>>> +
>>> +	return best;
>>> +}
>>> +
>>> +static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms,
>>> +					unsigned int symnum)
>>> +{
>>> +	return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
>>> +}
>>> +
>>> +static unsigned int search_kallsyms_symbol(struct mod_kallsyms *kallsyms,
>>> +					   unsigned long addr,
>>> +					   unsigned long *bestval,
>>> +					   unsigned long *nextval)
>>> +{
>>> +	unsigned int i, best = 0;
>>> +
>>> +	/*
>>> +	 * Scan for closest preceding symbol and next symbol. (ELF starts
>>> +	 * real symbols at 1). Skip the initial function symbols range
>>> +	 * if num_func_syms is non-zero, those are handled separately for
>>> +	 * the core TEXT segment lookup.
>>> +	 */
>>> +	for (i = 1 + kallsyms->num_func_syms; i < kallsyms->num_symtab; i++) {
>>> +		const Elf_Sym *sym = &kallsyms->symtab[i];
>>> +		unsigned long thisval = kallsyms_symbol_value(sym);
>>> +
>>> +		if (sym->st_shndx == SHN_UNDEF)
>>> +			continue;
>>> +
>>> +		/*
>>> +		 * We ignore unnamed symbols: they're uninformative
>>> +		 * and inserted at a whim.
>>> +		 */
>>> +		if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
>>> +		    is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
>>> +			continue;
>>> +
>>> +		if (thisval <= addr && thisval > *bestval) {
>>> +			best = i;
>>> +			*bestval = thisval;
>>> +		}
>>> +		if (thisval > addr && thisval < *nextval)
>>> +			*nextval = thisval;
>>> +	}
>>> +
>>> +	return best;
>>> +}
>>> +
>>> +static int elf_sym_cmp(const void *a, const void *b)
>>> +{
>>> +	unsigned long val_a = kallsyms_symbol_value((const Elf_Sym *)a);
>>> +	unsigned long val_b = kallsyms_symbol_value((const Elf_Sym *)b);
>>> +
>>> +	if (val_a < val_b)
>>> +		return -1;
>>> +
>>> +	return val_a > val_b;
>>
>> Does this comparison function and the sort() call result in stable
>> sorting? If val_a and val_b are the same, the sorting should preserve
>> the original order.
> 
> The kernel’s sort() implementation is not stable.

Ok, I see it is a heapsort. It would require additional data to keep
information about the original indexes for elf_sym_cmp() to use as
a tiebreaker.

> 
>>> +}
>>> +
>>>  /*
>>>   * We only allocate and copy the strings needed by the parts of symtab
>>>   * we keep.  This is simple, but has the effect of making multiple
>>> @@ -115,9 +205,10 @@ void layout_symtab(struct module *mod, struct load_info *info)
>>>  	Elf_Shdr *symsect = info->sechdrs + info->index.sym;
>>>  	Elf_Shdr *strsect = info->sechdrs + info->index.str;
>>>  	const Elf_Sym *src;
>>> -	unsigned int i, nsrc, ndst, strtab_size = 0;
>>> +	unsigned int i, nsrc, ndst, nfunc, strtab_size = 0;
>>>  	struct module_memory *mod_mem_data = &mod->mem[MOD_DATA];
>>>  	struct module_memory *mod_mem_init_data = &mod->mem[MOD_INIT_DATA];
>>> +	bool is_lp_mod = is_livepatch_module(mod);
>>>  
>>>  	/* Put symbol section at end of init part of module. */
>>>  	symsect->sh_flags |= SHF_ALLOC;
>>> @@ -129,12 +220,14 @@ void layout_symtab(struct module *mod, struct load_info *info)
>>>  	nsrc = symsect->sh_size / sizeof(*src);
>>>  
>>>  	/* Compute total space required for the core symbols' strtab. */
>>> -	for (ndst = i = 0; i < nsrc; i++) {
>>> -		if (i == 0 || is_livepatch_module(mod) ||
>>> +	for (ndst = nfunc = i = 0; i < nsrc; i++) {
>>> +		if (i == 0 || is_lp_mod ||
>>>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>>>  				   info->index.pcpu)) {
>>>  			strtab_size += strlen(&info->strtab[src[i].st_name]) + 1;
>>>  			ndst++;
>>> +			if (!is_lp_mod && is_func_symbol(src + i))
>>> +				nfunc++;
>>>  		}
>>>  	}
>>>  
>>> @@ -156,6 +249,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
>>>  	mod_mem_init_data->size = ALIGN(mod_mem_init_data->size,
>>>  					__alignof__(struct mod_kallsyms));
>>>  	info->mod_kallsyms_init_off = mod_mem_init_data->size;
>>> +	info->num_func_syms = nfunc;
>>>  
>>>  	mod_mem_init_data->size += sizeof(struct mod_kallsyms);
>>>  	info->init_typeoffs = mod_mem_init_data->size;
>>> @@ -169,7 +263,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
>>>   */
>>>  void add_kallsyms(struct module *mod, const struct load_info *info)
>>>  {
>>> -	unsigned int i, ndst;
>>> +	unsigned int i, di, nfunc, ndst;
>>>  	const Elf_Sym *src;
>>>  	Elf_Sym *dst;
>>>  	char *s;
>>> @@ -178,6 +272,7 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>>>  	void *data_base = mod->mem[MOD_DATA].base;
>>>  	void *init_data_base = mod->mem[MOD_INIT_DATA].base;
>>>  	struct mod_kallsyms *kallsyms;
>>> +	bool is_lp_mod = is_livepatch_module(mod);
>>>  
>>>  	kallsyms = init_data_base + info->mod_kallsyms_init_off;
>>
>> This code is followed by the initialization of kallsyms:
>>
>> 	kallsyms->symtab = (void *)symsec->sh_addr;
>> 	kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym);
>> 	/* Make sure we get permanent strtab: don't use info->strtab. */
>> 	kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
>> 	kallsyms->typetab = init_data_base + info->init_typeoffs;
>>
>> I suggest adding 'kallsyms->num_func_syms = 0;' after the initialization
>> of kallsyms->num_symtab.
> 
> I relied on zeroed memory initialization, but I can add this explicitly
> for clarity.
> 
>>> @@ -194,19 +289,28 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>>>  	mod->core_kallsyms.symtab = dst = data_base + info->symoffs;
>>>  	mod->core_kallsyms.strtab = s = data_base + info->stroffs;
>>>  	mod->core_kallsyms.typetab = data_base + info->core_typeoffs;
>>> +
>>>  	strtab_size = info->core_typeoffs - info->stroffs;
>>>  	src = kallsyms->symtab;
>>> -	for (ndst = i = 0; i < kallsyms->num_symtab; i++) {
>>> +	ndst = info->num_func_syms + 1;
>>> +
>>> +	for (nfunc = i = 0; i < kallsyms->num_symtab; i++) {
>>>  		kallsyms->typetab[i] = elf_type(src + i, info);
>>> -		if (i == 0 || is_livepatch_module(mod) ||
>>> +		if (i == 0 || is_lp_mod ||
>>>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>>>  				   info->index.pcpu)) {
>>>  			ssize_t ret;
>>>  
>>> -			mod->core_kallsyms.typetab[ndst] =
>>> -				kallsyms->typetab[i];
>>> -			dst[ndst] = src[i];
>>> -			dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
>>> +			if (i == 0)
>>> +				di = 0;
>>> +			else if (!is_lp_mod && is_func_symbol(src + i))
>>> +				di = 1 + nfunc++;
>>> +			else
>>> +				di = ndst++;
>>> +
>>> +			mod->core_kallsyms.typetab[di] = kallsyms->typetab[i];
>>> +			dst[di] = src[i];
>>> +			dst[di].st_name = s - mod->core_kallsyms.strtab;
>>>  			ret = strscpy(s, &kallsyms->strtab[src[i].st_name],
>>>  				      strtab_size);
>>>  			if (ret < 0)
>>> @@ -216,9 +320,13 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>>>  		}
>>>  	}
>>>  
>>> +	WARN_ON_ONCE(nfunc != info->num_func_syms);
>>> +	sort(dst + 1, nfunc, sizeof(Elf_Sym), elf_sym_cmp, NULL);
>>> +
>>
>> The code sorts mod->core_kallsyms.symtab but mod->core_kallsyms.typetab
>> is not reordered accordingly.
> 
> Right, but for function symbols the typetab entries are all 't',
> so swapping them does not change the type value. The 'T' vs 't'
> distinction is handled later when printing (based on export status).
> But the comment explaining skiping adjusting of
> mod->core_kallsyms.typetab is needed.

Modules can also contain weak functions with elf_type() = 'w'.

> 
>>>  	/* Set up to point into init section. */
>>>  	rcu_assign_pointer(mod->kallsyms, kallsyms);
>>>  	mod->core_kallsyms.num_symtab = ndst;
>>> +	mod->core_kallsyms.num_func_syms = nfunc;
>>>  }
>>>  
>>>  #if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
>>> @@ -241,11 +349,6 @@ void init_build_id(struct module *mod, const struct load_info *info)
>>>  }
>>>  #endif
>>>  
>>> -static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned int symnum)
>>> -{
>>> -	return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
>>> -}
>>> -
>>>  /*
>>>   * Given a module and address, find the corresponding symbol and return its name
>>>   * while providing its size and offset if needed.
>>> @@ -255,7 +358,10 @@ static const char *find_kallsyms_symbol(struct module *mod,
>>>  					unsigned long *size,
>>>  					unsigned long *offset)
>>>  {
>>> -	unsigned int i, best = 0;
>>> +	unsigned int (*search)(struct mod_kallsyms *kallsyms,
>>> +			       unsigned long addr, unsigned long *bestval,
>>> +			       unsigned long *nextval);
>>> +	unsigned int best;
>>>  	unsigned long nextval, bestval;
>>>  	struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms);
>>>  	struct module_memory *mod_mem = NULL;
>>> @@ -266,6 +372,11 @@ static const char *find_kallsyms_symbol(struct module *mod,
>>>  			continue;
>>>  #endif
>>>  		if (within_module_mem_type(addr, mod, type)) {
>>> +			if (type == MOD_TEXT && kallsyms->num_func_syms > 0)
>>> +				search = bsearch_func_symbol;
>>
>> I'm not sure if it is ok to limit the search only to function symbols
>> when the address lies in MOD_TEXT. The text can theoretically contain
>> non-function symbols.
> 
> Yes, the patch assumes that the only valid symbols in the MOD_TEXT
> are functions. If there are defined OBJECT symbols in .text, the patch
> would break lookup for those.
> 
> While it’s theoretically possible (e.g. hand-written assembly placing
> data in .text ?), I’m not sure this is a practical concern. In general,
> having data in executable segments is discouraged for security reasons. 
> 
>> Could this optimization be adjusted to sort all
>> MOD_TEXT symbols (excluding anonymous and mapping symbols) and move them
>> to the front of the symbol table?
> 
> That’s possible. We could track .text sections indices in
> __layout_sections() and include all valid symbols from those sections,
> and also reorder typetab accordingly.
> 
> However, this adds complexity. I would prefer to first confirm whether
> OBJECT symbols in MOD_TEXT is a real issue before going in that direction.

I'm not aware of specific OBJECT symbols that end up in MOD_TEXT.
Nonetheless, it is a valid case and it is preferable that an
optimization doesn't break their lookup by address.

In general, I'm worried about the several edge cases and inconsistencies
that this optimization introduces. This also includes the fact that it
doesn't work for livepatch modules.

An alternative could be to keep the symbol table untouched and have
a separate array with symbol indexes that is sorted by their addresses,
but it requires evaluation if the additional memory usage is worth it.

-- 
Thanks,
Petr

  reply	other threads:[~2026-04-27 13:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-27 11:00 [PATCH v2 1/2] module/kallsyms: fix nextval for data symbol lookup Stanislaw Gruszka
2026-03-27 11:00 ` [PATCH v2 2/2] module/kallsyms: sort function symbols and use binary search Stanislaw Gruszka
2026-04-23 14:00   ` Petr Pavlu
2026-04-24  9:13     ` Stanislaw Gruszka
2026-04-27 13:51       ` Petr Pavlu [this message]
2026-04-08 15:24 ` [PATCH v2 1/2] module/kallsyms: fix nextval for data symbol lookup Petr Pavlu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=88ae41dc-e5e0-442a-9b95-5125adf31e75@suse.com \
    --to=petr.pavlu@suse.com \
    --cc=atomlin@atomlin.com \
    --cc=da.gomez@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux@jordanrome.com \
    --cc=live-patching@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=samitolvanen@google.com \
    --cc=stf_xl@wp.pl \
    --cc=vmalik@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox