From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>, qemu-devel@nongnu.org
Cc: "Manos Pitsidianakis" <manos.pitsidianakis@linaro.org>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Alexandre Iooss" <erdnaxe@crans.org>,
"Mahmoud Mandour" <ma.mandourr@gmail.com>
Subject: Re: [PATCH] contrib/plugins/uftrace_symbols.py: generate debug files to map symbols to source
Date: Wed, 15 Oct 2025 14:02:53 -0700 [thread overview]
Message-ID: <b4a0756b-3781-433a-aa4c-338d9348da5d@linaro.org> (raw)
In-Reply-To: <b7256e10-d5b1-4978-90e1-99d9e76318df@linaro.org>
On 10/15/25 1:40 PM, Philippe Mathieu-Daudé wrote:
> On 13/10/25 23:39, Pierrick Bouvier wrote:
>> Enhance uftrace_symbols.py to generate .dbg files, containing
>> source location for every symbol present in .sym file.
>> It allows to use uftrace {replay,dump} --srcline and show origin of
>
> `uftrace {replay,dump} --srcline`
>
>> functions, connecting trace to original source code.
>>
>> It was first implemented with pyelftools DWARF parser, which was way
>> to slow (~minutes) to get locations for every symbol in the linux
>
> s/to/too/
>
>> kernel. Thus, we use addr2line instead, which runs in seconds.
>>
>> As well, there were some bugs with latest pyelftools release,
>> requiring to run master version, which is not installable with pip.
>> Thus, since we now require binutils (addr2line), we can ditch pyelftools
>> based implementation and simply rely on nm to get symbols information,
>
> `nm`
>
>> which is faster and better.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace_symbols.py | 108 +++++++++++++++++++----------
>> 1 file changed, 72 insertions(+), 36 deletions(-)
>>
>> diff --git a/contrib/plugins/uftrace_symbols.py b/contrib/plugins/uftrace_symbols.py
>> index b49e03203c8..728bf04ce54 100755
>> --- a/contrib/plugins/uftrace_symbols.py
>> +++ b/contrib/plugins/uftrace_symbols.py
>> @@ -1,7 +1,7 @@
>> #!/usr/bin/env python3
>> # -*- coding: utf-8 -*-
>> #
>> -# Create symbols and mapping files for uftrace.
>> +# Create symbols, debug and mapping files for uftrace.
>> #
>> # Copyright 2025 Linaro Ltd
>> # Author: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> @@ -9,44 +9,71 @@
>> # SPDX-License-Identifier: GPL-2.0-or-later
>>
>> import argparse
>> -import elftools # pip install pyelftools
>> import os
>> +import subprocess
>>
>> -from elftools.elf.elffile import ELFFile
>> -from elftools.elf.sections import SymbolTableSection
>> +class Symbol:
>> + def __init__(self, name, addr, size):
>> + self.name = name
>> + # clamp addr to 48 bits, like uftrace entries
>> + self.addr = addr & 0xffffffffffff
>> + self.full_addr = addr
>> + self.size = size
>>
>> -def elf_func_symbols(elf):
>> - symbol_tables = [(idx, s) for idx, s in enumerate(elf.iter_sections())
>> - if isinstance(s, SymbolTableSection)]
>> - symbols = []
>> - for _, section in symbol_tables:
>> - for _, symbol in enumerate(section.iter_symbols()):
>> - if symbol_size(symbol) == 0:
>> - continue
>> - type = symbol['st_info']['type']
>> - if type == 'STT_FUNC' or type == 'STT_NOTYPE':
>> - symbols.append(symbol)
>> - symbols.sort(key = lambda x: symbol_addr(x))
>> + def set_loc(self, file, line):
>> + self.file = file
>> + self.line = line
>> +
>> +def get_symbols(elf_file):
>> + symbols=[]
>> + try:
>> + out = subprocess.check_output(['nm', '--print-size', elf_file],
>> + stderr=subprocess.STDOUT,
>> + text=True)
>
> Nitpicking, we might be using cross-compiled `nm`, so maybe not hardcode
> the binary name.
>
>> + except subprocess.CalledProcessError as e:
>> + print(e.output)
>> + raise
>> + out = out.strip().split('\n')
>> + for line in out:
>> + info = line.split(' ')
>> + if len(info) == 3:
>> + # missing size information
>> + continue
>> + addr, size, type, name = info
>> + # add only symbols from .text section
>> + if type.lower() != 't':
>> + continue
>> + addr = int(addr, 16)
>> + size = int(size, 16)
>> + symbols.append(Symbol(name, addr, size))
>> + symbols.sort(key = lambda x: x.addr)
>> return symbols
>>
>> -def symbol_size(symbol):
>> - return symbol['st_size']
>> -
>> -def symbol_addr(symbol):
>> - addr = symbol['st_value']
>> - # clamp addr to 48 bits, like uftrace entries
>> - return addr & 0xffffffffffff
>> -
>> -def symbol_name(symbol):
>> - return symbol.name
>> +def find_symbols_locations(elf_file, symbols):
>> + addresses = '\n'.join([hex(x.full_addr) for x in symbols])
>> + try:
>> + out = subprocess.check_output(['addr2line', '--exe', elf_file],
>
> Ditto (cross compiled)?
>
>> + stderr=subprocess.STDOUT,
>> + input=addresses, text=True)
>> + except subprocess.CalledProcessError as e:
>> + print(e.output)
>> + raise
>> + out = out.strip().split('\n')
>> + assert len(out) == len(symbols)
>> + for i in range(len(symbols)):
>> + s = symbols[i]
>> + file, line = out[i].split(':')
>> + # addr2line may return 'line (discriminator [0-9]+)' sometimes,
>> + # remove this to keep only line number.
>> + line = line.split(' ')[0]
>> + s.set_loc(file, line)
>>
>> class BinaryFile:
>> def __init__(self, path, map_offset):
>> self.fullpath = os.path.realpath(path)
>> self.map_offset = map_offset
>> - with open(path, 'rb') as f:
>> - self.elf = ELFFile(f)
>> - self.symbols = elf_func_symbols(self.elf)
>> + self.symbols = get_symbols(self.fullpath)
>> + find_symbols_locations(self.fullpath, self.symbols)
>>
>> def path(self):
>> return self.fullpath
>> @@ -56,7 +83,7 @@ def addr_start(self):
>>
>> def addr_end(self):
>> last_sym = self.symbols[-1]
>> - return symbol_addr(last_sym) + symbol_size(last_sym) + self.map_offset
>> + return last_sym.addr + last_sym.size + self.map_offset
>>
>> def generate_symbol_file(self, prefix_symbols):
>> binary_name = os.path.basename(self.fullpath)
>> @@ -66,14 +93,21 @@ def generate_symbol_file(self, prefix_symbols):
>> # print hexadecimal addresses on 48 bits
>> addrx = "0>12x"
>> for s in self.symbols:
>> - addr = symbol_addr(s)
>> + addr = s.addr
>> addr = f'{addr:{addrx}}'
>> - size = f'{symbol_size(s):{addrx}}'
>> - name = symbol_name(s)
>> + size = f'{s.size:{addrx}}'
>> if prefix_symbols:
>> - name = f'{binary_name}:{name}'
>> + name = f'{binary_name}:{s.name}'
>> print(addr, size, 'T', name, file=sym_file)
>>
>> + def generate_debug_file(self):
>> + binary_name = os.path.basename(self.fullpath)
>> + dbg_file_path = f'./uftrace.data/{binary_name}.dbg'
>
> Prefer os.path.join().
>
>> + with open(dbg_file_path, 'w') as dbg_file:
>> + for s in self.symbols:
>> + print(f'F: {hex(s.addr)} {s.name}', file=dbg_file)
>> + print(f'L: {s.line} {s.file}', file=dbg_file)
>> +
>> def parse_parameter(p):
>> s = p.split(":")
>> path = s[0]
>> @@ -84,7 +118,7 @@ def parse_parameter(p):
>> offset = s[1]
>> if not offset.startswith('0x'):
>> err = f'offset "{offset}" is not an hexadecimal constant. '
>> - err += 'It should starts with "0x".'
>> + err += 'It should start with "0x".'
>> raise ValueError(err)
>> offset = int(offset, 16)
>> return path, offset
>> @@ -124,7 +158,8 @@ def generate_map(binaries):
>>
>> def main():
>> parser = argparse.ArgumentParser(description=
>> - 'generate symbol files for uftrace')
>> + 'generate symbol files for uftrace. '
>> + 'Require binutils (nm and addr2line).')
>> parser.add_argument('elf_file', nargs='+',
>> help='path to an ELF file. '
>> 'Use /path/to/file:0xdeadbeef to add a mapping offset.')
>> @@ -145,6 +180,7 @@ def main():
>>
>> for b in binaries:
>> b.generate_symbol_file(args.prefix_symbols)
>> + b.generate_debug_file()
>>
>> generate_map(binaries)
>>
> No blocking comments:
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>
Thanks Philippe.
For the cross compiled tools, I'm not really sure it's worth making this
more complex. Having tooling for a cross architecture is an advanced
setup, and I think it's fair to expect someone to have binutils
installed if they have any cross compiler and cross binutils.
Plus, they can always create a symlink if needed.
Alex, can you eventually integrate the (other) cosmetic changes?
If not, I can send a new patch if you prefer.
Thanks,
Pierrick
next prev parent reply other threads:[~2025-10-15 21:03 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-13 21:39 [PATCH] contrib/plugins/uftrace_symbols.py: generate debug files to map symbols to source Pierrick Bouvier
2025-10-15 11:11 ` Alex Bennée
2025-10-15 17:41 ` Pierrick Bouvier
2025-10-15 20:40 ` Philippe Mathieu-Daudé
2025-10-15 21:02 ` Pierrick Bouvier [this message]
2025-10-15 21:12 ` Pierrick Bouvier
2025-10-16 5:45 ` Philippe Mathieu-Daudé
2025-10-15 21:23 ` Alex Bennée
2025-10-15 23:31 ` Pierrick Bouvier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b4a0756b-3781-433a-aa4c-338d9348da5d@linaro.org \
--to=pierrick.bouvier@linaro.org \
--cc=alex.bennee@linaro.org \
--cc=erdnaxe@crans.org \
--cc=ma.mandourr@gmail.com \
--cc=manos.pitsidianakis@linaro.org \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).