From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>,
qemu-devel@nongnu.org,
"Manos Pitsidianakis" <manos.pitsidianakis@linaro.org>,
"Alexandre Iooss" <erdnaxe@crans.org>,
"Mahmoud Mandour" <ma.mandourr@gmail.com>
Subject: Re: [PATCH] contrib/plugins/uftrace_symbols.py: generate debug files to map symbols to source
Date: Wed, 15 Oct 2025 16:31:11 -0700 [thread overview]
Message-ID: <0167e505-e769-4305-87b4-f281219a3465@linaro.org> (raw)
In-Reply-To: <87347jkgfw.fsf@draig.linaro.org>
On 10/15/25 2:23 PM, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> On 10/15/25 1:40 PM, Philippe Mathieu-Daudé wrote:
>>> On 13/10/25 23:39, Pierrick Bouvier wrote:
>>>> Enhance uftrace_symbols.py to generate .dbg files, containing
>>>> source location for every symbol present in .sym file.
>>>> It allows to use uftrace {replay,dump} --srcline and show origin of
>>> `uftrace {replay,dump} --srcline`
>>>
>>>> functions, connecting trace to original source code.
>>>>
>>>> It was first implemented with pyelftools DWARF parser, which was way
>>>> to slow (~minutes) to get locations for every symbol in the linux
>>> s/to/too/
>>>
>>>> kernel. Thus, we use addr2line instead, which runs in seconds.
>>>>
>>>> As well, there were some bugs with latest pyelftools release,
>>>> requiring to run master version, which is not installable with pip.
>>>> Thus, since we now require binutils (addr2line), we can ditch pyelftools
>>>> based implementation and simply rely on nm to get symbols information,
>>> `nm`
>>>
>>>> which is faster and better.
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>> contrib/plugins/uftrace_symbols.py | 108 +++++++++++++++++++----------
>>>> 1 file changed, 72 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/contrib/plugins/uftrace_symbols.py b/contrib/plugins/uftrace_symbols.py
>>>> index b49e03203c8..728bf04ce54 100755
>>>> --- a/contrib/plugins/uftrace_symbols.py
>>>> +++ b/contrib/plugins/uftrace_symbols.py
>>>> @@ -1,7 +1,7 @@
>>>> #!/usr/bin/env python3
>>>> # -*- coding: utf-8 -*-
>>>> #
>>>> -# Create symbols and mapping files for uftrace.
>>>> +# Create symbols, debug and mapping files for uftrace.
>>>> #
>>>> # Copyright 2025 Linaro Ltd
>>>> # Author: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> @@ -9,44 +9,71 @@
>>>> # SPDX-License-Identifier: GPL-2.0-or-later
>>>> import argparse
>>>> -import elftools # pip install pyelftools
>>>> import os
>>>> +import subprocess
>>>> -from elftools.elf.elffile import ELFFile
>>>> -from elftools.elf.sections import SymbolTableSection
>>>> +class Symbol:
>>>> + def __init__(self, name, addr, size):
>>>> + self.name = name
>>>> + # clamp addr to 48 bits, like uftrace entries
>>>> + self.addr = addr & 0xffffffffffff
>>>> + self.full_addr = addr
>>>> + self.size = size
>>>> -def elf_func_symbols(elf):
>>>> - symbol_tables = [(idx, s) for idx, s in enumerate(elf.iter_sections())
>>>> - if isinstance(s, SymbolTableSection)]
>>>> - symbols = []
>>>> - for _, section in symbol_tables:
>>>> - for _, symbol in enumerate(section.iter_symbols()):
>>>> - if symbol_size(symbol) == 0:
>>>> - continue
>>>> - type = symbol['st_info']['type']
>>>> - if type == 'STT_FUNC' or type == 'STT_NOTYPE':
>>>> - symbols.append(symbol)
>>>> - symbols.sort(key = lambda x: symbol_addr(x))
>>>> + def set_loc(self, file, line):
>>>> + self.file = file
>>>> + self.line = line
>>>> +
>>>> +def get_symbols(elf_file):
>>>> + symbols=[]
>>>> + try:
>>>> + out = subprocess.check_output(['nm', '--print-size', elf_file],
>>>> + stderr=subprocess.STDOUT,
>>>> + text=True)
>>> Nitpicking, we might be using cross-compiled `nm`, so maybe not
>>> hardcode
>>> the binary name.
>>>
>>>> + except subprocess.CalledProcessError as e:
>>>> + print(e.output)
>>>> + raise
>>>> + out = out.strip().split('\n')
>>>> + for line in out:
>>>> + info = line.split(' ')
>>>> + if len(info) == 3:
>>>> + # missing size information
>>>> + continue
>>>> + addr, size, type, name = info
>>>> + # add only symbols from .text section
>>>> + if type.lower() != 't':
>>>> + continue
>>>> + addr = int(addr, 16)
>>>> + size = int(size, 16)
>>>> + symbols.append(Symbol(name, addr, size))
>>>> + symbols.sort(key = lambda x: x.addr)
>>>> return symbols
>>>> -def symbol_size(symbol):
>>>> - return symbol['st_size']
>>>> -
>>>> -def symbol_addr(symbol):
>>>> - addr = symbol['st_value']
>>>> - # clamp addr to 48 bits, like uftrace entries
>>>> - return addr & 0xffffffffffff
>>>> -
>>>> -def symbol_name(symbol):
>>>> - return symbol.name
>>>> +def find_symbols_locations(elf_file, symbols):
>>>> + addresses = '\n'.join([hex(x.full_addr) for x in symbols])
>>>> + try:
>>>> + out = subprocess.check_output(['addr2line', '--exe', elf_file],
>>> Ditto (cross compiled)?
>>>
>>>> + stderr=subprocess.STDOUT,
>>>> + input=addresses, text=True)
>>>> + except subprocess.CalledProcessError as e:
>>>> + print(e.output)
>>>> + raise
>>>> + out = out.strip().split('\n')
>>>> + assert len(out) == len(symbols)
>>>> + for i in range(len(symbols)):
>>>> + s = symbols[i]
>>>> + file, line = out[i].split(':')
>>>> + # addr2line may return 'line (discriminator [0-9]+)' sometimes,
>>>> + # remove this to keep only line number.
>>>> + line = line.split(' ')[0]
>>>> + s.set_loc(file, line)
>>>> class BinaryFile:
>>>> def __init__(self, path, map_offset):
>>>> self.fullpath = os.path.realpath(path)
>>>> self.map_offset = map_offset
>>>> - with open(path, 'rb') as f:
>>>> - self.elf = ELFFile(f)
>>>> - self.symbols = elf_func_symbols(self.elf)
>>>> + self.symbols = get_symbols(self.fullpath)
>>>> + find_symbols_locations(self.fullpath, self.symbols)
>>>> def path(self):
>>>> return self.fullpath
>>>> @@ -56,7 +83,7 @@ def addr_start(self):
>>>> def addr_end(self):
>>>> last_sym = self.symbols[-1]
>>>> - return symbol_addr(last_sym) + symbol_size(last_sym) + self.map_offset
>>>> + return last_sym.addr + last_sym.size + self.map_offset
>>>> def generate_symbol_file(self, prefix_symbols):
>>>> binary_name = os.path.basename(self.fullpath)
>>>> @@ -66,14 +93,21 @@ def generate_symbol_file(self, prefix_symbols):
>>>> # print hexadecimal addresses on 48 bits
>>>> addrx = "0>12x"
>>>> for s in self.symbols:
>>>> - addr = symbol_addr(s)
>>>> + addr = s.addr
>>>> addr = f'{addr:{addrx}}'
>>>> - size = f'{symbol_size(s):{addrx}}'
>>>> - name = symbol_name(s)
>>>> + size = f'{s.size:{addrx}}'
>>>> if prefix_symbols:
>>>> - name = f'{binary_name}:{name}'
>>>> + name = f'{binary_name}:{s.name}'
>>>> print(addr, size, 'T', name, file=sym_file)
>>>> + def generate_debug_file(self):
>>>> + binary_name = os.path.basename(self.fullpath)
>>>> + dbg_file_path = f'./uftrace.data/{binary_name}.dbg'
>>> Prefer os.path.join().
>>>
>>>> + with open(dbg_file_path, 'w') as dbg_file:
>>>> + for s in self.symbols:
>>>> + print(f'F: {hex(s.addr)} {s.name}', file=dbg_file)
>>>> + print(f'L: {s.line} {s.file}', file=dbg_file)
>>>> +
>>>> def parse_parameter(p):
>>>> s = p.split(":")
>>>> path = s[0]
>>>> @@ -84,7 +118,7 @@ def parse_parameter(p):
>>>> offset = s[1]
>>>> if not offset.startswith('0x'):
>>>> err = f'offset "{offset}" is not an hexadecimal constant. '
>>>> - err += 'It should starts with "0x".'
>>>> + err += 'It should start with "0x".'
>>>> raise ValueError(err)
>>>> offset = int(offset, 16)
>>>> return path, offset
>>>> @@ -124,7 +158,8 @@ def generate_map(binaries):
>>>> def main():
>>>> parser = argparse.ArgumentParser(description=
>>>> - 'generate symbol files for uftrace')
>>>> + 'generate symbol files for uftrace. '
>>>> + 'Require binutils (nm and addr2line).')
>>>> parser.add_argument('elf_file', nargs='+',
>>>> help='path to an ELF file. '
>>>> 'Use /path/to/file:0xdeadbeef to add a mapping offset.')
>>>> @@ -145,6 +180,7 @@ def main():
>>>> for b in binaries:
>>>> b.generate_symbol_file(args.prefix_symbols)
>>>> + b.generate_debug_file()
>>>> generate_map(binaries)
>>>>
>>> No blocking comments:
>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>>>
>>
>> Thanks Philippe.
>>
>> For the cross compiled tools, I'm not really sure it's worth making
>> this more complex. Having tooling for a cross architecture is an
>> advanced setup, and I think it's fair to expect someone to have
>> binutils installed if they have any cross compiler and cross binutils.
>> Plus, they can always create a symlink if needed.
>>
>> Alex, can you eventually integrate the (other) cosmetic changes?
>> If not, I can send a new patch if you prefer.
>
> If you could send a v2 that would be great.
>
>>
>> Thanks,
>> Pierrick
>
v2 sent:
https://lore.kernel.org/qemu-devel/20251015232809.628043-1-pierrick.bouvier@linaro.org/T/#u
Thanks,
Pierrick
prev parent reply other threads:[~2025-10-15 23:31 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-13 21:39 [PATCH] contrib/plugins/uftrace_symbols.py: generate debug files to map symbols to source Pierrick Bouvier
2025-10-15 11:11 ` Alex Bennée
2025-10-15 17:41 ` Pierrick Bouvier
2025-10-15 20:40 ` Philippe Mathieu-Daudé
2025-10-15 21:02 ` Pierrick Bouvier
2025-10-15 21:12 ` Pierrick Bouvier
2025-10-16 5:45 ` Philippe Mathieu-Daudé
2025-10-15 21:23 ` Alex Bennée
2025-10-15 23:31 ` Pierrick Bouvier [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0167e505-e769-4305-87b4-f281219a3465@linaro.org \
--to=pierrick.bouvier@linaro.org \
--cc=alex.bennee@linaro.org \
--cc=erdnaxe@crans.org \
--cc=ma.mandourr@gmail.com \
--cc=manos.pitsidianakis@linaro.org \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).