bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eduard Zingerman <eddyz87@gmail.com>
To: Yonghong Song <yonghong.song@linux.dev>,
	Alan Maguire <alan.maguire@oracle.com>,
	Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	 dwarves@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	 bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com,  Song Liu <song@kernel.org>
Subject: Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching
Date: Mon, 11 Nov 2024 00:01:13 -0800	[thread overview]
Message-ID: <080794545d8eb3df3d6eba90ac621111ab7171f5.camel@gmail.com> (raw)
In-Reply-To: <31dea31e6f75916fdc078d433263daa6bb0bffdc.camel@gmail.com>

On Sun, 2024-11-10 at 03:38 -0800, Eduard Zingerman wrote:

[...]

> Also, it appears there is some bug either in pahole or in libdw's
> implementation of dwarf_getlocation(). When I try both your patch-set
> and my variant there is a segfault once in a while:
> 
>   $ for i in $(seq 1 100); \
>     do echo "---> $i"; \
>        pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/dev/null vmlinux ; \
>     done
>   ---> 1
>   ...
>   ---> 71
>   Segmentation fault (core dumped)
>   ...
> 
> The segfault happens only when -j (multiple threads) is passed.
> If pahole is built with sanitizers
> (passing -DCMAKE_C_FLAGS="-fsanitize=undefined,address")
> the stack trace looks as follows:

Did some additional research for these SEGFAULTs.
Looks like all we are in trouble.

# TLDR

libdw is not supposed to be used in a concurrent context.
libdw is a part of elfutils package, the configuration flag
making API thread-safe is documented as experimental:
  --enable-thread-safety  enable thread safety of libraries EXPERIMENTAL
At-least Fedora 40 does not ship elfutils built with this flag set.
This colours all current parallel DWARF decoding questionable.

# Why segfault happens

Any references to elfutils source code are for commit [1].
The dwarf_getlocation() is one of a few libdw APIs that uses memory
allocation internally. The function dwarf_getlocation.c:__libdw_intern_expression
iterates over expression encodings in DWARF and allocates
a set of objects of type `struct loclist` and `Dwarf_Op`.
Pointers to allocated objects are put to a binary tree for caching,
see dwarf_getlocation.c:660, the call to eu_tsearch() function.
The eu_tsearch() is a wrapper around libc tsearch() function.
This wrapper provides locking for the tree,
but only if --enable-thread-safety was set during elfutils configuration.
The SEGFAULT happens inside tsearch() call because binary tree is malformed, e.g.:

  Thread 8 "pahole" received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7fffd9c006c0 (LWP 2630074)]
  0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
  228	      if (parentp != NULL && RED(DEREFNODEPTR(parentp)))
  (gdb) bt
  #0  0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
  #1  0x00007ffff7c5d466 in __GI___tsearch (...) at tsearch.c:358
  #2  __GI___tsearch (...) at tsearch.c:290
  #3  0x000000000048f096 in __interceptor_tsearch ()
  #4  0x00007ffff7f5c482 in __libdw_intern_expression (...) at dwarf_getlocation.c:660
  #5  0x00007ffff7f5cf51 in getlocation (...) at dwarf_getlocation.c:678
  #6  getlocation (...) at dwarf_getlocation.c:667
  #7  dwarf_getlocation (..._ at dwarf_getlocation.c:708
  #8  0x00000000005a2ee5 in parameter.new ()
  #9  0x00000000005a0122 in die.process_function ()
  #10 0x0000000000597efd in __die__process_tag ()
  #11 0x0000000000595ad9 in die.process_unit ()
  #12 0x0000000000595436 in die.process ()
  #13 0x00000000005b0187 in dwarf_cus.process_cu ()
  #14 0x00000000005afa38 in dwarf_cus.process_cu_thread ()
  #15 0x00000000004c7b8d in asan_thread_start(void*) ()
  #16 0x00007ffff7bda6d7 in start_thread (arg=<optimized out>) at pthread_create.c:447
  #17 0x00007ffff7c5e60c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
  (gdb) p parentp
  $1 = (node *) 0x50300079d2a0
  (gdb) p *parentp
  $2 = (node) 0x0

glibc provides a way to validate binary tree structure.
For this misc/tsearch.c has to be changed to define DEBUGGING variable.
(I used glibc 2.39 as provided by source rpm for Fedora 40 for experiments).
If this is done and custom glibc is used for pahole execution,
the following error is reported if '-j' flag is present:

  $ pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/home/eddy/work/tmp/my-new.btf vmlinux 
  Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
  Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
  Aborted (core dumped)

Executing pahole using a custom-built libdw,
built with --enable-thread-safety resolves the issue.

[1] b2f225d6bff8 ("Consolidate and add files to clean target variables")
    git://sourceware.org/git/elfutils.git


  reply	other threads:[~2024-11-11  8:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 18:05 [PATCH dwarves 0/3] Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 1/3] dwarf_loader: Refactor function parameter__new() Yonghong Song
2024-11-11 11:26   ` Alan Maguire
2024-11-08 18:05 ` [PATCH dwarves 2/3] dwarf_loader: Refactor function check_dwarf_locations() Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-10 11:38   ` Eduard Zingerman
2024-11-11  8:01     ` Eduard Zingerman [this message]
2024-11-11 12:36       ` Jiri Olsa
2024-11-11 13:42         ` Arnaldo Carvalho de Melo
2024-11-15 16:26           ` elfutils thread-safety (Was: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching) Mark Wielaard
2024-11-11 18:51     ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-11  9:54   ` Jiri Olsa
2024-11-11 18:54     ` Yonghong Song
2024-11-11 15:39   ` Alan Maguire
2024-11-12  1:51     ` Yonghong Song
2024-11-12 16:56       ` Alan Maguire
2024-11-12 17:07         ` Yonghong Song
2024-11-12 18:33           ` Alan Maguire
2024-11-12 18:51             ` Yonghong Song
2024-11-12 19:10             ` Arnaldo Carvalho de Melo
2024-11-13 17:33               ` Alan Maguire
2024-11-13 18:27                 ` Yonghong Song
2024-11-14 12:16                   ` Alan Maguire
2024-11-14 16:29                     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=080794545d8eb3df3d6eba90ac621111ab7171f5.camel@gmail.com \
    --to=eddyz87@gmail.com \
    --cc=alan.maguire@oracle.com \
    --cc=andrii@kernel.org \
    --cc=arnaldo.melo@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dwarves@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).