From: Jiri Olsa <olsajiri@gmail.com>
To: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>,
Alan Maguire <alan.maguire@oracle.com>,
Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
dwarves@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Song Liu <song@kernel.org>
Subject: Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching
Date: Mon, 11 Nov 2024 13:36:52 +0100 [thread overview]
Message-ID: <ZzH6ZDucO2nm9Y52@krava> (raw)
In-Reply-To: <080794545d8eb3df3d6eba90ac621111ab7171f5.camel@gmail.com>
On Mon, Nov 11, 2024 at 12:01:13AM -0800, Eduard Zingerman wrote:
> On Sun, 2024-11-10 at 03:38 -0800, Eduard Zingerman wrote:
>
> [...]
>
> > Also, it appears there is some bug either in pahole or in libdw's
> > implementation of dwarf_getlocation(). When I try both your patch-set
> > and my variant there is a segfault once in a while:
> >
> > $ for i in $(seq 1 100); \
> > do echo "---> $i"; \
> > pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/dev/null vmlinux ; \
> > done
> > ---> 1
> > ...
> > ---> 71
> > Segmentation fault (core dumped)
> > ...
> >
> > The segfault happens only when -j (multiple threads) is passed.
> > If pahole is built with sanitizers
> > (passing -DCMAKE_C_FLAGS="-fsanitize=undefined,address")
> > the stack trace looks as follows:
>
> Did some additional research for these SEGFAULTs.
> Looks like all we are in trouble.
>
> # TLDR
>
> libdw is not supposed to be used in a concurrent context.
> libdw is a part of elfutils package, the configuration flag
> making API thread-safe is documented as experimental:
> --enable-thread-safety enable thread safety of libraries EXPERIMENTAL
> At-least Fedora 40 does not ship elfutils built with this flag set.
> This colours all current parallel DWARF decoding questionable.
>
> # Why segfault happens
>
> Any references to elfutils source code are for commit [1].
> The dwarf_getlocation() is one of a few libdw APIs that uses memory
> allocation internally. The function dwarf_getlocation.c:__libdw_intern_expression
> iterates over expression encodings in DWARF and allocates
> a set of objects of type `struct loclist` and `Dwarf_Op`.
> Pointers to allocated objects are put to a binary tree for caching,
> see dwarf_getlocation.c:660, the call to eu_tsearch() function.
> The eu_tsearch() is a wrapper around libc tsearch() function.
> This wrapper provides locking for the tree,
> but only if --enable-thread-safety was set during elfutils configuration.
> The SEGFAULT happens inside tsearch() call because binary tree is malformed, e.g.:
>
> Thread 8 "pahole" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd9c006c0 (LWP 2630074)]
> 0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
> 228 if (parentp != NULL && RED(DEREFNODEPTR(parentp)))
> (gdb) bt
> #0 0x00007ffff7c5d200 in maybe_split_for_insert (...) at tsearch.c:228
> #1 0x00007ffff7c5d466 in __GI___tsearch (...) at tsearch.c:358
> #2 __GI___tsearch (...) at tsearch.c:290
> #3 0x000000000048f096 in __interceptor_tsearch ()
> #4 0x00007ffff7f5c482 in __libdw_intern_expression (...) at dwarf_getlocation.c:660
> #5 0x00007ffff7f5cf51 in getlocation (...) at dwarf_getlocation.c:678
> #6 getlocation (...) at dwarf_getlocation.c:667
> #7 dwarf_getlocation (..._ at dwarf_getlocation.c:708
> #8 0x00000000005a2ee5 in parameter.new ()
> #9 0x00000000005a0122 in die.process_function ()
> #10 0x0000000000597efd in __die__process_tag ()
> #11 0x0000000000595ad9 in die.process_unit ()
> #12 0x0000000000595436 in die.process ()
> #13 0x00000000005b0187 in dwarf_cus.process_cu ()
> #14 0x00000000005afa38 in dwarf_cus.process_cu_thread ()
> #15 0x00000000004c7b8d in asan_thread_start(void*) ()
> #16 0x00007ffff7bda6d7 in start_thread (arg=<optimized out>) at pthread_create.c:447
> #17 0x00007ffff7c5e60c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
> (gdb) p parentp
> $1 = (node *) 0x50300079d2a0
> (gdb) p *parentp
> $2 = (node) 0x0
>
> glibc provides a way to validate binary tree structure.
> For this misc/tsearch.c has to be changed to define DEBUGGING variable.
> (I used glibc 2.39 as provided by source rpm for Fedora 40 for experiments).
> If this is done and custom glibc is used for pahole execution,
> the following error is reported if '-j' flag is present:
>
> $ pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/home/eddy/work/tmp/my-new.btf vmlinux
> Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
> Fatal glibc error: tsearch.c:164 (check_tree_recurse): assertion failed: d_sofar == d_total
> Aborted (core dumped)
>
> Executing pahole using a custom-built libdw,
> built with --enable-thread-safety resolves the issue.
could we use libdw__lock around that? but I guess we use it on other
places as well..
jirka
>
> [1] b2f225d6bff8 ("Consolidate and add files to clean target variables")
> git://sourceware.org/git/elfutils.git
>
>
next prev parent reply other threads:[~2024-11-11 12:36 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-08 18:05 [PATCH dwarves 0/3] Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 1/3] dwarf_loader: Refactor function parameter__new() Yonghong Song
2024-11-11 11:26 ` Alan Maguire
2024-11-08 18:05 ` [PATCH dwarves 2/3] dwarf_loader: Refactor function check_dwarf_locations() Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-10 11:38 ` Eduard Zingerman
2024-11-11 8:01 ` Eduard Zingerman
2024-11-11 12:36 ` Jiri Olsa [this message]
2024-11-11 13:42 ` Arnaldo Carvalho de Melo
2024-11-15 16:26 ` elfutils thread-safety (Was: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching) Mark Wielaard
2024-11-11 18:51 ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-11 9:54 ` Jiri Olsa
2024-11-11 18:54 ` Yonghong Song
2024-11-11 15:39 ` Alan Maguire
2024-11-12 1:51 ` Yonghong Song
2024-11-12 16:56 ` Alan Maguire
2024-11-12 17:07 ` Yonghong Song
2024-11-12 18:33 ` Alan Maguire
2024-11-12 18:51 ` Yonghong Song
2024-11-12 19:10 ` Arnaldo Carvalho de Melo
2024-11-13 17:33 ` Alan Maguire
2024-11-13 18:27 ` Yonghong Song
2024-11-14 12:16 ` Alan Maguire
2024-11-14 16:29 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZzH6ZDucO2nm9Y52@krava \
--to=olsajiri@gmail.com \
--cc=alan.maguire@oracle.com \
--cc=andrii@kernel.org \
--cc=arnaldo.melo@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dwarves@vger.kernel.org \
--cc=eddyz87@gmail.com \
--cc=kernel-team@fb.com \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox