From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Thomas Richter <tmricht@linux.ibm.com>,
Namhyung Kim <namhyung@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
svens@linux.ibm.com, gor@linux.ibm.com, sumanthk@linux.ibm.com,
hca@linux.ibm.com
Subject: Re: [PATCH] perf lock: Fix core dump in command perf lock contention
Date: Mon, 2 Jan 2023 12:33:35 -0300 [thread overview]
Message-ID: <Y7L5T2IHdovfLgWp@kernel.org> (raw)
In-Reply-To: <20221230102627.2410847-1-tmricht@linux.ibm.com>
Em Fri, Dec 30, 2022 at 11:26:27AM +0100, Thomas Richter escreveu:
> The test case perf lock contention dumps core on s390. Run the following
> commands:
> # ./perf lock record -- ./perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 2.799 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
> #
> # ./perf lock contention
> Segmentation fault (core dumped)
> #
>
> The function call stack is lengthy, here are the top 5 functions:
> # gdb ./perf core.24048
> GNU gdb (GDB) Fedora Linux 12.1-6.fc37
> Copyright (C) 2022 Free Software Foundation, Inc.
> Core was generated by `./perf lock contention'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,
> addr=1789230) at util/machine.c:3356
> 3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
>
> (gdb) where
> #0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,\
> addr=1789230) at util/machine.c:3356
> #1 0x000000000109f244 in callchain_id (evsel=0x30313e0,\
> sample=0x3ffea4f77d0) at builtin-lock.c:957
> #2 0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290,\
> addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) \
> at builtin-lock.c:586
> #3 0x000000000109f4d0 in report_lock_contention_begin_event \
> (evsel=0x30313e0, sample=0x3ffea4f77d0)
> at builtin-lock.c:1004
> #4 0x00000000010a00ae in evsel__process_contention_begin \
> (evsel=0x30313e0, sample=0x3ffea4f77d0)
> at builtin-lock.c:1254
> #5 0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, \
> event=0x3ff85601ef8, sample=0x3ffea4f77d0,
> evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
> sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) \
> at util/session.c:1523
> .....
>
> The issue is in function machine__is_lock_function() in file
> ./util/machine.c lines 3355:
> /* should not fail from here */
> sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end",
> &kmap);
> machine->sched.text_end = kmap->unmap_ip(kmap, sym->start)
>
> On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
> resulting pointer sym is set to NULL. The sym->start is then a NULL pointer
> access and generates the core dump.
>
> The reason why __sched_text_end is not in the symbol list on s390 is
> simple:
> When the symbol list is created at perf start up with function calls
> dso__load
> +--> dso__load_vmlinux_path
> +--> dso__load_vmlinux
> +--> dso__load_sym
> +--> dso__load_sym_internal (reads kernel symbols)
> +--> symbols__fixup_end
> +--> symbols__fixup_duplicate
>
> The issue is in function symbols__fixup_duplicate(). It deletes all
> symbols with have the same address. On s390
> # nm -g ~/linux/vmlinux| fgrep c68390
> 0000000000c68390 T __cpuidle_text_start
> 0000000000c68390 T __sched_text_end
> #
> two symbols have identical addresses and __sched_text_end is considered
> duplicate (in ascending sort order) and removed from the symbol list.
> Therefore it is missing and an invalid pointer reference occurs.
> The code checks for symbol __sched_text_start and when it exists assumes
> symbol __sched_text_end is also in the symbol table. However this is
> not the case on s390.
>
> Same situation exists for symbol __lock_text_start:
> 0000000000c68770 T __cpuidle_text_end
> 0000000000c68770 T __lock_text_start
> This symbol is also removed from the symbol table but used in function
> machine__is_lock_function().
>
> To fix this and keep duplicate symbols in the symbol table, set
> symbol_conf.allow_aliases to true. This prevents the removal of duplicate
> symbols in function symbols__fixup_duplicate().
>
> Output After:
> # ./perf lock contention
> contended total wait max wait avg wait type caller
>
> 48 124.39 ms 123.99 ms 2.59 ms rwsem:W unlink_anon_vmas+0x24a
> 47 83.68 ms 83.26 ms 1.78 ms rwsem:W free_pgtables+0x132
> 5 41.22 us 10.55 us 8.24 us rwsem:W free_pgtables+0x140
> 4 40.12 us 20.55 us 10.03 us rwsem:W copy_process+0x1ac8
> #
>
> Fixes: cc2367eebb0c ("machine: Adopt is_lock_function() from builtin-lock.c")
Humm, is that really the cset that introduces the problem? It just moves
things around, the cset that introduced the is_lock_function() function,
that assumed that __sched_text_end was always available was:
commit 0d2997f750d1de394231bc22768dab94a5b5db2f
Author: Namhyung Kim <namhyung@kernel.org>
Date: Wed Jun 15 09:32:22 2022 -0700
perf lock: Look up callchain for the contended locks
---
Right? Namhyung? Can you spot any problem in enabling duplicates as a
fix?
- Arnaldo
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> ---
> tools/perf/builtin-lock.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index 718b82bfcdff..506c2fe42d52 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
> @@ -1670,6 +1670,7 @@ static int __cmd_report(bool display_info)
>
> /* for lock function check */
> symbol_conf.sort_by_name = true;
> + symbol_conf.allow_aliases = true;
> symbol__init(&session->header.env);
>
> if (!data.is_pipe) {
> @@ -1757,6 +1758,7 @@ static int __cmd_contention(int argc, const char **argv)
>
> /* for lock function check */
> symbol_conf.sort_by_name = true;
> + symbol_conf.allow_aliases = true;
> symbol__init(&session->header.env);
>
> if (use_bpf) {
> --
> 2.38.1
--
- Arnaldo
next prev parent reply other threads:[~2023-01-02 15:33 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-30 10:26 [PATCH] perf lock: Fix core dump in command perf lock contention Thomas Richter
2023-01-02 15:33 ` Arnaldo Carvalho de Melo [this message]
2023-01-03 16:45 ` Namhyung Kim
2023-01-03 18:50 ` Arnaldo Carvalho de Melo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y7L5T2IHdovfLgWp@kernel.org \
--to=acme@kernel.org \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=namhyung@kernel.org \
--cc=sumanthk@linux.ibm.com \
--cc=svens@linux.ibm.com \
--cc=tmricht@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.