* [GIT PULL 00/15] perf/core inlining improvements
@ 2017-10-25 15:59 Arnaldo Carvalho de Melo
2017-10-25 15:59 ` [PATCH 01/15] perf report: Remove code to handle inline frames from browsers Arnaldo Carvalho de Melo
` (16 more replies)
0 siblings, 17 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 15:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
Andi Kleen, David Ahern, Jin Yao, Jiri Olsa, Milian Wolff,
Namhyung Kim, Peter Zijlstra, Ravi Bangoria,
Arnaldo Carvalho de Melo
Hi Ingo,
Please consider pulling, this is Milian's v7 plus some fixes
acked by Namhyung after some discussion among the three of us, I
probably need to pick some more patches that are related to this area,
but lets make some progress and merge this kit,
- Arnaldo
Test results at the end of this message, as usual.
The following changes since commit 9b7c85473cc2fa6fc4a7f87636ff2b69742b82b7:
Merge tag 'perf-core-for-mingo-4.15-20171023' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-10-24 10:53:04 +0200)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.15-20171025
for you to fetch changes up to d8a88dd243a170a226aba33e7c53704db2f82aa6:
perf util: Enable handling of inlined frames by default (2017-10-25 10:50:47 -0300)
----------------------------------------------------------------
perf/core inline improvements:
From Milian's cover letter: (Milian Wolff)
This series of patches completely reworks the way inline frames are
handled. Instead of querying for the inline nodes on-demand in the
individual tools, we now create proper callchain nodes for inlined
frames. The advantages this approach brings are numerous:
- Less duplicated code in the individual browser
- Aggregated cost for inlined frames for the --children top-down list
- Various bug fixes that arose from querying for a srcline/symbol based on
the IP of a sample, which will always point to the last inlined frame
instead of the corresponding non-inlined frame
- Overall much better support for visualizing cost for heavily-inlined C++
code, which simply was confusing and unreliable before
- srcline honors the global setting as to whether full paths or basenames
should be shown
- Caches for inlined frames and srcline information, which allow us to
enable inline frame handling by default
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
----------------------------------------------------------------
Milian Wolff (15):
perf report: Remove code to handle inline frames from browsers
perf callchain: Store srcline in callchain_cursor_node
perf callchain: Refactor inline_list to operate on symbols
perf callchain: Refactor inline_list to store srcline string directly
perf callchain: Create real callchain entries for inlined frames
perf report: Fall-back to function name comparison for -g srcline
perf callchain: Mark inlined frames in output by " (inlined)" suffix
perf script: Mark inlined frames and do not print DSO for them
perf callchain: Compare symbol name for inlined frames when matching
perf report: Compare symbol name for inlined frames when sorting
perf report: Properly handle branch count in match_chain()
perf report: Cache failed lookups of inlined frames
perf report: Cache srclines for callchain nodes
perf report: Use srcline from callchain for hist entries
perf util: Enable handling of inlined frames by default
tools/perf/Documentation/perf-report.txt | 3 +-
tools/perf/Documentation/perf-script.txt | 3 +-
tools/perf/ui/browsers/hists.c | 180 ++-------------------
tools/perf/ui/stdio/hist.c | 77 +--------
tools/perf/util/callchain.c | 174 +++++++++++---------
tools/perf/util/callchain.h | 6 +-
tools/perf/util/dso.c | 7 +
tools/perf/util/dso.h | 2 +
tools/perf/util/event.c | 1 +
tools/perf/util/evsel_fprintf.c | 37 +----
tools/perf/util/hist.c | 7 +-
tools/perf/util/machine.c | 65 +++++++-
tools/perf/util/sort.c | 6 +
tools/perf/util/sort.h | 1 -
tools/perf/util/srcline.c | 268 +++++++++++++++++++++++++------
tools/perf/util/srcline.h | 26 ++-
tools/perf/util/symbol.c | 1 +
tools/perf/util/symbol.h | 2 +
18 files changed, 443 insertions(+), 423 deletions(-)
Test results:
The first ones are container (docker) based builds of tools/perf with and
without libelf support. Where clang is available, it is also used to build
perf with/without libelf.
The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container cluster.
Those will come back later.
Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.
The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.
Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.
[root@seventh fedora]# cd ..
[root@seventh linux-perf-tools-build]# time dm
1 alpine:3.4: Ok
2 alpine:3.5: Ok
3 alpine:3.6: Ok
4 alpine:edge: Ok
5 android-ndk:r12b-arm: Ok
6 android-ndk:r15c-arm: Ok
7 archlinux:latest: Ok
8 centos:5: Ok
9 centos:6: Ok
10 centos:7: Ok
11 debian:7: Ok
12 debian:8: Ok
13 debian:9: Ok
14 debian:experimental: Ok
15 debian:experimental-x-arm64: Ok
16 debian:experimental-x-mips: Ok
17 debian:experimental-x-mips64: Ok
18 debian:experimental-x-mipsel: Ok
19 fedora:20: Ok
20 fedora:21: Ok
21 fedora:22: Ok
22 fedora:23: Ok
23 fedora:24: Ok
24 fedora:24-x-ARC-uClibc: Ok
25 fedora:25: Ok
26 fedora:26: Ok
27 fedora:rawhide: Ok
28 mageia:5: Ok
29 mageia:6: Ok
30 opensuse:42.1: Ok
31 opensuse:42.2: Ok
32 opensuse:42.3: Ok
33 opensuse:tumbleweed: Ok
34 oraclelinux:6: Ok
35 ubuntu:12.04.5: Ok
36 ubuntu:14.04.4: Ok
37 ubuntu:14.04.4-x-linaro-arm64: Ok
38 ubuntu:15.04: Ok
39 ubuntu:16.04: Ok
40 ubuntu:16.04-x-arm: Ok
41 ubuntu:16.04-x-arm64: Ok
42 ubuntu:16.04-x-powerpc: Ok
43 ubuntu:16.04-x-powerpc64: Ok
44 ubuntu:16.04-x-powerpc64el: Ok
45 ubuntu:16.04-x-s390: Ok
46 ubuntu:16.10: Ok
47 ubuntu:17.04: Ok
48 ubuntu:17.10: Ok
#
# uname -a
Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
# perf test
1: vmlinux symtab matches kallsyms : Ok
2: Detect openat syscall event : Ok
3: Detect openat syscall event on all cpus : Ok
4: Read samples using the mmap interface : Ok
5: Test data source output : Ok
6: Parse event definition strings : Ok
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : Ok
9: Parse perf pmu format : Ok
10: DSO data read : Ok
11: DSO data cache : Ok
12: DSO data reopen : Ok
13: Roundtrip evsel->name : Ok
14: Parse sched tracepoints fields : Ok
15: syscalls:sys_enter_openat event fields : Ok
16: Setup struct perf_event_attr : Ok
17: Match and link multiple hists : Ok
18: 'import perf' in python : Ok
19: Breakpoint overflow signal handler : Ok
20: Breakpoint overflow sampling : Ok
21: Number of exit events of a simple workload : Ok
22: Software clock events period values : Ok
23: Object code reading : Ok
24: Sample parsing : Ok
25: Use a dummy software event to keep tracking : Ok
26: Parse with no sample_id_all bit set : Ok
27: Filter hist entries : Ok
28: Lookup mmap thread : Ok
29: Share thread mg : Ok
30: Sort output of hist entries : Ok
31: Cumulate child hist entries : Ok
32: Track with sched_switch : Ok
33: Filter fds with revents mask in a fdarray : Ok
34: Add fd to a fdarray, making it autogrow : Ok
35: kmod_path__parse : Ok
36: Thread map : Ok
37: LLVM search and compile :
37.1: Basic BPF llvm compile : Ok
37.2: kbuild searching : Ok
37.3: Compile source for BPF prologue generation : Ok
37.4: Compile source for BPF relocation : Ok
38: Session topology : Ok
39: BPF filter :
39.1: Basic BPF filtering : Ok
39.2: BPF pinning : Ok
39.3: BPF prologue generation : Ok
39.4: BPF relocation checker : Ok
40: Synthesize thread map : Ok
41: Remove thread map : Ok
42: Synthesize cpu map : Ok
43: Synthesize stat config : Ok
44: Synthesize stat : Ok
45: Synthesize stat round : Ok
46: Synthesize attr update : Ok
47: Event times : Ok
48: Read backward ring buffer : Ok
49: Print cpu map : Ok
50: Probe SDT events : Ok
51: is_printable_array : Ok
52: Print bitmap : Ok
53: perf hooks : Ok
54: builtin clang support : Skip (not compiled in)
55: unit_number__scnprintf : Ok
56: x86 rdpmc : Ok
57: Convert perf time to TSC : Ok
58: DWARF unwind : Ok
59: x86 instruction decoder - new instructions : Ok
60: Use vfs_getname probe to get syscall args filenames : Ok
61: probe libc's inet_pton & backtrace it with ping : Ok
62: Check open filename arg using perf trace + vfs_getname: Ok
63: Add vfs_getname probe to get syscall args filenames : Ok
#
$ make -C tools/perf build-test
make: Entering directory '/home/acme/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_tags_O: make tags
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_no_libbionic_O: make NO_LIBBIONIC=1
make_no_libelf_O: make NO_LIBELF=1
make_no_slang_O: make NO_SLANG=1
make_no_backtrace_O: make NO_BACKTRACE=1
make_no_libpython_O: make NO_LIBPYTHON=1
make_perf_o_O: make perf.o
make_with_babeltrace_O: make LIBBABELTRACE=1
make_no_demangle_O: make NO_DEMANGLE=1
make_no_libnuma_O: make NO_LIBNUMA=1
make_no_gtk2_O: make NO_GTK2=1
make_no_libperl_O: make NO_LIBPERL=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
make_util_map_o_O: make util/map.o
make_no_libaudit_O: make NO_LIBAUDIT=1
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_doc_O: make doc
make_install_prefix_O: make install prefix=/tmp/krava
make_no_libunwind_O: make NO_LIBUNWIND=1
make_static_O: make LDFLAGS=-static
make_install_bin_O: make install-bin
make_pure_O: make
make_no_libbpf_O: make NO_LIBBPF=1
make_help_O: make help
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_clean_all_O: make clean all
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_no_newt_O: make NO_NEWT=1
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_debug_O: make DEBUG=1
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_install_O: make install
OK
make: Leaving directory '/home/acme/git/linux/tools/perf'
$
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 01/15] perf report: Remove code to handle inline frames from browsers
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
@ 2017-10-25 15:59 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 02/15] perf callchain: Store srcline in callchain_cursor_node Arnaldo Carvalho de Melo
` (15 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 15:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
The follow-up commits will make inline frames first-class citizens in
the callchain, thereby obsoleting all of this special code.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/ui/browsers/hists.c | 180 +++-------------------------------------
tools/perf/ui/stdio/hist.c | 77 +----------------
tools/perf/util/evsel_fprintf.c | 32 -------
tools/perf/util/hist.c | 5 --
tools/perf/util/sort.h | 1 -
5 files changed, 13 insertions(+), 282 deletions(-)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 13dfb0a0bdeb..3a433f370e7f 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -154,57 +154,9 @@ static void callchain_list__set_folding(struct callchain_list *cl, bool unfold)
cl->unfolded = unfold ? cl->has_children : false;
}
-static struct inline_node *inline_node__create(struct map *map, u64 ip)
-{
- struct dso *dso;
- struct inline_node *node;
-
- if (map == NULL)
- return NULL;
-
- dso = map->dso;
- if (dso == NULL)
- return NULL;
-
- node = dso__parse_addr_inlines(dso,
- map__rip_2objdump(map, ip));
-
- return node;
-}
-
-static int inline__count_rows(struct inline_node *node)
-{
- struct inline_list *ilist;
- int i = 0;
-
- if (node == NULL)
- return 0;
-
- list_for_each_entry(ilist, &node->val, list) {
- if ((ilist->filename != NULL) || (ilist->funcname != NULL))
- i++;
- }
-
- return i;
-}
-
-static int callchain_list__inline_rows(struct callchain_list *chain)
-{
- struct inline_node *node;
- int rows;
-
- node = inline_node__create(chain->ms.map, chain->ip);
- if (node == NULL)
- return 0;
-
- rows = inline__count_rows(node);
- inline_node__delete(node);
- return rows;
-}
-
static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
{
- int n = 0, inline_rows;
+ int n = 0;
struct rb_node *nd;
for (nd = rb_first(&node->rb_root); nd; nd = rb_next(nd)) {
@@ -215,12 +167,6 @@ static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
list_for_each_entry(chain, &child->val, list) {
++n;
- if (symbol_conf.inline_name) {
- inline_rows =
- callchain_list__inline_rows(chain);
- n += inline_rows;
- }
-
/* We need this because we may not have children */
folded_sign = callchain_list__folded(chain);
if (folded_sign == '+')
@@ -272,7 +218,7 @@ static int callchain_node__count_rows(struct callchain_node *node)
{
struct callchain_list *chain;
bool unfolded = false;
- int n = 0, inline_rows;
+ int n = 0;
if (callchain_param.mode == CHAIN_FLAT)
return callchain_node__count_flat_rows(node);
@@ -281,10 +227,6 @@ static int callchain_node__count_rows(struct callchain_node *node)
list_for_each_entry(chain, &node->val, list) {
++n;
- if (symbol_conf.inline_name) {
- inline_rows = callchain_list__inline_rows(chain);
- n += inline_rows;
- }
unfolded = chain->unfolded;
}
@@ -432,19 +374,6 @@ static void hist_entry__init_have_children(struct hist_entry *he)
he->init_have_children = true;
}
-static void hist_entry_init_inline_node(struct hist_entry *he)
-{
- if (he->inline_node)
- return;
-
- he->inline_node = inline_node__create(he->ms.map, he->ip);
-
- if (he->inline_node == NULL)
- return;
-
- he->has_children = true;
-}
-
static bool hist_browser__toggle_fold(struct hist_browser *browser)
{
struct hist_entry *he = browser->he_selection;
@@ -476,12 +405,8 @@ static bool hist_browser__toggle_fold(struct hist_browser *browser)
if (he->unfolded) {
if (he->leaf)
- if (he->inline_node)
- he->nr_rows = inline__count_rows(
- he->inline_node);
- else
- he->nr_rows = callchain__count_rows(
- &he->sorted_chain);
+ he->nr_rows = callchain__count_rows(
+ &he->sorted_chain);
else
he->nr_rows = hierarchy_count_rows(browser, he, false);
@@ -841,71 +766,6 @@ static bool hist_browser__check_dump_full(struct hist_browser *browser __maybe_u
#define LEVEL_OFFSET_STEP 3
-static int hist_browser__show_inline(struct hist_browser *browser,
- struct inline_node *node,
- unsigned short row,
- int offset)
-{
- struct inline_list *ilist;
- char buf[1024];
- int color, width, first_row;
-
- first_row = row;
- width = browser->b.width - (LEVEL_OFFSET_STEP + 2);
- list_for_each_entry(ilist, &node->val, list) {
- if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
- color = HE_COLORSET_NORMAL;
- if (ui_browser__is_current_entry(&browser->b, row))
- color = HE_COLORSET_SELECTED;
-
- if (callchain_param.key == CCKEY_ADDRESS ||
- callchain_param.key == CCKEY_SRCLINE) {
- if (ilist->filename != NULL)
- scnprintf(buf, sizeof(buf),
- "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- scnprintf(buf, sizeof(buf), "??");
- } else if (ilist->funcname != NULL)
- scnprintf(buf, sizeof(buf), "%s (inline)",
- ilist->funcname);
- else if (ilist->filename != NULL)
- scnprintf(buf, sizeof(buf),
- "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- scnprintf(buf, sizeof(buf), "??");
-
- ui_browser__set_color(&browser->b, color);
- hist_browser__gotorc(browser, row, 0);
- ui_browser__write_nstring(&browser->b, " ",
- LEVEL_OFFSET_STEP + offset);
- ui_browser__write_nstring(&browser->b, buf, width);
- row++;
- }
- }
-
- return row - first_row;
-}
-
-static size_t show_inline_list(struct hist_browser *browser, struct map *map,
- u64 ip, int row, int offset)
-{
- struct inline_node *node;
- int ret;
-
- node = inline_node__create(map, ip);
- if (node == NULL)
- return 0;
-
- ret = hist_browser__show_inline(browser, node, row, offset);
-
- inline_node__delete(node);
- return ret;
-}
-
static int hist_browser__show_callchain_list(struct hist_browser *browser,
struct callchain_node *node,
struct callchain_list *chain,
@@ -917,7 +777,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
char bf[1024], *alloc_str;
char buf[64], *alloc_str2;
const char *str;
- int inline_rows = 0, ret = 1;
+ int ret = 1;
if (arg->row_offset != 0) {
arg->row_offset--;
@@ -954,12 +814,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
free(alloc_str);
free(alloc_str2);
- if (symbol_conf.inline_name) {
- inline_rows = show_inline_list(browser, chain->ms.map,
- chain->ip, row + 1, offset);
- }
-
- return ret + inline_rows;
+ return ret;
}
static bool check_percent_display(struct rb_node *node, u64 parent_total)
@@ -1383,12 +1238,6 @@ static int hist_browser__show_entry(struct hist_browser *browser,
folded_sign = hist_entry__folded(entry);
}
- if (symbol_conf.inline_name &&
- (!entry->has_children)) {
- hist_entry_init_inline_node(entry);
- folded_sign = hist_entry__folded(entry);
- }
-
if (row_offset == 0) {
struct hpp_arg arg = {
.b = &browser->b,
@@ -1420,8 +1269,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
}
if (first) {
- if (symbol_conf.use_callchain ||
- symbol_conf.inline_name) {
+ if (symbol_conf.use_callchain) {
ui_browser__printf(&browser->b, "%c ", folded_sign);
width -= 2;
}
@@ -1463,15 +1311,11 @@ static int hist_browser__show_entry(struct hist_browser *browser,
.is_current_entry = current_entry,
};
- if (entry->inline_node)
- printed += hist_browser__show_inline(browser,
- entry->inline_node, row, 0);
- else
- printed += hist_browser__show_callchain(browser,
- entry, 1, row,
- hist_browser__show_callchain_entry,
- &arg,
- hist_browser__check_output_full);
+ printed += hist_browser__show_callchain(browser,
+ entry, 1, row,
+ hist_browser__show_callchain_entry,
+ &arg,
+ hist_browser__check_output_full);
}
return printed;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 8bdb7a500181..b6b9baac0e3b 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -21,64 +21,6 @@ static size_t callchain__fprintf_left_margin(FILE *fp, int left_margin)
return ret;
}
-static size_t inline__fprintf(struct map *map, u64 ip, int left_margin,
- int depth, int depth_mask, FILE *fp)
-{
- struct dso *dso;
- struct inline_node *node;
- struct inline_list *ilist;
- int ret = 0, i;
-
- if (map == NULL)
- return 0;
-
- dso = map->dso;
- if (dso == NULL)
- return 0;
-
- node = dso__parse_addr_inlines(dso,
- map__rip_2objdump(map, ip));
- if (node == NULL)
- return 0;
-
- list_for_each_entry(ilist, &node->val, list) {
- if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
- ret += callchain__fprintf_left_margin(fp, left_margin);
-
- for (i = 0; i < depth; i++) {
- if (depth_mask & (1 << i))
- ret += fprintf(fp, "|");
- else
- ret += fprintf(fp, " ");
- ret += fprintf(fp, " ");
- }
-
- if (callchain_param.key == CCKEY_ADDRESS ||
- callchain_param.key == CCKEY_SRCLINE) {
- if (ilist->filename != NULL)
- ret += fprintf(fp, "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- ret += fprintf(fp, "??");
- } else if (ilist->funcname != NULL)
- ret += fprintf(fp, "%s (inline)",
- ilist->funcname);
- else if (ilist->filename != NULL)
- ret += fprintf(fp, "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- ret += fprintf(fp, "??");
-
- ret += fprintf(fp, "\n");
- }
- }
-
- inline_node__delete(node);
- return ret;
-}
-
static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
int left_margin)
{
@@ -137,9 +79,6 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
fputc('\n', fp);
free(alloc_str);
- if (symbol_conf.inline_name)
- ret += inline__fprintf(chain->ms.map, chain->ip,
- left_margin, depth, depth_mask, fp);
return ret;
}
@@ -314,13 +253,6 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
if (++entries_printed == callchain_param.print_limit)
break;
-
- if (symbol_conf.inline_name)
- ret += inline__fprintf(chain->ms.map,
- chain->ip,
- left_margin,
- 0, 0,
- fp);
}
root = &cnode->rb_root;
}
@@ -600,7 +532,6 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
{
int ret;
int callchain_ret = 0;
- int inline_ret = 0;
struct perf_hpp hpp = {
.buf = bf,
.size = size,
@@ -622,13 +553,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
callchain_ret = hist_entry_callchain__fprintf(he, total_period,
0, fp);
- if (callchain_ret == 0 && symbol_conf.inline_name) {
- inline_ret = inline__fprintf(he->ms.map, he->ip, 0, 0, 0, fp);
- ret += inline_ret;
- if (inline_ret > 0)
- ret += fprintf(fp, "\n");
- } else
- ret += callchain_ret;
+ ret += callchain_ret;
return ret;
}
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 583f3a602506..f2c6c5ee11e8 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -169,38 +169,6 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
if (!print_oneline)
printed += fprintf(fp, "\n");
- if (symbol_conf.inline_name && node->map) {
- struct inline_node *inode;
-
- addr = map__rip_2objdump(node->map, node->ip),
- inode = dso__parse_addr_inlines(node->map->dso, addr);
-
- if (inode) {
- struct inline_list *ilist;
-
- list_for_each_entry(ilist, &inode->val, list) {
- if (print_arrow)
- printed += fprintf(fp, " <-");
-
- /* IP is same, just skip it */
- if (print_ip)
- printed += fprintf(fp, "%c%16s",
- s, "");
- if (print_sym)
- printed += fprintf(fp, " %s",
- ilist->funcname);
- if (print_srcline)
- printed += fprintf(fp, "\n %s:%d",
- ilist->filename,
- ilist->line_nr);
- if (!print_oneline)
- printed += fprintf(fp, "\n");
- }
-
- inline_node__delete(inode);
- }
- }
-
if (symbol_conf.bt_stop_list &&
node->sym &&
strlist__has_entry(symbol_conf.bt_stop_list,
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e60d8d8ea4c2..b0fa9c217e1c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1141,11 +1141,6 @@ void hist_entry__delete(struct hist_entry *he)
zfree(&he->mem_info);
}
- if (he->inline_node) {
- inline_node__delete(he->inline_node);
- he->inline_node = NULL;
- }
-
zfree(&he->stat_acc);
free_srcline(he->srcline);
if (he->srcfile && he->srcfile[0])
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index f36dc4980a6c..507d096aee4e 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -129,7 +129,6 @@ struct hist_entry {
};
char *srcline;
char *srcfile;
- struct inline_node *inline_node;
struct symbol *parent;
struct branch_info *branch_info;
struct hists *hists;
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 02/15] perf callchain: Store srcline in callchain_cursor_node
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
2017-10-25 15:59 ` [PATCH 01/15] perf report: Remove code to handle inline frames from browsers Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 03/15] perf callchain: Refactor inline_list to operate on symbols Arnaldo Carvalho de Melo
` (14 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
This is mostly a preparation to enable the creation of full callchain
nodes for inline frames. Such frames will reference the IP of the
non-inlined frame, but hold the symbol and srcline for an inlined
location. As such, we won't be able to query the srcline on-demand based
on the IP alone. Instead, we will leverage the functionality provided by
this patch here, and store the srcline for the inlined nodes in the new
srcline member of callchain_cursor_node.
Note that this patch on its own leaks the srcline, as there is no
free_callchain_cursor_node or similar. A future patch will add caching
of the srcline and handle deletion properly.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 31 +++++++++----------------------
tools/perf/util/callchain.h | 6 ++++--
tools/perf/util/machine.c | 18 ++++++++++++++++--
3 files changed, 29 insertions(+), 26 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index a971caf3759d..e7ee794d1e5b 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -566,6 +566,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call->ip = cursor_node->ip;
call->ms.sym = cursor_node->sym;
call->ms.map = map__get(cursor_node->map);
+ call->srcline = cursor_node->srcline;
if (cursor_node->branch) {
call->branch_count = 1;
@@ -647,20 +648,11 @@ enum match_result {
static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
struct callchain_list *cnode)
{
- char *left = NULL;
- char *right = NULL;
+ const char *left = cnode->srcline;
+ const char *right = node->srcline;
enum match_result ret = MATCH_EQ;
int cmp;
- if (cnode->ms.map)
- left = get_srcline(cnode->ms.map->dso,
- map__rip_2objdump(cnode->ms.map, cnode->ip),
- cnode->ms.sym, true, false);
- if (node->map)
- right = get_srcline(node->map->dso,
- map__rip_2objdump(node->map, node->ip),
- node->sym, true, false);
-
if (left && right)
cmp = strcmp(left, right);
else if (!left && right)
@@ -675,8 +667,6 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
if (cmp != 0)
ret = cmp < 0 ? MATCH_LT : MATCH_GT;
- free_srcline(left);
- free_srcline(right);
return ret;
}
@@ -969,7 +959,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
list_for_each_entry_safe(list, next_list, &src->val, list) {
callchain_cursor_append(cursor, list->ip,
list->ms.map, list->ms.sym,
- false, NULL, 0, 0, 0);
+ false, NULL, 0, 0, 0, list->srcline);
list_del(&list->list);
map__zput(list->ms.map);
free(list);
@@ -1009,7 +999,8 @@ int callchain_merge(struct callchain_cursor *cursor,
int callchain_cursor_append(struct callchain_cursor *cursor,
u64 ip, struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, u64 iter_cycles, u64 branch_from)
+ int nr_loop_iter, u64 iter_cycles, u64 branch_from,
+ const char *srcline)
{
struct callchain_cursor_node *node = *cursor->last;
@@ -1028,6 +1019,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
node->branch = branch;
node->nr_loop_iter = nr_loop_iter;
node->iter_cycles = iter_cycles;
+ node->srcline = srcline;
if (flags)
memcpy(&node->branch_flags, flags,
@@ -1115,12 +1107,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
int printed;
if (cl->ms.sym) {
- if (show_srcline && cl->ms.map && !cl->srcline)
- cl->srcline = get_srcline(cl->ms.map->dso,
- map__rip_2objdump(cl->ms.map,
- cl->ip),
- cl->ms.sym, false, show_addr);
- if (cl->srcline)
+ if (show_srcline && cl->srcline)
printed = scnprintf(bf, bfsize, "%s %s",
cl->ms.sym->name, cl->srcline);
else
@@ -1532,7 +1519,7 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
node->branch, &node->branch_flags,
node->nr_loop_iter,
node->iter_cycles,
- node->branch_from);
+ node->branch_from, node->srcline);
if (rc)
break;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 1ed6fc61d0a5..8f67b681cde9 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -121,7 +121,7 @@ struct callchain_list {
u64 iter_count;
u64 iter_cycles;
struct branch_type_stat brtype_stat;
- char *srcline;
+ const char *srcline;
struct list_head list;
};
@@ -135,6 +135,7 @@ struct callchain_cursor_node {
u64 ip;
struct map *map;
struct symbol *sym;
+ const char *srcline;
bool branch;
struct branch_flags branch_flags;
u64 branch_from;
@@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, u64 iter_cycles, u64 branch_from);
+ int nr_loop_iter, u64 iter_cycles, u64 branch_from,
+ const char *srcline);
/* Close a cursor writing session. Initialize for the reader */
static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7c3aa479201a..a37e1c056415 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1709,6 +1709,15 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
return mi;
}
+static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
+{
+ if (!map || callchain_param.key == CCKEY_FUNCTION)
+ return NULL;
+
+ return get_srcline(map->dso, map__rip_2objdump(map, ip),
+ sym, false, callchain_param.key == CCKEY_ADDRESS);
+}
+
struct iterations {
int nr_loop_iter;
u64 cycles;
@@ -1728,6 +1737,7 @@ static int add_callchain_ip(struct thread *thread,
struct addr_location al;
int nr_loop_iter = 0;
u64 iter_cycles = 0;
+ const char *srcline = NULL;
al.filtered = 0;
al.sym = NULL;
@@ -1783,9 +1793,10 @@ static int add_callchain_ip(struct thread *thread,
iter_cycles = iter->cycles;
}
+ srcline = callchain_srcline(al.map, al.sym, al.addr);
return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
branch, flags, nr_loop_iter,
- iter_cycles, branch_from);
+ iter_cycles, branch_from, srcline);
}
struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -2101,12 +2112,15 @@ static int thread__resolve_callchain_sample(struct thread *thread,
static int unwind_entry(struct unwind_entry *entry, void *arg)
{
struct callchain_cursor *cursor = arg;
+ const char *srcline = NULL;
if (symbol_conf.hide_unresolved && entry->sym == NULL)
return 0;
+
+ srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
- false, NULL, 0, 0, 0);
+ false, NULL, 0, 0, 0, srcline);
}
static int thread__resolve_callchain_unwind(struct thread *thread,
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 03/15] perf callchain: Refactor inline_list to operate on symbols
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
2017-10-25 15:59 ` [PATCH 01/15] perf report: Remove code to handle inline frames from browsers Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 02/15] perf callchain: Store srcline in callchain_cursor_node Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 04/15] perf callchain: Refactor inline_list to store srcline string directly Arnaldo Carvalho de Melo
` (13 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
This is a requirement to create real callchain entries for inlined
frames.
Since the list of inlines usually contains the target symbol too, i.e.
the location where the frames get inlined to, we alias that symbol and
reuse it as-is is. This ensures that other dependent functionality keeps
working, most notably annotation of the target frames.
For all other entries in the inline_list, a fake symbol is created.
These are marked by new 'inlined' member which is set to true. Only
those symbols are managed by the inline_list and get freed when the
inline_list is deleted from within inline_node__delete.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/srcline.c | 93 ++++++++++++++++++++++++++++++++---------------
tools/perf/util/srcline.h | 7 +++-
tools/perf/util/symbol.h | 1 +
3 files changed, 69 insertions(+), 32 deletions(-)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ed8e8d2de942..c0af61b355ed 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,29 +33,19 @@ static const char *dso__name(struct dso *dso)
return dso_name;
}
-static int inline_list__append(char *filename, char *funcname, int line_nr,
- struct inline_node *node, struct dso *dso)
+static int inline_list__append(struct symbol *symbol, char *filename,
+ int line_nr, struct inline_node *node)
{
struct inline_list *ilist;
- char *demangled;
ilist = zalloc(sizeof(*ilist));
if (ilist == NULL)
return -1;
+ ilist->symbol = symbol;
ilist->filename = filename;
ilist->line_nr = line_nr;
- if (dso != NULL) {
- demangled = dso__demangle_sym(dso, 0, funcname);
- if (demangled == NULL) {
- ilist->funcname = funcname;
- } else {
- ilist->funcname = demangled;
- free(funcname);
- }
- }
-
if (callchain_param.order == ORDER_CALLEE)
list_add_tail(&ilist->list, &node->val);
else
@@ -206,19 +196,56 @@ static void addr2line_cleanup(struct a2l_data *a2l)
#define MAX_INLINE_NEST 1024
+static struct symbol *new_inline_sym(struct dso *dso,
+ struct symbol *base_sym,
+ const char *funcname)
+{
+ struct symbol *inline_sym;
+ char *demangled = NULL;
+
+ if (dso) {
+ demangled = dso__demangle_sym(dso, 0, funcname);
+ if (demangled)
+ funcname = demangled;
+ }
+
+ if (base_sym && strcmp(funcname, base_sym->name) == 0) {
+ /* reuse the real, existing symbol */
+ inline_sym = base_sym;
+ /* ensure that we don't alias an inlined symbol, which could
+ * lead to double frees in inline_node__delete
+ */
+ assert(!base_sym->inlined);
+ } else {
+ /* create a fake symbol for the inline frame */
+ inline_sym = symbol__new(base_sym ? base_sym->start : 0,
+ base_sym ? base_sym->end : 0,
+ base_sym ? base_sym->binding : 0,
+ funcname);
+ if (inline_sym)
+ inline_sym->inlined = 1;
+ }
+
+ free(demangled);
+
+ return inline_sym;
+}
+
static int inline_list__append_dso_a2l(struct dso *dso,
- struct inline_node *node)
+ struct inline_node *node,
+ struct symbol *sym)
{
struct a2l_data *a2l = dso->a2l;
- char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
- char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
+ struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
- return inline_list__append(filename, funcname, a2l->line, node, dso);
+ return inline_list__append(inline_sym, strdup(a2l->filename),
+ a2l->line, node);
}
static int addr2line(const char *dso_name, u64 addr,
char **file, unsigned int *line, struct dso *dso,
- bool unwind_inlines, struct inline_node *node)
+ bool unwind_inlines, struct inline_node *node,
+ struct symbol *sym)
{
int ret = 0;
struct a2l_data *a2l = dso->a2l;
@@ -244,7 +271,7 @@ static int addr2line(const char *dso_name, u64 addr,
if (unwind_inlines) {
int cnt = 0;
- if (node && inline_list__append_dso_a2l(dso, node))
+ if (node && inline_list__append_dso_a2l(dso, node, sym))
return 0;
while (bfd_find_inliner_info(a2l->abfd, &a2l->filename,
@@ -255,7 +282,7 @@ static int addr2line(const char *dso_name, u64 addr,
a2l->filename = NULL;
if (node != NULL) {
- if (inline_list__append_dso_a2l(dso, node))
+ if (inline_list__append_dso_a2l(dso, node, sym))
return 0;
// found at least one inline frame
ret = 1;
@@ -287,7 +314,7 @@ void dso__free_a2l(struct dso *dso)
}
static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
- struct dso *dso)
+ struct dso *dso, struct symbol *sym)
{
struct inline_node *node;
@@ -300,7 +327,7 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
INIT_LIST_HEAD(&node->val);
node->addr = addr;
- if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node))
+ if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
goto out_free_inline_node;
if (list_empty(&node->val))
@@ -340,7 +367,8 @@ static int addr2line(const char *dso_name, u64 addr,
char **file, unsigned int *line_nr,
struct dso *dso __maybe_unused,
bool unwind_inlines __maybe_unused,
- struct inline_node *node __maybe_unused)
+ struct inline_node *node __maybe_unused,
+ struct symbol *sym __maybe_unused)
{
FILE *fp;
char cmd[PATH_MAX];
@@ -380,7 +408,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
}
static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
- struct dso *dso __maybe_unused)
+ struct dso *dso __maybe_unused,
+ struct symbol *sym)
{
FILE *fp;
char cmd[PATH_MAX];
@@ -408,13 +437,13 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
node->addr = addr;
while (getline(&filename, &len, fp) != -1) {
+
if (filename_split(filename, &line_nr) != 1) {
free(filename);
goto out;
}
- if (inline_list__append(filename, NULL, line_nr, node,
- NULL) != 0)
+ if (inline_list__append(sym, filename, line_nr, node) != 0)
goto out;
filename = NULL;
@@ -454,7 +483,8 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
if (dso_name == NULL)
goto out;
- if (!addr2line(dso_name, addr, &file, &line, dso, unwind_inlines, NULL))
+ if (!addr2line(dso_name, addr, &file, &line, dso,
+ unwind_inlines, NULL, sym))
goto out;
if (asprintf(&srcline, "%s:%u",
@@ -500,7 +530,8 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
}
-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+ struct symbol *sym)
{
const char *dso_name;
@@ -508,7 +539,7 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
if (dso_name == NULL)
return NULL;
- return addr2inlines(dso_name, addr, dso);
+ return addr2inlines(dso_name, addr, dso, sym);
}
void inline_node__delete(struct inline_node *node)
@@ -518,7 +549,9 @@ void inline_node__delete(struct inline_node *node)
list_for_each_entry_safe(ilist, tmp, &node->val, list) {
list_del_init(&ilist->list);
zfree(&ilist->filename);
- zfree(&ilist->funcname);
+ /* only the inlined symbols are owned by the list */
+ if (ilist->symbol && ilist->symbol->inlined)
+ symbol__delete(ilist->symbol);
free(ilist);
}
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 7b52ba88676e..730bfd6926d7 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -17,8 +17,8 @@ void free_srcline(char *srcline);
#define SRCLINE_UNKNOWN ((char *) "??:0")
struct inline_list {
+ struct symbol *symbol;
char *filename;
- char *funcname;
unsigned int line_nr;
struct list_head list;
};
@@ -28,7 +28,10 @@ struct inline_node {
struct list_head val;
};
-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr);
+/* parse inlined frames for the given address */
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+ struct symbol *sym);
+/* free resources associated to the inline node list */
void inline_node__delete(struct inline_node *node);
#endif /* PERF_SRCLINE_H */
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index aad99e7e179b..d880a059babb 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -59,6 +59,7 @@ struct symbol {
u8 binding;
u8 idle:1;
u8 ignore:1;
+ u8 inlined:1;
u8 arch_sym;
char name[0];
};
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 04/15] perf callchain: Refactor inline_list to store srcline string directly
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (2 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 03/15] perf callchain: Refactor inline_list to operate on symbols Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 05/15] perf callchain: Create real callchain entries for inlined frames Arnaldo Carvalho de Melo
` (12 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
This is a preparation for the creation of real callchain entries for
inlined frames. The rest of the perf code uses the srcline string. As
such, using that also for the srcline API allows us to simplify some of
the upcoming code. Most notably, it will allow us to cache the srcline
for a given inline node and reuse it for different callchain entries.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/srcline.c | 54 +++++++++++++++++++++++++++++++++++------------
tools/perf/util/srcline.h | 3 +--
2 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index c0af61b355ed..f202fc7827df 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,8 +33,8 @@ static const char *dso__name(struct dso *dso)
return dso_name;
}
-static int inline_list__append(struct symbol *symbol, char *filename,
- int line_nr, struct inline_node *node)
+static int inline_list__append(struct symbol *symbol, char *srcline,
+ struct inline_node *node)
{
struct inline_list *ilist;
@@ -43,8 +43,7 @@ static int inline_list__append(struct symbol *symbol, char *filename,
return -1;
ilist->symbol = symbol;
- ilist->filename = filename;
- ilist->line_nr = line_nr;
+ ilist->srcline = srcline;
if (callchain_param.order == ORDER_CALLEE)
list_add_tail(&ilist->list, &node->val);
@@ -54,6 +53,30 @@ static int inline_list__append(struct symbol *symbol, char *filename,
return 0;
}
+/* basename version that takes a const input string */
+static const char *gnu_basename(const char *path)
+{
+ const char *base = strrchr(path, '/');
+
+ return base ? base + 1 : path;
+}
+
+static char *srcline_from_fileline(const char *file, unsigned int line)
+{
+ char *srcline;
+
+ if (!file)
+ return NULL;
+
+ if (!srcline_full_filename)
+ file = gnu_basename(file);
+
+ if (asprintf(&srcline, "%s:%u", file, line) < 0)
+ return NULL;
+
+ return srcline;
+}
+
#ifdef HAVE_LIBBFD_SUPPORT
/*
@@ -237,9 +260,12 @@ static int inline_list__append_dso_a2l(struct dso *dso,
{
struct a2l_data *a2l = dso->a2l;
struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
+ char *srcline = NULL;
- return inline_list__append(inline_sym, strdup(a2l->filename),
- a2l->line, node);
+ if (a2l->filename)
+ srcline = srcline_from_fileline(a2l->filename, a2l->line);
+
+ return inline_list__append(inline_sym, srcline, node);
}
static int addr2line(const char *dso_name, u64 addr,
@@ -437,13 +463,15 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
node->addr = addr;
while (getline(&filename, &len, fp) != -1) {
+ char *srcline;
if (filename_split(filename, &line_nr) != 1) {
free(filename);
goto out;
}
- if (inline_list__append(sym, filename, line_nr, node) != 0)
+ srcline = srcline_from_fileline(filename, line_nr);
+ if (inline_list__append(sym, srcline, node) != 0)
goto out;
filename = NULL;
@@ -487,16 +515,14 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
unwind_inlines, NULL, sym))
goto out;
- if (asprintf(&srcline, "%s:%u",
- srcline_full_filename ? file : basename(file),
- line) < 0) {
- free(file);
+ srcline = srcline_from_fileline(file, line);
+ free(file);
+
+ if (!srcline)
goto out;
- }
dso->a2l_fails = 0;
- free(file);
return srcline;
out:
@@ -548,7 +574,7 @@ void inline_node__delete(struct inline_node *node)
list_for_each_entry_safe(ilist, tmp, &node->val, list) {
list_del_init(&ilist->list);
- zfree(&ilist->filename);
+ free_srcline(ilist->srcline);
/* only the inlined symbols are owned by the list */
if (ilist->symbol && ilist->symbol->inlined)
symbol__delete(ilist->symbol);
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 730bfd6926d7..0201ed2c0b9c 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -18,8 +18,7 @@ void free_srcline(char *srcline);
struct inline_list {
struct symbol *symbol;
- char *filename;
- unsigned int line_nr;
+ char *srcline;
struct list_head list;
};
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 05/15] perf callchain: Create real callchain entries for inlined frames
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (3 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 04/15] perf callchain: Refactor inline_list to store srcline string directly Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 06/15] perf report: Fall-back to function name comparison for -g srcline Arnaldo Carvalho de Melo
` (11 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
The inline_node structs are maintained by the new dso->inlines tree.
This in turn keeps ownership of the fake symbols and srcline string
representing an inline frame.
This tree is sorted by address to allow quick lookups. All other entries
of the symbol beside the function name are unused for inline frames. The
advantage of this approach is that all existing users of the callchain
API can now transparently display inlined frames without having to patch
their code.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/dso.c | 5 +++++
tools/perf/util/dso.h | 1 +
tools/perf/util/machine.c | 37 ++++++++++++++++++++++++++++++++++
tools/perf/util/srcline.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/srcline.h | 9 +++++++++
5 files changed, 103 insertions(+)
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 339e52971380..75c8250b3b8a 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -10,6 +10,7 @@
#include "compress.h"
#include "path.h"
#include "symbol.h"
+#include "srcline.h"
#include "dso.h"
#include "machine.h"
#include "auxtrace.h"
@@ -1201,6 +1202,7 @@ struct dso *dso__new(const char *name)
for (i = 0; i < MAP__NR_TYPES; ++i)
dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
dso->data.cache = RB_ROOT;
+ dso->inlined_nodes = RB_ROOT;
dso->data.fd = -1;
dso->data.status = DSO_DATA_STATUS_UNKNOWN;
dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1232,6 +1234,9 @@ void dso__delete(struct dso *dso)
if (!RB_EMPTY_NODE(&dso->rb_node))
pr_err("DSO %s is still in rbtree when being deleted!\n",
dso->long_name);
+
+ /* free inlines first, as they reference symbols */
+ inlines__tree_delete(&dso->inlined_nodes);
for (i = 0; i < MAP__NR_TYPES; ++i)
symbols__delete(&dso->symbols[i]);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index a2bbb21f301c..122eca0d242d 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -141,6 +141,7 @@ struct dso {
struct rb_root *root; /* root of rbtree that rb_node is in */
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
+ struct rb_root inlined_nodes;
struct {
u64 addr;
struct symbol *symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a37e1c056415..3d049cb313ac 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2109,6 +2109,40 @@ static int thread__resolve_callchain_sample(struct thread *thread,
return 0;
}
+static int append_inlines(struct callchain_cursor *cursor,
+ struct map *map, struct symbol *sym, u64 ip)
+{
+ struct inline_node *inline_node;
+ struct inline_list *ilist;
+ u64 addr;
+
+ if (!symbol_conf.inline_name || !map || !sym)
+ return 1;
+
+ addr = map__rip_2objdump(map, ip);
+
+ inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
+ if (!inline_node) {
+ inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
+ if (!inline_node)
+ return 1;
+
+ inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
+ }
+
+ list_for_each_entry(ilist, &inline_node->val, list) {
+ int ret = callchain_cursor_append(cursor, ip, map,
+ ilist->symbol, false,
+ NULL, 0, 0, 0,
+ ilist->srcline);
+
+ if (ret != 0)
+ return ret;
+ }
+
+ return 0;
+}
+
static int unwind_entry(struct unwind_entry *entry, void *arg)
{
struct callchain_cursor *cursor = arg;
@@ -2117,6 +2151,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
if (symbol_conf.hide_unresolved && entry->sym == NULL)
return 0;
+ if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+ return 0;
+
srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index f202fc7827df..8bea6621d657 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -583,3 +583,54 @@ void inline_node__delete(struct inline_node *node)
free(node);
}
+
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
+{
+ struct rb_node **p = &tree->rb_node;
+ struct rb_node *parent = NULL;
+ const u64 addr = inlines->addr;
+ struct inline_node *i;
+
+ while (*p != NULL) {
+ parent = *p;
+ i = rb_entry(parent, struct inline_node, rb_node);
+ if (addr < i->addr)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ rb_link_node(&inlines->rb_node, parent, p);
+ rb_insert_color(&inlines->rb_node, tree);
+}
+
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
+{
+ struct rb_node *n = tree->rb_node;
+
+ while (n) {
+ struct inline_node *i = rb_entry(n, struct inline_node,
+ rb_node);
+
+ if (addr < i->addr)
+ n = n->rb_left;
+ else if (addr > i->addr)
+ n = n->rb_right;
+ else
+ return i;
+ }
+
+ return NULL;
+}
+
+void inlines__tree_delete(struct rb_root *tree)
+{
+ struct inline_node *pos;
+ struct rb_node *next = rb_first(tree);
+
+ while (next) {
+ pos = rb_entry(next, struct inline_node, rb_node);
+ next = rb_next(&pos->rb_node);
+ rb_erase(&pos->rb_node, tree);
+ inline_node__delete(pos);
+ }
+}
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 0201ed2c0b9c..ebe38cd22294 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -2,6 +2,7 @@
#define PERF_SRCLINE_H
#include <linux/list.h>
+#include <linux/rbtree.h>
#include <linux/types.h>
struct dso;
@@ -25,6 +26,7 @@ struct inline_list {
struct inline_node {
u64 addr;
struct list_head val;
+ struct rb_node rb_node;
};
/* parse inlined frames for the given address */
@@ -33,4 +35,11 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
/* free resources associated to the inline node list */
void inline_node__delete(struct inline_node *node);
+/* insert the inline node list into the DSO, which will take ownership */
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
+/* find previously inserted inline node list */
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
+/* delete all nodes within the tree of inline_node s */
+void inlines__tree_delete(struct rb_root *tree);
+
#endif /* PERF_SRCLINE_H */
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 06/15] perf report: Fall-back to function name comparison for -g srcline
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (4 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 05/15] perf callchain: Create real callchain entries for inlined frames Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 07/15] perf callchain: Mark inlined frames in output by " (inlined)" suffix Arnaldo Carvalho de Melo
` (10 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
When a callchain entry has no srcline available, we ended up comparing
the instruction pointer. I consider this to be not too useful. Rather, I
think we should group the entries by function name, which this patch
adds. For people who want to split the data on the IP boundary, using
`-g address` is the correct choice.
Before:
~~~~~
100.00% 38.86% [.] main
|
|--61.14%--main inlining.cpp:14
| std::norm<double> complex:664
| std::_Norm_helper<true>::_S_do_it<double> complex:654
| std::abs<double> complex:597
| std::__complex_abs complex:589
| |
| |--56.03%--hypot
| | |
| | |--8.45%--__hypot_finite
| | |
| | |--7.62%--__hypot_finite
| | |
| | |--2.29%--__hypot_finite
| | |
| | |--2.24%--__hypot_finite
| | |
| | |--2.06%--__hypot_finite
| | |
| | |--1.81%--__hypot_finite
...
~~~~~
After:
~~~~~
100.00% 38.86% [.] main
|
|--61.14%--main inlining.cpp:14
| std::norm<double> complex:664
| std::_Norm_helper<true>::_S_do_it<double> complex:654
| std::abs<double> complex:597
| std::__complex_abs complex:589
| |
| |--60.29%--hypot
| | |
| | --56.03%--__hypot_finite
| |
| --0.85%--cabs
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e7ee794d1e5b..0f2ba493a7a3 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -645,11 +645,9 @@ enum match_result {
MATCH_GT,
};
-static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
- struct callchain_list *cnode)
+static enum match_result match_chain_strings(const char *left,
+ const char *right)
{
- const char *left = cnode->srcline;
- const char *right = node->srcline;
enum match_result ret = MATCH_EQ;
int cmp;
@@ -659,10 +657,8 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
cmp = 1;
else if (left && !right)
cmp = -1;
- else if (cnode->ip == node->ip)
- cmp = 0;
else
- cmp = (cnode->ip < node->ip) ? -1 : 1;
+ return MATCH_ERROR;
if (cmp != 0)
ret = cmp < 0 ? MATCH_LT : MATCH_GT;
@@ -679,10 +675,18 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
struct dso *right_dso = NULL;
if (callchain_param.key == CCKEY_SRCLINE) {
- enum match_result match = match_chain_srcline(node, cnode);
+ enum match_result match = match_chain_strings(cnode->srcline,
+ node->srcline);
+
+ /* if no srcline is available, fallback to symbol name */
+ if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
+ match = match_chain_strings(cnode->ms.sym->name,
+ node->sym->name);
if (match != MATCH_ERROR)
return match;
+
+ /* otherwise fall-back to IP-based comparison below */
}
if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 07/15] perf callchain: Mark inlined frames in output by " (inlined)" suffix
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (5 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 06/15] perf report: Fall-back to function name comparison for -g srcline Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 08/15] perf script: Mark inlined frames and do not print DSO for them Arnaldo Carvalho de Melo
` (9 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
The original patch that introduced inline frame output in the various
browsers used this suffix already. The new centralized approach that
uses fake symbols for inlined frames was missing this approach so far.
Instead of changing the symbol name itself, we only print the suffix
where needed. This allows us to efficiently lookup the symbol for a
given name without first having to append the suffix before the lookup.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 10 +++++++---
tools/perf/util/sort.c | 3 +++
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0f2ba493a7a3..77031efdca5c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1111,11 +1111,15 @@ char *callchain_list__sym_name(struct callchain_list *cl,
int printed;
if (cl->ms.sym) {
+ const char *inlined = cl->ms.sym->inlined ? " (inlined)" : "";
+
if (show_srcline && cl->srcline)
- printed = scnprintf(bf, bfsize, "%s %s",
- cl->ms.sym->name, cl->srcline);
+ printed = scnprintf(bf, bfsize, "%s %s%s",
+ cl->ms.sym->name, cl->srcline,
+ inlined);
else
- printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+ printed = scnprintf(bf, bfsize, "%s%s",
+ cl->ms.sym->name, inlined);
} else
printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index eb3ab902a1c0..acb9210fd18a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -283,6 +283,9 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
ret += repsep_snprintf(bf + ret, size - ret, "%.*s",
width - ret,
sym->name);
+ if (sym->inlined)
+ ret += repsep_snprintf(bf + ret, size - ret,
+ " (inlined)");
}
} else {
size_t len = BITS_PER_LONG / 4;
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 08/15] perf script: Mark inlined frames and do not print DSO for them
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (6 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 07/15] perf callchain: Mark inlined frames in output by " (inlined)" suffix Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 09/15] perf callchain: Compare symbol name for inlined frames when matching Arnaldo Carvalho de Melo
` (8 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
Instead of showing the (repeated) DSO name of the non-inlined frame, we
now show the "(inlined)" suffix instead.
Before:
214f7 __hypot_finite (/usr/lib/libm-2.25.so)
ace3 hypot (/usr/lib/libm-2.25.so)
a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
a4a main (/home/milian/projects/src/perf-tests/inlining)
20510 __libc_start_main (/usr/lib/libc-2.25.so)
bd9 _start (/home/milian/projects/src/perf-tests/inlining)
After:
214f7 __hypot_finite (/usr/lib/libm-2.25.so)
ace3 hypot (/usr/lib/libm-2.25.so)
a4a std::__complex_abs (inlined)
a4a std::abs<double> (inlined)
a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
a4a std::norm<double> (inlined)
a4a main (/home/milian/projects/src/perf-tests/inlining)
20510 __libc_start_main (/usr/lib/libc-2.25.so)
bd9 _start (/home/milian/projects/src/perf-tests/inlining)
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/evsel_fprintf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index f2c6c5ee11e8..5b9e89257aa7 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -157,7 +157,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
}
}
- if (print_dso) {
+ if (print_dso && (!node->sym || !node->sym->inlined)) {
printed += fprintf(fp, " (");
printed += map__fprintf_dsoname(node->map, fp);
printed += fprintf(fp, ")");
@@ -166,6 +166,9 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
if (print_srcline)
printed += map__fprintf_srcline(node->map, addr, "\n ", fp);
+ if (node->sym && node->sym->inlined)
+ printed += fprintf(fp, " (inlined)");
+
if (!print_oneline)
printed += fprintf(fp, "\n");
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 09/15] perf callchain: Compare symbol name for inlined frames when matching
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (7 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 08/15] perf script: Mark inlined frames and do not print DSO for them Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 10/15] perf report: Compare symbol name for inlined frames when sorting Arnaldo Carvalho de Melo
` (7 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Ravi Bangoria, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
The fake symbols we create for inlined frames will represent different
functions but can use the symbol start address. This leads to issues
when different inline branches all lead to the same function.
Before:
~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
--38.86%--_start
__libc_start_main
main
|
--37.57%--std::norm<double> (inlined)
std::_Norm_helper<true>::_S_do_it<double> (inlined)
|
--36.36%--std::abs<double> (inlined)
std::__complex_abs (inlined)
|
--12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
~~~~~
Note that this backtrace representation is completely bogus.
Complex abs does not call the linear congruential engine! It
is just a side-effect of a longer inlined stack being appended
to a shorter, different inlined stack, both of which originate
in the same function (main).
This patch fixes the issue:
~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
--38.86%--_start
__libc_start_main
main
|
|--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
| std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
| |
| --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
| std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
| |
| --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
| std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
| std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
|
--1.99%--std::norm<double> (inlined)
std::_Norm_helper<true>::_S_do_it<double> (inlined)
std::abs<double> (inlined)
std::__complex_abs (inlined)
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
[ Fix up conflict with c1fbc0cf81f1 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 77031efdca5c..35a920f09503 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -690,6 +690,14 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
}
if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
+ /*
+ * Compare inlined frames based on their symbol name because
+ * different inlined frames will have the same symbol start
+ */
+ if (cnode->ms.sym->inlined || node->sym->inlined)
+ return match_chain_strings(cnode->ms.sym->name,
+ node->sym->name);
+
left = cnode->ms.sym->start;
right = sym->start;
left_dso = cnode->ms.map->dso;
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 10/15] perf report: Compare symbol name for inlined frames when sorting
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (8 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 09/15] perf callchain: Compare symbol name for inlined frames when matching Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 11/15] perf report: Properly handle branch count in match_chain() Arnaldo Carvalho de Melo
` (6 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Peter Zijlstra, Yao Jin, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
Similar to the callstack frame matching, we also have to compare the
symbol name when sorting hist entries. The reason is twofold: On one
hand, multiple inlined functions will use the same symbol start/end
values of the parent, non-inlined symbol.
As such, all of these symbols often end up missing from top-level
report, as they get merged with the non-inlined frame. On the other
hand, multiple different functions may end up inlining the same
function, and we need to aggregate these values properly.
Before:
~~~~~
perf report --stdio --inline -g none
# Children Self Command Shared Object Symbol
# ........ ........ ............ ............. ...................................
#
100.00% 39.69% cpp-inlining cpp-inlining [.] main
100.00% 0.00% cpp-inlining cpp-inlining [.] _start
100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main
97.03% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined)
59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot
55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite
0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs
~~~~~
After:
~~~~~
perf report --stdio --inline -g none
# Children Self Command Shared Object Symbol
# ........ ........ ............ ............. ...................................................................................................................................
#
100.00% 39.69% cpp-inlining cpp-inlining [.] main
100.00% 0.00% cpp-inlining cpp-inlining [.] _start
100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main
62.57% 0.00% cpp-inlining cpp-inlining [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
62.57% 0.00% cpp-inlining cpp-inlining [.] std::__complex_abs (inlined)
62.57% 0.00% cpp-inlining cpp-inlining [.] std::abs<double> (inlined)
62.57% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined)
59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot
55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite
34.46% 0.00% cpp-inlining cpp-inlining [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
32.39% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
32.39% 0.00% cpp-inlining cpp-inlining [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
12.29% 0.00% cpp-inlining cpp-inlining [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/sort.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index acb9210fd18a..006d10a0dc96 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -225,6 +225,9 @@ static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
if (sym_l == sym_r)
return 0;
+ if (sym_l->inlined || sym_r->inlined)
+ return strcmp(sym_l->name, sym_r->name);
+
if (sym_l->start != sym_r->start)
return (int64_t)(sym_r->start - sym_l->start);
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 11/15] perf report: Properly handle branch count in match_chain()
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (9 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 10/15] perf report: Compare symbol name for inlined frames when sorting Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 12/15] perf report: Cache failed lookups of inlined frames Arnaldo Carvalho de Melo
` (5 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Jin Yao, Peter Zijlstra, Ravi Bangoria, Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
Some of the code paths I introduced before returned too early without
running the code to handle a node's branch count. By refactoring
match_chain to only have one exit point, this can be remedied.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 140 ++++++++++++++++++++++++--------------------
1 file changed, 78 insertions(+), 62 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35a920f09503..19bfcadcf891 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -666,83 +666,99 @@ static enum match_result match_chain_strings(const char *left,
return ret;
}
-static enum match_result match_chain(struct callchain_cursor_node *node,
- struct callchain_list *cnode)
+/*
+ * We need to always use relative addresses because we're aggregating
+ * callchains from multiple threads, i.e. different address spaces, so
+ * comparing absolute addresses make no sense as a symbol in a DSO may end up
+ * in a different address when used in a different binary or even the same
+ * binary but with some sort of address randomization technique, thus we need
+ * to compare just relative addresses. -acme
+ */
+static enum match_result match_chain_dso_addresses(struct map *left_map, u64 left_ip,
+ struct map *right_map, u64 right_ip)
{
- struct symbol *sym = node->sym;
- u64 left, right;
- struct dso *left_dso = NULL;
- struct dso *right_dso = NULL;
+ struct dso *left_dso = left_map ? left_map->dso : NULL;
+ struct dso *right_dso = right_map ? right_map->dso : NULL;
- if (callchain_param.key == CCKEY_SRCLINE) {
- enum match_result match = match_chain_strings(cnode->srcline,
- node->srcline);
+ if (left_dso != right_dso)
+ return left_dso < right_dso ? MATCH_LT : MATCH_GT;
- /* if no srcline is available, fallback to symbol name */
- if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
- match = match_chain_strings(cnode->ms.sym->name,
- node->sym->name);
+ if (left_ip != right_ip)
+ return left_ip < right_ip ? MATCH_LT : MATCH_GT;
- if (match != MATCH_ERROR)
- return match;
+ return MATCH_EQ;
+}
- /* otherwise fall-back to IP-based comparison below */
- }
+static enum match_result match_chain(struct callchain_cursor_node *node,
+ struct callchain_list *cnode)
+{
+ enum match_result match = MATCH_ERROR;
- if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
- /*
- * Compare inlined frames based on their symbol name because
- * different inlined frames will have the same symbol start
- */
- if (cnode->ms.sym->inlined || node->sym->inlined)
- return match_chain_strings(cnode->ms.sym->name,
- node->sym->name);
-
- left = cnode->ms.sym->start;
- right = sym->start;
- left_dso = cnode->ms.map->dso;
- right_dso = node->map->dso;
- } else {
- left = cnode->ip;
- right = node->ip;
+ switch (callchain_param.key) {
+ case CCKEY_SRCLINE:
+ match = match_chain_strings(cnode->srcline, node->srcline);
+ if (match != MATCH_ERROR)
+ break;
+ /* otherwise fall-back to symbol-based comparison below */
+ __fallthrough;
+ case CCKEY_FUNCTION:
+ if (node->sym && cnode->ms.sym) {
+ /*
+ * Compare inlined frames based on their symbol name
+ * because different inlined frames will have the same
+ * symbol start. Otherwise do a faster comparison based
+ * on the symbol start address.
+ */
+ if (cnode->ms.sym->inlined || node->sym->inlined) {
+ match = match_chain_strings(cnode->ms.sym->name,
+ node->sym->name);
+ if (match != MATCH_ERROR)
+ break;
+ } else {
+ match = match_chain_dso_addresses(cnode->ms.map, cnode->ms.sym->start,
+ node->map, node->sym->start);
+ break;
+ }
+ }
+ /* otherwise fall-back to IP-based comparison below */
+ __fallthrough;
+ case CCKEY_ADDRESS:
+ default:
+ match = match_chain_dso_addresses(cnode->ms.map, cnode->ip, node->map, node->ip);
+ break;
}
- if (left == right && left_dso == right_dso) {
- if (node->branch) {
- cnode->branch_count++;
+ if (match == MATCH_EQ && node->branch) {
+ cnode->branch_count++;
- if (node->branch_from) {
- /*
- * It's "to" of a branch
- */
- cnode->brtype_stat.branch_to = true;
+ if (node->branch_from) {
+ /*
+ * It's "to" of a branch
+ */
+ cnode->brtype_stat.branch_to = true;
- if (node->branch_flags.predicted)
- cnode->predicted_count++;
+ if (node->branch_flags.predicted)
+ cnode->predicted_count++;
- if (node->branch_flags.abort)
- cnode->abort_count++;
+ if (node->branch_flags.abort)
+ cnode->abort_count++;
- branch_type_count(&cnode->brtype_stat,
- &node->branch_flags,
- node->branch_from,
- node->ip);
- } else {
- /*
- * It's "from" of a branch
- */
- cnode->brtype_stat.branch_to = false;
- cnode->cycles_count +=
- node->branch_flags.cycles;
- cnode->iter_count += node->nr_loop_iter;
- cnode->iter_cycles += node->iter_cycles;
- }
+ branch_type_count(&cnode->brtype_stat,
+ &node->branch_flags,
+ node->branch_from,
+ node->ip);
+ } else {
+ /*
+ * It's "from" of a branch
+ */
+ cnode->brtype_stat.branch_to = false;
+ cnode->cycles_count += node->branch_flags.cycles;
+ cnode->iter_count += node->nr_loop_iter;
+ cnode->iter_cycles += node->iter_cycles;
}
-
- return MATCH_EQ;
}
- return left > right ? MATCH_GT : MATCH_LT;
+ return match;
}
/*
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 12/15] perf report: Cache failed lookups of inlined frames
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (10 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 11/15] perf report: Properly handle branch count in match_chain() Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 13/15] perf report: Cache srclines for callchain nodes Arnaldo Carvalho de Melo
` (4 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Jin Yao, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
When no inlined frames could be found for a given address, we did not
store this information anywhere. That means we potentially do the costly
inliner lookup repeatedly for cases where we know it can never succeed.
This patch makes dso__parse_addr_inlines always return a valid
inline_node. It will be empty when no inliners are found. This enables
us to cache the empty list in the DSO, thereby improving the performance
when many addresses fail to find the inliners.
For my trivial example, the performance impact is already quite
significant:
Before:
~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
53 context-switches # 0.089 K/sec ( +- 4.09% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
5,687 page-faults # 0.010 M/sec ( +- 0.02% )
2,300,918,213 cycles # 3.868 GHz ( +- 0.09% )
4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% )
939,177,205 branches # 1578.969 M/sec ( +- 0.00% )
11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% )
0.596246531 seconds time elapsed ( +- 0.07% )
~~~~~
After:
~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% )
29 context-switches # 0.255 K/sec ( +- 54.25% )
0 cpu-migrations # 0.000 K/sec
5,380 page-faults # 0.048 M/sec ( +- 0.01% )
432,378,779 cycles # 3.823 GHz ( +- 0.75% )
670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% )
141,001,247 branches # 1246.570 M/sec ( +- 0.01% )
2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% )
0.114222393 seconds time elapsed ( +- 1.19% )
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/machine.c | 15 +++++++--------
tools/perf/util/srcline.c | 16 +---------------
2 files changed, 8 insertions(+), 23 deletions(-)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3d049cb313ac..177c1d4088f8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2115,9 +2115,10 @@ static int append_inlines(struct callchain_cursor *cursor,
struct inline_node *inline_node;
struct inline_list *ilist;
u64 addr;
+ int ret = 1;
if (!symbol_conf.inline_name || !map || !sym)
- return 1;
+ return ret;
addr = map__rip_2objdump(map, ip);
@@ -2125,22 +2126,20 @@ static int append_inlines(struct callchain_cursor *cursor,
if (!inline_node) {
inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
if (!inline_node)
- return 1;
-
+ return ret;
inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
}
list_for_each_entry(ilist, &inline_node->val, list) {
- int ret = callchain_cursor_append(cursor, ip, map,
- ilist->symbol, false,
- NULL, 0, 0, 0,
- ilist->srcline);
+ ret = callchain_cursor_append(cursor, ip, map,
+ ilist->symbol, false,
+ NULL, 0, 0, 0, ilist->srcline);
if (ret != 0)
return ret;
}
- return 0;
+ return ret;
}
static int unwind_entry(struct unwind_entry *entry, void *arg)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 8bea6621d657..fc3888664b20 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -353,17 +353,8 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
INIT_LIST_HEAD(&node->val);
node->addr = addr;
- if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
- goto out_free_inline_node;
-
- if (list_empty(&node->val))
- goto out_free_inline_node;
-
+ addr2line(dso_name, addr, NULL, NULL, dso, true, node, sym);
return node;
-
-out_free_inline_node:
- inline_node__delete(node);
- return NULL;
}
#else /* HAVE_LIBBFD_SUPPORT */
@@ -480,11 +471,6 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
out:
pclose(fp);
- if (list_empty(&node->val)) {
- inline_node__delete(node);
- return NULL;
- }
-
return node;
}
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 13/15] perf report: Cache srclines for callchain nodes
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (11 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 12/15] perf report: Cache failed lookups of inlined frames Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 14/15] perf report: Use srcline from callchain for hist entries Arnaldo Carvalho de Melo
` (3 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Jin Yao, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files e.g.:
Before:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
52496.495043 task-clock (msec) # 0.999 CPUs utilized
634 context-switches # 0.012 K/sec
2 cpu-migrations # 0.000 K/sec
191,561 page-faults # 0.004 M/sec
165,074,498,235 cycles # 3.144 GHz
334,170,832,408 instructions # 2.02 insn per cycle
90,220,029,745 branches # 1718.591 M/sec
654,525,177 branch-misses # 0.73% of all branches
52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
After:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
22606.323706 task-clock (msec) # 1.000 CPUs utilized
31 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
185,471 page-faults # 0.008 M/sec
71,188,113,681 cycles # 3.149 GHz
133,204,943,083 instructions # 1.87 insn per cycle
34,886,384,979 branches # 1543.214 M/sec
278,214,495 branch-misses # 0.80% of all branches
22.609857253 seconds time elapsed
Note that the difference is only this large when `--inline` is not
passed. In such situations, we would use the inliner cache and thus do
not run this code path that often.
I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level
- i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/dso.c | 2 ++
tools/perf/util/dso.h | 1 +
tools/perf/util/machine.c | 17 +++++++++---
tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/srcline.h | 7 +++++
5 files changed, 90 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 75c8250b3b8a..3192b608e91b 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1203,6 +1203,7 @@ struct dso *dso__new(const char *name)
dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
dso->data.cache = RB_ROOT;
dso->inlined_nodes = RB_ROOT;
+ dso->srclines = RB_ROOT;
dso->data.fd = -1;
dso->data.status = DSO_DATA_STATUS_UNKNOWN;
dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1237,6 +1238,7 @@ void dso__delete(struct dso *dso)
/* free inlines first, as they reference symbols */
inlines__tree_delete(&dso->inlined_nodes);
+ srcline__tree_delete(&dso->srclines);
for (i = 0; i < MAP__NR_TYPES; ++i)
symbols__delete(&dso->symbols[i]);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 122eca0d242d..821b16c67030 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
struct rb_root inlined_nodes;
+ struct rb_root srclines;
struct {
u64 addr;
struct symbol *symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 177c1d4088f8..94d8f1ccedd9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1711,11 +1711,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
{
+ char *srcline = NULL;
+
if (!map || callchain_param.key == CCKEY_FUNCTION)
- return NULL;
+ return srcline;
+
+ srcline = srcline__tree_find(&map->dso->srclines, ip);
+ if (!srcline) {
+ bool show_sym = false;
+ bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+ srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+ sym, show_sym, show_addr);
+ srcline__tree_insert(&map->dso->srclines, ip, srcline);
+ }
- return get_srcline(map->dso, map__rip_2objdump(map, ip),
- sym, false, callchain_param.key == CCKEY_ADDRESS);
+ return srcline;
}
struct iterations {
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index fc3888664b20..c143c3bc1ef8 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -542,6 +542,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
}
+struct srcline_node {
+ u64 addr;
+ char *srcline;
+ struct rb_node rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+ struct rb_node **p = &tree->rb_node;
+ struct rb_node *parent = NULL;
+ struct srcline_node *i, *node;
+
+ node = zalloc(sizeof(struct srcline_node));
+ if (!node) {
+ perror("not enough memory for the srcline node");
+ return;
+ }
+
+ node->addr = addr;
+ node->srcline = srcline;
+
+ while (*p != NULL) {
+ parent = *p;
+ i = rb_entry(parent, struct srcline_node, rb_node);
+ if (addr < i->addr)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ rb_link_node(&node->rb_node, parent, p);
+ rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+ struct rb_node *n = tree->rb_node;
+
+ while (n) {
+ struct srcline_node *i = rb_entry(n, struct srcline_node,
+ rb_node);
+
+ if (addr < i->addr)
+ n = n->rb_left;
+ else if (addr > i->addr)
+ n = n->rb_right;
+ else
+ return i->srcline;
+ }
+
+ return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+ struct srcline_node *pos;
+ struct rb_node *next = rb_first(tree);
+
+ while (next) {
+ pos = rb_entry(next, struct srcline_node, rb_node);
+ next = rb_next(&pos->rb_node);
+ rb_erase(&pos->rb_node, tree);
+ free_srcline(pos->srcline);
+ zfree(&pos);
+ }
+}
+
struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
struct symbol *sym)
{
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index ebe38cd22294..1c4d6210860b 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
bool show_sym, bool show_addr, bool unwind_inlines);
void free_srcline(char *srcline);
+/* insert the srcline into the DSO, which will take ownership */
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+/* find previously inserted srcline */
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+/* delete all srclines within the tree */
+void srcline__tree_delete(struct rb_root *tree);
+
#define SRCLINE_UNKNOWN ((char *) "??:0")
struct inline_list {
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 14/15] perf report: Use srcline from callchain for hist entries
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (12 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 13/15] perf report: Cache srclines for callchain nodes Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 15/15] perf util: Enable handling of inlined frames by default Arnaldo Carvalho de Melo
` (2 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Jin Yao, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
This also removes the symbol name from the srcline column, more on this
below.
This ensures we use the correct srcline, which could originate from a
potentially inlined function. The hist entries used to query for the
srcline based purely on the IP, which leads to wrong results for inlined
entries.
Before:
~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children Self Source:Line
# ........ ........ ..................................................................................................................................
#
94.23% 0.00% __libc_start_main+18446603487898210537
94.23% 0.00% _start+41
44.58% 0.00% main+100
44.58% 0.00% std::_Norm_helper<true>::_S_do_it<double>+100
44.58% 0.00% std::__complex_abs+100
44.58% 0.00% std::abs<double>+100
44.58% 0.00% std::norm<double>+100
36.01% 0.00% hypot+18446603487892193300
25.81% 0.00% main+41
25.81% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
25.81% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
25.75% 25.75% random.h:143
18.39% 0.00% main+57
18.39% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
18.39% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
13.80% 13.80% random.tcc:3330
5.64% 0.00% ??:0
4.13% 4.13% __hypot_finite+163
4.13% 0.00% __hypot_finite+18446603487892193443
...
~~~~~
After:
~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children Self Source:Line
# ........ ........ ...........................................
#
94.30% 1.19% main.cpp:39
94.23% 0.00% __libc_start_main+18446603487898210537
94.23% 0.00% _start+41
48.44% 1.70% random.h:1823
48.44% 0.00% random.h:1814
46.74% 2.53% random.h:185
44.68% 0.10% complex:589
44.68% 0.00% complex:597
44.68% 0.00% complex:654
44.68% 0.00% complex:664
40.61% 13.80% random.tcc:3330
36.01% 0.00% hypot+18446603487892193300
26.81% 0.00% random.h:151
26.81% 0.00% random.h:332
25.75% 25.75% random.h:143
5.64% 0.00% ??:0
4.13% 4.13% __hypot_finite+163
4.13% 0.00% __hypot_finite+18446603487892193443
...
~~~~~
Note that this change removes the symbol from the source:line hist
column. If this information is desired, users should explicitly query
for it if needed. I.e. run this command instead:
~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children Self Symbol Source:Line
# ........ ........ ................................................................................................................................... ...........................................
#
94.30% 1.19% [.] main main.cpp:39
94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537
94.23% 0.00% [.] _start _start+41
48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1814
48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1823
46.74% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) random.h:185
44.68% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) complex:654
44.68% 0.00% [.] std::__complex_abs (inlined) complex:589
44.68% 0.00% [.] std::abs<double> (inlined) complex:597
44.68% 0.00% [.] std::norm<double> (inlined) complex:664
39.80% 13.59% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330
36.01% 0.00% [.] hypot hypot+18446603487892193300
26.81% 0.00% [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) random.h:151
26.81% 0.00% [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) random.h:332
25.75% 0.00% [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) random.h:143
25.19% 25.19% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143
4.13% 4.13% [.] __hypot_finite __hypot_finite+163
4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443
...
~~~~~
Compared to the old behavior, this reduces duplication in the output.
Before we used to print the symbol name in the srcline column even
when the sym column was explicitly requested. I.e. the output was:
~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children Self Symbol Source:Line
# ........ ........ ................................................................................................................................... ..................................................................................................................................
#
94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537
94.23% 0.00% [.] _start _start+41
44.58% 0.00% [.] main main+100
44.58% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) std::_Norm_helper<true>::_S_do_it<double>+100
44.58% 0.00% [.] std::__complex_abs (inlined) std::__complex_abs+100
44.58% 0.00% [.] std::abs<double> (inlined) std::abs<double>+100
44.58% 0.00% [.] std::norm<double> (inlined) std::norm<double>+100
36.01% 0.00% [.] hypot hypot+18446603487892193300
25.81% 0.00% [.] main main+41
25.81% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
25.81% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
25.69% 25.69% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143
18.39% 0.00% [.] main main+57
18.39% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
18.39% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
13.80% 13.80% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330
4.13% 4.13% [.] __hypot_finite __hypot_finite+163
4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443
...
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 1 +
tools/perf/util/event.c | 1 +
tools/perf/util/hist.c | 2 ++
tools/perf/util/symbol.h | 1 +
4 files changed, 5 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 19bfcadcf891..3a3916934a92 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1090,6 +1090,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
{
al->map = node->map;
al->sym = node->sym;
+ al->srcline = node->srcline;
if (node->map)
al->addr = node->map->map_ip(node->map, node->ip);
else
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 47eff4767edb..3c411e7e36aa 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1604,6 +1604,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
al->sym = NULL;
al->cpu = sample->cpu;
al->socket = -1;
+ al->srcline = NULL;
if (al->cpu >= 0) {
struct perf_env *env = machine->env;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b0fa9c217e1c..25d143053ab5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -596,6 +596,7 @@ __hists__add_entry(struct hists *hists,
.map = al->map,
.sym = al->sym,
},
+ .srcline = al->srcline ? strdup(al->srcline) : NULL,
.socket = al->socket,
.cpu = al->cpu,
.cpumode = al->cpumode,
@@ -950,6 +951,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
.map = al->map,
.sym = al->sym,
},
+ .srcline = al->srcline ? strdup(al->srcline) : NULL,
.parent = iter->parent,
.raw_data = sample->raw_data,
.raw_size = sample->raw_size,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d880a059babb..d548ea5cb418 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -209,6 +209,7 @@ struct addr_location {
struct thread *thread;
struct map *map;
struct symbol *sym;
+ const char *srcline;
u64 addr;
char level;
u8 filtered;
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 15/15] perf util: Enable handling of inlined frames by default
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (13 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 14/15] perf report: Use srcline from callchain for hist entries Arnaldo Carvalho de Melo
@ 2017-10-25 16:00 ` Arnaldo Carvalho de Melo
2017-10-25 17:10 ` [GIT PULL 00/15] perf/core inlining improvements Ingo Molnar
2017-10-26 9:03 ` Milian Wolff
16 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-25 16:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, Milian Wolff, David Ahern,
Jin Yao, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
From: Milian Wolff <milian.wolff@kdab.com>
Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable this
useful option by default.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-report.txt | 3 ++-
tools/perf/Documentation/perf-script.txt | 3 ++-
tools/perf/util/symbol.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d992ed..ddde2b54af57 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]
--inline::
If a callgraph address belongs to an inlined function, the inline stack
- will be printed. Each entry is function name or file/line.
+ will be printed. Each entry is function name or file/line. Enabled by
+ default, disable with --no-inline.
include::callchain-overhead-calculation.txt[]
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba35a2d8..25e677344728 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]
--inline::
If a callgraph address belongs to an inlined function, the inline stack
- will be printed. Each entry has function name and file/line.
+ will be printed. Each entry has function name and file/line. Enabled by
+ default, disable with --no-inline.
SEE ALSO
--------
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38aa4063..ce6993bebf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
.show_hist_headers = true,
.symfs = "",
.event_group = true,
+ .inline_name = true,
};
static enum dso_binary_type binary_type_symtab[] = {
--
2.13.6
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [GIT PULL 00/15] perf/core inlining improvements
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (14 preceding siblings ...)
2017-10-25 16:00 ` [PATCH 15/15] perf util: Enable handling of inlined frames by default Arnaldo Carvalho de Melo
@ 2017-10-25 17:10 ` Ingo Molnar
2017-10-26 9:03 ` Milian Wolff
16 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2017-10-25 17:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: linux-kernel, linux-perf-users, Andi Kleen, David Ahern, Jin Yao,
Jiri Olsa, Milian Wolff, Namhyung Kim, Peter Zijlstra,
Ravi Bangoria, Arnaldo Carvalho de Melo
* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> Hi Ingo,
>
> Please consider pulling, this is Milian's v7 plus some fixes
> acked by Namhyung after some discussion among the three of us, I
> probably need to pick some more patches that are related to this area,
> but lets make some progress and merge this kit,
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 9b7c85473cc2fa6fc4a7f87636ff2b69742b82b7:
>
> Merge tag 'perf-core-for-mingo-4.15-20171023' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-10-24 10:53:04 +0200)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.15-20171025
>
> for you to fetch changes up to d8a88dd243a170a226aba33e7c53704db2f82aa6:
>
> perf util: Enable handling of inlined frames by default (2017-10-25 10:50:47 -0300)
>
> ----------------------------------------------------------------
> perf/core inline improvements:
>
> From Milian's cover letter: (Milian Wolff)
>
> This series of patches completely reworks the way inline frames are
> handled. Instead of querying for the inline nodes on-demand in the
> individual tools, we now create proper callchain nodes for inlined
> frames. The advantages this approach brings are numerous:
>
> - Less duplicated code in the individual browser
>
> - Aggregated cost for inlined frames for the --children top-down list
>
> - Various bug fixes that arose from querying for a srcline/symbol based on
> the IP of a sample, which will always point to the last inlined frame
> instead of the corresponding non-inlined frame
>
> - Overall much better support for visualizing cost for heavily-inlined C++
> code, which simply was confusing and unreliable before
>
> - srcline honors the global setting as to whether full paths or basenames
> should be shown
>
> - Caches for inlined frames and srcline information, which allow us to
> enable inline frame handling by default
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Milian Wolff (15):
> perf report: Remove code to handle inline frames from browsers
> perf callchain: Store srcline in callchain_cursor_node
> perf callchain: Refactor inline_list to operate on symbols
> perf callchain: Refactor inline_list to store srcline string directly
> perf callchain: Create real callchain entries for inlined frames
> perf report: Fall-back to function name comparison for -g srcline
> perf callchain: Mark inlined frames in output by " (inlined)" suffix
> perf script: Mark inlined frames and do not print DSO for them
> perf callchain: Compare symbol name for inlined frames when matching
> perf report: Compare symbol name for inlined frames when sorting
> perf report: Properly handle branch count in match_chain()
> perf report: Cache failed lookups of inlined frames
> perf report: Cache srclines for callchain nodes
> perf report: Use srcline from callchain for hist entries
> perf util: Enable handling of inlined frames by default
>
> tools/perf/Documentation/perf-report.txt | 3 +-
> tools/perf/Documentation/perf-script.txt | 3 +-
> tools/perf/ui/browsers/hists.c | 180 ++-------------------
> tools/perf/ui/stdio/hist.c | 77 +--------
> tools/perf/util/callchain.c | 174 +++++++++++---------
> tools/perf/util/callchain.h | 6 +-
> tools/perf/util/dso.c | 7 +
> tools/perf/util/dso.h | 2 +
> tools/perf/util/event.c | 1 +
> tools/perf/util/evsel_fprintf.c | 37 +----
> tools/perf/util/hist.c | 7 +-
> tools/perf/util/machine.c | 65 +++++++-
> tools/perf/util/sort.c | 6 +
> tools/perf/util/sort.h | 1 -
> tools/perf/util/srcline.c | 268 +++++++++++++++++++++++++------
> tools/perf/util/srcline.h | 26 ++-
> tools/perf/util/symbol.c | 1 +
> tools/perf/util/symbol.h | 2 +
> 18 files changed, 443 insertions(+), 423 deletions(-)
Pulled, thanks a lot Arnaldo!
Ingo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL 00/15] perf/core inlining improvements
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
` (15 preceding siblings ...)
2017-10-25 17:10 ` [GIT PULL 00/15] perf/core inlining improvements Ingo Molnar
@ 2017-10-26 9:03 ` Milian Wolff
16 siblings, 0 replies; 18+ messages in thread
From: Milian Wolff @ 2017-10-26 9:03 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, linux-kernel, linux-perf-users, Andi Kleen,
David Ahern, Jin Yao, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
Ravi Bangoria, Arnaldo Carvalho de Melo
On Mittwoch, 25. Oktober 2017 17:59:58 CEST Arnaldo Carvalho de Melo wrote:
> Hi Ingo,
>
> Please consider pulling, this is Milian's v7 plus some fixes
> acked by Namhyung after some discussion among the three of us, I
> probably need to pick some more patches that are related to this area,
> but lets make some progress and merge this kit,
Thanks a lot for everyone involved in reviewing this series. Much appreciated!
Cheers
--
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2017-10-26 9:03 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
2017-10-25 15:59 ` [PATCH 01/15] perf report: Remove code to handle inline frames from browsers Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 02/15] perf callchain: Store srcline in callchain_cursor_node Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 03/15] perf callchain: Refactor inline_list to operate on symbols Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 04/15] perf callchain: Refactor inline_list to store srcline string directly Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 05/15] perf callchain: Create real callchain entries for inlined frames Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 06/15] perf report: Fall-back to function name comparison for -g srcline Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 07/15] perf callchain: Mark inlined frames in output by " (inlined)" suffix Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 08/15] perf script: Mark inlined frames and do not print DSO for them Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 09/15] perf callchain: Compare symbol name for inlined frames when matching Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 10/15] perf report: Compare symbol name for inlined frames when sorting Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 11/15] perf report: Properly handle branch count in match_chain() Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 12/15] perf report: Cache failed lookups of inlined frames Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 13/15] perf report: Cache srclines for callchain nodes Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 14/15] perf report: Use srcline from callchain for hist entries Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 15/15] perf util: Enable handling of inlined frames by default Arnaldo Carvalho de Melo
2017-10-25 17:10 ` [GIT PULL 00/15] perf/core inlining improvements Ingo Molnar
2017-10-26 9:03 ` Milian Wolff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).