From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Milian Wolff <milian.wolff@kdab.com>,
David Ahern <dsahern@gmail.com>,
Jin Yao <yao.jin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 13/15] perf report: Cache srclines for callchain nodes
Date: Wed, 25 Oct 2017 13:00:11 -0300 [thread overview]
Message-ID: <20171025160013.11136-14-acme@kernel.org> (raw)
In-Reply-To: <20171025160013.11136-1-acme@kernel.org>
From: Milian Wolff <milian.wolff@kdab.com>
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files e.g.:
Before:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
52496.495043 task-clock (msec) # 0.999 CPUs utilized
634 context-switches # 0.012 K/sec
2 cpu-migrations # 0.000 K/sec
191,561 page-faults # 0.004 M/sec
165,074,498,235 cycles # 3.144 GHz
334,170,832,408 instructions # 2.02 insn per cycle
90,220,029,745 branches # 1718.591 M/sec
654,525,177 branch-misses # 0.73% of all branches
52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
After:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
22606.323706 task-clock (msec) # 1.000 CPUs utilized
31 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
185,471 page-faults # 0.008 M/sec
71,188,113,681 cycles # 3.149 GHz
133,204,943,083 instructions # 1.87 insn per cycle
34,886,384,979 branches # 1543.214 M/sec
278,214,495 branch-misses # 0.80% of all branches
22.609857253 seconds time elapsed
Note that the difference is only this large when `--inline` is not
passed. In such situations, we would use the inliner cache and thus do
not run this code path that often.
I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level
- i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/dso.c | 2 ++
tools/perf/util/dso.h | 1 +
tools/perf/util/machine.c | 17 +++++++++---
tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/srcline.h | 7 +++++
5 files changed, 90 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 75c8250b3b8a..3192b608e91b 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1203,6 +1203,7 @@ struct dso *dso__new(const char *name)
dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
dso->data.cache = RB_ROOT;
dso->inlined_nodes = RB_ROOT;
+ dso->srclines = RB_ROOT;
dso->data.fd = -1;
dso->data.status = DSO_DATA_STATUS_UNKNOWN;
dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
@@ -1237,6 +1238,7 @@ void dso__delete(struct dso *dso)
/* free inlines first, as they reference symbols */
inlines__tree_delete(&dso->inlined_nodes);
+ srcline__tree_delete(&dso->srclines);
for (i = 0; i < MAP__NR_TYPES; ++i)
symbols__delete(&dso->symbols[i]);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 122eca0d242d..821b16c67030 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
struct rb_root inlined_nodes;
+ struct rb_root srclines;
struct {
u64 addr;
struct symbol *symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 177c1d4088f8..94d8f1ccedd9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1711,11 +1711,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
{
+ char *srcline = NULL;
+
if (!map || callchain_param.key == CCKEY_FUNCTION)
- return NULL;
+ return srcline;
+
+ srcline = srcline__tree_find(&map->dso->srclines, ip);
+ if (!srcline) {
+ bool show_sym = false;
+ bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+ srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+ sym, show_sym, show_addr);
+ srcline__tree_insert(&map->dso->srclines, ip, srcline);
+ }
- return get_srcline(map->dso, map__rip_2objdump(map, ip),
- sym, false, callchain_param.key == CCKEY_ADDRESS);
+ return srcline;
}
struct iterations {
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index fc3888664b20..c143c3bc1ef8 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -542,6 +542,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
}
+struct srcline_node {
+ u64 addr;
+ char *srcline;
+ struct rb_node rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+ struct rb_node **p = &tree->rb_node;
+ struct rb_node *parent = NULL;
+ struct srcline_node *i, *node;
+
+ node = zalloc(sizeof(struct srcline_node));
+ if (!node) {
+ perror("not enough memory for the srcline node");
+ return;
+ }
+
+ node->addr = addr;
+ node->srcline = srcline;
+
+ while (*p != NULL) {
+ parent = *p;
+ i = rb_entry(parent, struct srcline_node, rb_node);
+ if (addr < i->addr)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ rb_link_node(&node->rb_node, parent, p);
+ rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+ struct rb_node *n = tree->rb_node;
+
+ while (n) {
+ struct srcline_node *i = rb_entry(n, struct srcline_node,
+ rb_node);
+
+ if (addr < i->addr)
+ n = n->rb_left;
+ else if (addr > i->addr)
+ n = n->rb_right;
+ else
+ return i->srcline;
+ }
+
+ return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+ struct srcline_node *pos;
+ struct rb_node *next = rb_first(tree);
+
+ while (next) {
+ pos = rb_entry(next, struct srcline_node, rb_node);
+ next = rb_next(&pos->rb_node);
+ rb_erase(&pos->rb_node, tree);
+ free_srcline(pos->srcline);
+ zfree(&pos);
+ }
+}
+
struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
struct symbol *sym)
{
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index ebe38cd22294..1c4d6210860b 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
bool show_sym, bool show_addr, bool unwind_inlines);
void free_srcline(char *srcline);
+/* insert the srcline into the DSO, which will take ownership */
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+/* find previously inserted srcline */
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+/* delete all srclines within the tree */
+void srcline__tree_delete(struct rb_root *tree);
+
#define SRCLINE_UNKNOWN ((char *) "??:0")
struct inline_list {
--
2.13.6
next prev parent reply other threads:[~2017-10-25 16:01 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-25 15:59 [GIT PULL 00/15] perf/core inlining improvements Arnaldo Carvalho de Melo
2017-10-25 15:59 ` [PATCH 01/15] perf report: Remove code to handle inline frames from browsers Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 02/15] perf callchain: Store srcline in callchain_cursor_node Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 03/15] perf callchain: Refactor inline_list to operate on symbols Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 04/15] perf callchain: Refactor inline_list to store srcline string directly Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 05/15] perf callchain: Create real callchain entries for inlined frames Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 06/15] perf report: Fall-back to function name comparison for -g srcline Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 07/15] perf callchain: Mark inlined frames in output by " (inlined)" suffix Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 08/15] perf script: Mark inlined frames and do not print DSO for them Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 09/15] perf callchain: Compare symbol name for inlined frames when matching Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 10/15] perf report: Compare symbol name for inlined frames when sorting Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 11/15] perf report: Properly handle branch count in match_chain() Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 12/15] perf report: Cache failed lookups of inlined frames Arnaldo Carvalho de Melo
2017-10-25 16:00 ` Arnaldo Carvalho de Melo [this message]
2017-10-25 16:00 ` [PATCH 14/15] perf report: Use srcline from callchain for hist entries Arnaldo Carvalho de Melo
2017-10-25 16:00 ` [PATCH 15/15] perf util: Enable handling of inlined frames by default Arnaldo Carvalho de Melo
2017-10-25 17:10 ` [GIT PULL 00/15] perf/core inlining improvements Ingo Molnar
2017-10-26 9:03 ` Milian Wolff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171025160013.11136-14-acme@kernel.org \
--to=acme@kernel.org \
--cc=acme@redhat.com \
--cc=dsahern@gmail.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=milian.wolff@kdab.com \
--cc=mingo@kernel.org \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=yao.jin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).