From: Namhyung Kim <namhyung@kernel.org>
To: Milian Wolff <milian.wolff@kdab.com>
Cc: acme@kernel.org, Jin Yao <yao.jin@linux.intel.com>,
Linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Arnaldo Carvalho de Melo <acme@redhat.com>,
David Ahern <dsahern@gmail.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
kernel-team@lge.com
Subject: Re: [PATCH v2 11/14] perf report: cache srclines for callchain nodes
Date: Thu, 10 Aug 2017 11:13:25 +0900 [thread overview]
Message-ID: <20170810021325.GA1797@sejong> (raw)
In-Reply-To: <20170806212446.24925-12-milian.wolff@kdab.com>
Hi Milian,
On Sun, Aug 06, 2017 at 11:24:43PM +0200, Milian Wolff wrote:
> On one hand this ensures that the memory is properly freed when
> the DSO gets freed. On the other hand this significantly speeds up
> the processing of the callchain nodes when lots of srclines are
> requested. For one of my data files e.g.:
>
> Before:
>
> Performance counter stats for 'perf report -s srcline -g srcline --stdio':
>
> 52496.495043 task-clock (msec) # 0.999 CPUs utilized
> 634 context-switches # 0.012 K/sec
> 2 cpu-migrations # 0.000 K/sec
> 191,561 page-faults # 0.004 M/sec
> 165,074,498,235 cycles # 3.144 GHz
> 334,170,832,408 instructions # 2.02 insn per cycle
> 90,220,029,745 branches # 1718.591 M/sec
> 654,525,177 branch-misses # 0.73% of all branches
>
> 52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
>
> After:
>
> Performance counter stats for 'perf report -s srcline -g srcline --stdio':
>
> 22606.323706 task-clock (msec) # 1.000 CPUs utilized
> 31 context-switches # 0.001 K/sec
> 0 cpu-migrations # 0.000 K/sec
> 185,471 page-faults # 0.008 M/sec
> 71,188,113,681 cycles # 3.149 GHz
> 133,204,943,083 instructions # 1.87 insn per cycle
> 34,886,384,979 branches # 1543.214 M/sec
> 278,214,495 branch-misses # 0.80% of all branches
>
> 22.609857253 seconds time elapsed
>
> Note that the difference is only this large when `--inline` is not
> passed. In such situations, we would use the inliner cache and
> thus do not run this code path that often.
>
> I think that this cache should actually be used in other places, too.
> When looking at the valgrind leak report for perf report, we see tons
> of srclines being leaked, most notably from calls to
> hist_entry__get_srcline. The problem is that get_srcline has many
> different formatting options (show_sym, show_addr, potentially even
> unwind_inlines when calling __get_srcline directly). As such, the
> srcline cannot easily be cached for all calls, or we'd have to add
> caches for all formatting combinations (6 so far). An alternative
> would be to remove the formatting options and handle that on a
> different level - i.e. print the sym/addr on demand wherever we
> actually output something. And the unwind_inlines could be moved into
> a separate function that does not return the srcline.
Agreed. Also I guess no need to unwind anymore to get a srcfile for
an entry with your change.
Thanks,
Namhyung
>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Yao Jin <yao.jin@linux.intel.com>
> Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
> ---
> tools/perf/util/dso.c | 1 +
> tools/perf/util/dso.h | 1 +
> tools/perf/util/machine.c | 17 +++++++++---
> tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/srcline.h | 7 +++++
> 5 files changed, 89 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index 72e6e390fd26..8c7f2862cff2 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -1235,6 +1235,7 @@ void dso__delete(struct dso *dso)
> for (i = 0; i < MAP__NR_TYPES; ++i)
> symbols__delete(&dso->symbols[i]);
> inlines__tree_delete(&dso->inlined_nodes);
> + srcline__tree_delete(&dso->srclines);
>
> if (dso->short_name_allocated) {
> zfree((char **)&dso->short_name);
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index 7d1e2b3c1f10..ac3a65a30ff2 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -142,6 +142,7 @@ struct dso {
> struct rb_root symbols[MAP__NR_TYPES];
> struct rb_root symbol_names[MAP__NR_TYPES];
> struct rb_root inlined_nodes;
> + struct rb_root srclines;
> struct {
> u64 addr;
> struct symbol *symbol;
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 479c42450d6a..c5ee6ba2b9ae 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1675,11 +1675,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
>
> static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
> {
> + char *srcline = NULL;
> +
> if (!map || callchain_param.key == CCKEY_FUNCTION)
> - return NULL;
> + return srcline;
> +
> + srcline = srcline__tree_find(&map->dso->srclines, ip);
> + if (!srcline) {
> + bool show_sym = false;
> + bool show_addr = callchain_param.key == CCKEY_ADDRESS;
> +
> + srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
> + sym, show_sym, show_addr);
> + srcline__tree_insert(&map->dso->srclines, ip, srcline);
> + }
>
> - return get_srcline(map->dso, map__rip_2objdump(map, ip),
> - sym, false, callchain_param.key == CCKEY_ADDRESS);
> + return srcline;
> }
>
> static int add_callchain_ip(struct thread *thread,
> diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> index 01b4d5ee51fd..0c5ee741c515 100644
> --- a/tools/perf/util/srcline.c
> +++ b/tools/perf/util/srcline.c
> @@ -531,6 +531,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
> return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
> }
>
> +struct srcline_node {
> + u64 addr;
> + char *srcline;
> + struct rb_node rb_node;
> +};
> +
> +void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
> +{
> + struct rb_node **p = &tree->rb_node;
> + struct rb_node *parent = NULL;
> + struct srcline_node *i, *node;
> +
> + node = zalloc(sizeof(struct srcline_node));
> + if (!node) {
> + perror("not enough memory for the srcline node");
> + return;
> + }
> +
> + node->addr = addr;
> + node->srcline = srcline;
> +
> + while (*p != NULL) {
> + parent = *p;
> + i = rb_entry(parent, struct srcline_node, rb_node);
> + if (addr < i->addr)
> + p = &(*p)->rb_left;
> + else
> + p = &(*p)->rb_right;
> + }
> + rb_link_node(&node->rb_node, parent, p);
> + rb_insert_color(&node->rb_node, tree);
> +}
> +
> +char *srcline__tree_find(struct rb_root *tree, u64 addr)
> +{
> + struct rb_node *n = tree->rb_node;
> +
> + while (n) {
> + struct srcline_node *i = rb_entry(n, struct srcline_node,
> + rb_node);
> +
> + if (addr < i->addr)
> + n = n->rb_left;
> + else if (addr > i->addr)
> + n = n->rb_right;
> + else
> + return i->srcline;
> + }
> +
> + return NULL;
> +}
> +
> +void srcline__tree_delete(struct rb_root *tree)
> +{
> + struct srcline_node *pos;
> + struct rb_node *next = rb_first(tree);
> +
> + while (next) {
> + pos = rb_entry(next, struct srcline_node, rb_node);
> + next = rb_next(&pos->rb_node);
> + rb_erase(&pos->rb_node, tree);
> + free_srcline(pos->srcline);
> + zfree(&pos);
> + }
> +}
> +
> struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
> struct symbol *sym)
> {
> diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
> index 0d2aca92e8c7..187a71082cb8 100644
> --- a/tools/perf/util/srcline.h
> +++ b/tools/perf/util/srcline.h
> @@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
> bool show_sym, bool show_addr, bool unwind_inlines);
> void free_srcline(char *srcline);
>
> +// insert the srcline into the DSO, which will take ownership
> +void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
> +// find previously inserted srcline
> +char *srcline__tree_find(struct rb_root *tree, u64 addr);
> +// delete all srclines within the tree
> +void srcline__tree_delete(struct rb_root *tree);
> +
> #define SRCLINE_UNKNOWN ((char *) "??:0")
>
> struct inline_list {
> --
> 2.13.3
>
next prev parent reply other threads:[~2017-08-10 2:13 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-06 21:24 [PATCH v2 00/14] generate full callchain cursor entries for inlined frames Milian Wolff
2017-08-06 21:24 ` [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers Milian Wolff
2017-08-07 15:07 ` Arnaldo Carvalho de Melo
2017-08-07 19:22 ` Milian Wolff
2017-08-07 20:16 ` Arnaldo Carvalho de Melo
2017-08-06 21:24 ` [PATCH v2 02/14] perf util: take elf_name as const string in dso__demangle_sym Milian Wolff
2017-08-07 15:10 ` Arnaldo Carvalho de Melo
2017-08-14 17:48 ` [tip:perf/core] perf util: Take " tip-bot for Milian Wolff
2017-08-06 21:24 ` [PATCH v2 03/14] perf report: create real callchain entries for inlined frames Milian Wolff
2017-08-16 7:53 ` Namhyung Kim
2017-08-20 20:57 ` Milian Wolff
2017-08-28 12:18 ` Namhyung Kim
2017-09-06 13:13 ` Milian Wolff
2017-09-07 14:58 ` Namhyung Kim
2017-09-07 15:05 ` Milian Wolff
2017-09-07 15:16 ` Namhyung Kim
2017-08-06 21:24 ` [PATCH v2 04/14] perf report: fall-back to function name comparison for -g srcline Milian Wolff
2017-08-06 21:24 ` [PATCH v2 05/14] perf report: mark inlined frames in output by " (inlined)" suffix Milian Wolff
2017-08-06 21:24 ` [PATCH v2 06/14] perf script: mark inlined frames and do not print DSO for them Milian Wolff
2017-08-06 21:24 ` [PATCH v2 07/14] perf report: compare symbol name for inlined frames when matching Milian Wolff
2017-08-06 21:24 ` [PATCH v2 08/14] perf report: compare symbol name for inlined frames when sorting Milian Wolff
2017-08-06 21:24 ` [PATCH v2 09/14] perf report: properly handle branch count in match_chain Milian Wolff
2017-08-06 21:24 ` [PATCH v2 10/14] perf report: cache failed lookups of inlined frames Milian Wolff
2017-08-06 21:24 ` [PATCH v2 11/14] perf report: cache srclines for callchain nodes Milian Wolff
2017-08-10 2:13 ` Namhyung Kim [this message]
2017-08-10 11:51 ` Milian Wolff
2017-08-10 14:56 ` Namhyung Kim
2017-08-10 17:58 ` Arnaldo Carvalho de Melo
2017-08-11 11:28 ` Milian Wolff
2017-08-06 21:24 ` [PATCH v2 12/14] perf report: use srcline from callchain for hist entries Milian Wolff
2017-08-06 21:24 ` [PATCH v2 13/14] perf util: do not consider empty files as valid srclines Milian Wolff
2017-08-07 15:21 ` Arnaldo Carvalho de Melo
2017-08-14 17:48 ` [tip:perf/core] perf srcline: Do " tip-bot for Milian Wolff
2017-08-06 21:24 ` [PATCH v2 14/14] perf util: enable handling of inlined frames by default Milian Wolff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170810021325.GA1797@sejong \
--to=namhyung@kernel.org \
--cc=Linux-kernel@vger.kernel.org \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@kernel.org \
--cc=acme@redhat.com \
--cc=dsahern@gmail.com \
--cc=kernel-team@lge.com \
--cc=linux-perf-users@vger.kernel.org \
--cc=milian.wolff@kdab.com \
--cc=yao.jin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.