From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Waiman Long <Waiman.Long@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org,
Scott J Norton <scott.norton@hp.com>,
Douglas Hatch <doug.hatch@hp.com>,
Don Zickus <dzickus@redhat.com>, Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>
Subject: Re: [PATCH v4 2/2] perf tool: improves DSO long names lookup speed with rbtree
Date: Fri, 26 Sep 2014 11:22:34 -0300 [thread overview]
Message-ID: <20140926142234.GC3879@kernel.org> (raw)
In-Reply-To: <1411573540-8765-3-git-send-email-Waiman.Long@hp.com>
Em Wed, Sep 24, 2014 at 11:45:40AM -0400, Waiman Long escreveu:
> With workload that spawns and destroys many threads and processes,
> it was found that perf-mem could took a long time to post-process
> the perf data after the target workload had completed its operation.
> The performance bottleneck was found to be the lookup and insertion
> of the new DSO structures (thousands of them in this case).
>
> In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread),
> the perf profile below shows what perf was doing after the profiled
> AIM7 shared workload completed:
>
> - 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
> - __strcmp_sse42
> - 99.82% map__new
> machine__process_mmap_event
> perf_session_deliver_event
> perf_session__process_event
> __perf_session__process_events
> cmd_record
> cmd_mem
> run_builtin
> main
> __libc_start_main
> - 13.17% perf perf [.] __dsos__findnew
> __dsos__findnew
> map__new
> machine__process_mmap_event
> perf_session_deliver_event
> perf_session__process_event
> __perf_session__process_events
> cmd_record
> cmd_mem
> run_builtin
> main
> __libc_start_main
>
> So about 97% of CPU times were spent in the map__new() function
> trying to insert new DSO entry into the DSO linked list. The whole
> post-processing step took about 9 minutes.
>
> The DSO structures are currently searched linearly. So the total
> processing time will be proportional to n^2.
>
> To overcome this performance problem, the DSO code is modified to
> also put the DSO structures in a RB tree sorted by its long name
> in additional to being in a simple linked list. With this change,
> the processing time will become proportional to n*log(n) which will
> be much quicker for large n. However, the short name will still be
> searched using the old linear searching method. With that patch
> in place, the same perf-mem post-processing step took less than 30
> seconds to complete.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
> tools/perf/util/dso.c | 73 ++++++++++++++++++++++++++++++++++++++++++--
> tools/perf/util/dso.h | 1 +
> tools/perf/util/machine.h | 4 ++-
> 3 files changed, 73 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index c2c6134..810a4b5 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -651,6 +651,67 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
> return dso;
> }
>
> +/*
> + * Find a matching entry and/or link current entry to RB tree.
> + * Either one of the dso or name parameter must be non-NULL or the
> + * function will not work.
> + */
> +static struct dso *dso__findlink_by_longname(struct rb_root *root,
> + struct dso *dso, const char *name)
> +{
> + struct rb_node **p = &root->rb_node;
> + struct rb_node *parent = NULL;
> + int warned = false;
> +
> + if (!name)
> + name = dso->long_name;
> + /*
> + * Find node with the matching name
> + */
> + while (*p) {
> + struct dso *this = rb_entry(*p, struct dso, rb_node);
> + long rc = (long)strcmp(name, this->long_name);
> +
> + parent = *p;
> + if (rc == 0) {
> + /*
> + * In case the new DSO is a duplicate of an existing
> + * one, print an one-time warning & sort the entry
> + * by its DSO address.
> + */
> + if (!dso || (dso == this))
> + return this; /* Find matching dso */
> + /*
> + * The kernel DSOs may have duplicated long name,
> + * so don't print warning for them.
> + */
> + if (!warned && !strstr(name, "kernel.kallsyms")
> + && !strstr(name, "/vmlinux")) {
> + pr_warning("Duplicated dso long name: %s\n",
> + name);
Huh? Can you elaborate on this? Ho can we add multiple DSOs with the
exact same name into this tree? Have you actually seen this in practice?
I guess so, judging by the comment above ("may have").
I'll try the patch to see if I get these warnings...
But initial reaction to these long casts and fallbacking to pointer
arithmetic for tree searching/inserting looked ugly :-\
- Arnaldo
> + warned = true;
> + }
> + rc = (long)dso - (long)this;
> + }
> + if (rc < 0)
> + p = &parent->rb_left;
> + else
> + p = &parent->rb_right;
> + }
> + if (dso) {
> + /* Add new node and rebalance tree */
> + rb_link_node(&dso->rb_node, parent, p);
> + rb_insert_color(&dso->rb_node, root);
> + }
> + return NULL;
> +}
> +
> +static inline struct dso *
> +dso__find_by_longname(struct rb_root *root, const char *name)
> +{
> + return dso__findlink_by_longname(root, NULL, name);
> +}
> +
> void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
> {
> if (name == NULL)
> @@ -754,6 +815,7 @@ struct dso *dso__new(const char *name)
> dso->kernel = DSO_TYPE_USER;
> dso->needs_swap = DSO_SWAP__UNSET;
> dso->dsos = NULL;
> + RB_CLEAR_NODE(&dso->rb_node);
> INIT_LIST_HEAD(&dso->node);
> INIT_LIST_HEAD(&dso->data.open_entry);
> }
> @@ -776,6 +838,11 @@ void dso__delete(struct dso *dso)
> zfree((char **)&dso->long_name);
> dso->long_name_allocated = false;
> }
> + if (dso->dsos) {
> + /* Remove entry from rbtree */
> + rb_erase(&dso->rb_node, &dso->dsos->root);
> + dso->dsos = NULL;
> + }
>
> dso__data_close(dso);
> dso_cache__free(&dso->data.cache);
> @@ -854,6 +921,7 @@ void dsos__add(struct dsos *dsos, struct dso *dso)
> {
> dso->dsos = dsos;
> list_add_tail(&dso->node, &dsos->head);
> + dso__findlink_by_longname(&dsos->root, dso, NULL);
> }
>
> struct dso *dsos__find(const struct dsos *dsos, const char *name,
> @@ -867,10 +935,7 @@ struct dso *dsos__find(const struct dsos *dsos, const char *name,
> return pos;
> return NULL;
> }
> - list_for_each_entry(pos, &dsos->head, node)
> - if (strcmp(pos->long_name, name) == 0)
> - return pos;
> - return NULL;
> + return dso__find_by_longname((struct rb_root *)&dsos->root, name);
> }
>
> struct dso *__dsos__findnew(struct dsos *dsos, const char *name)
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index cadfef7..bae7042 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -94,6 +94,7 @@ struct dsos;
>
> struct dso {
> struct list_head node;
> + struct rb_node rb_node; /* rbtree node sorted by long name */
> struct dsos *dsos;
> struct rb_root symbols[MAP__NR_TYPES];
> struct rb_root symbol_names[MAP__NR_TYPES];
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index d8abb6c..f7546f3 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -23,10 +23,12 @@ extern const char *ref_reloc_sym_names[];
> struct vdso_info;
>
> /*
> - * DSOs are put into a list for fast iteration.
> + * DSOs are put into both a list for fast iteration and rbtree for fast
> + * long name lookup.
> */
> struct dsos {
> struct list_head head;
> + struct rb_root root; /* rbtree root sorted by long name */
> };
>
> struct machine {
> --
> 1.7.1
next prev parent reply other threads:[~2014-09-26 14:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-24 15:45 [PATCH v4 0/2] perf tool: improves DSO long names search speed with rbtree Waiman Long
2014-09-24 15:45 ` [PATCH v4 1/2] perf tool: encapsulate dsos list head into struct dsos Waiman Long
2014-09-26 14:06 ` Arnaldo Carvalho de Melo
2014-09-29 3:54 ` Namhyung Kim
2014-09-29 17:30 ` Waiman Long
2014-09-29 18:09 ` Arnaldo Carvalho de Melo
2014-09-29 18:40 ` Waiman Long
2014-09-26 14:12 ` Arnaldo Carvalho de Melo
2014-09-29 17:26 ` Waiman Long
2014-09-29 18:07 ` Arnaldo Carvalho de Melo
2014-09-24 15:45 ` [PATCH v4 2/2] perf tool: improves DSO long names lookup speed with rbtree Waiman Long
2014-09-26 14:22 ` Arnaldo Carvalho de Melo [this message]
2014-09-29 4:02 ` Namhyung Kim
2014-09-29 17:25 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140926142234.GC3879@kernel.org \
--to=acme@kernel.org \
--cc=Waiman.Long@hp.com \
--cc=adrian.hunter@intel.com \
--cc=doug.hatch@hp.com \
--cc=dzickus@redhat.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=scott.norton@hp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.