* [PATCH] perf mem: improves DSO long names search speed with RB tree
@ 2014-09-15 16:43 Waiman Long
2014-09-15 20:15 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 3+ messages in thread
From: Waiman Long @ 2014-09-15 16:43 UTC (permalink / raw)
To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
Arnaldo Carvalho de Melo
Cc: linux-kernel, Scott J Norton, Don Zickus, Jiri Olsa,
Adrian Hunter, Waiman Long
With workload that spawns and destroys many threads and processes,
it was found that perf-mem could took a long time to post-process
the perf data after the target workload had completed its operation.
The performance bottleneck was found to be searching and insertion
of the new DSO structures (thousands of them in this case).
In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread),
the perf profile below shows what perf was doing after the profiled
AIM7 shared workload completed:
- 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
- __strcmp_sse42
- 99.82% map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
- 13.17% perf perf [.] __dsos__findnew
__dsos__findnew
map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
So about 97% of CPU times were spent in the map__new() function
trying to insert new DSO entry into the DSO linked list. The whole
post-processing step took about 9 minutes.
The DSO structures are currently searched linearly. So the total
processing time will be proportional to n^2.
To overcome this performance problem, the DSO code is modified to
put the DSO structures in a RB tree sorted by its long name. With
this change, the processing time will become proportional to n*log(n)
which will be much quicker for large n. However, the short name will
still be searched using the old linear searching method which is slow.
With that patch in place, the same perf-mem post-processing step took
less than 30 seconds to complete.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
tools/perf/util/dso.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
tools/perf/util/dso.h | 1 +
2 files changed, 74 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 819f104..bd92564 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -611,17 +611,83 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
return dso;
}
+/*
+ * RB root of DSOs sorted by the long name
+ */
+static struct rb_root dso__long_name_root = { NULL };
+
+/*
+ * Either one of the dso or name parameter must be non-NULL or the
+ * function will not work.
+ */
+static struct dso *
+dso__long_name_findadd_node(struct dso *dso, const char *name)
+{
+ struct rb_node **p = &dso__long_name_root.rb_node;
+ struct rb_node *parent = NULL;
+ int warned = false;
+
+ if (!name)
+ name = dso->long_name;
+ /*
+ * Find node with the matching name
+ */
+ while (*p) {
+ struct dso *this = rb_entry(*p, struct dso, long_name_rb_node);
+ long rc = (long)strcmp(name, this->long_name);
+
+ parent = *p;
+ if (rc == 0) {
+ /*
+ * In case the new DSO is a duplicate of an existing
+ * one, print an one-time warning & sort the entry
+ * by its DSO address.
+ */
+ if (!dso || (dso == this))
+ return this; /* Find matching dso */
+ if (!warned) {
+ pr_warning("Duplicated dso long name: %s\n",
+ name);
+ warned = true;
+ }
+ rc = (long)dso - (long)this;
+ }
+ if (rc < 0)
+ p = &parent->rb_left;
+ else
+ p = &parent->rb_right;
+ }
+ if (dso) {
+ /* Add new node and rebalance tree */
+ rb_link_node(&dso->long_name_rb_node, parent, p);
+ rb_insert_color(&dso->long_name_rb_node, &dso__long_name_root);
+ }
+ return NULL;
+}
+
+static inline void dso__long_name_remove_node(struct dso *dso)
+{
+ rb_erase(&dso->long_name_rb_node, &dso__long_name_root);
+}
+
void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
{
if (name == NULL)
return;
+ if (dso->long_name) {
+ if (!strcmp(dso->long_name, name))
+ return;
+ dso__long_name_remove_node(dso);
+ }
+
if (dso->long_name_allocated)
free((char *)dso->long_name);
dso->long_name = name;
dso->long_name_len = strlen(name);
dso->long_name_allocated = name_allocated;
+ (void)dso__long_name_findadd_node(dso, name);
}
void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated)
@@ -695,6 +761,8 @@ struct dso *dso__new(const char *name)
if (dso != NULL) {
int i;
strcpy(dso->name, name);
+ RB_CLEAR_NODE(&dso->long_name_rb_node);
+ dso->long_name = NULL;
dso__set_long_name(dso, dso->name, false);
dso__set_short_name(dso, dso->name, false);
for (i = 0; i < MAP__NR_TYPES; ++i)
@@ -733,6 +801,10 @@ void dso__delete(struct dso *dso)
zfree((char **)&dso->long_name);
dso->long_name_allocated = false;
}
+ if (dso->long_name) {
+ dso__long_name_remove_node(dso);
+ dso->long_name = NULL;
+ }
dso__data_close(dso);
dso_cache__free(&dso->data.cache);
@@ -822,10 +894,7 @@ struct dso *dsos__find(const struct list_head *head, const char *name, bool cmp_
return pos;
return NULL;
}
- list_for_each_entry(pos, head, node)
- if (strcmp(pos->long_name, name) == 0)
- return pos;
- return NULL;
+ return dso__long_name_findadd_node(NULL, name);
}
struct dso *__dsos__findnew(struct list_head *head, const char *name)
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index ad553ba..ed949e4 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -76,6 +76,7 @@ struct dso {
struct list_head node;
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
+ struct rb_node long_name_rb_node;
void *a2l;
char *symsrc_filename;
unsigned int a2l_fails;
--
1.7.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] perf mem: improves DSO long names search speed with RB tree
2014-09-15 16:43 [PATCH] perf mem: improves DSO long names search speed with RB tree Waiman Long
@ 2014-09-15 20:15 ` Arnaldo Carvalho de Melo
2014-09-16 18:39 ` Waiman Long
0 siblings, 1 reply; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-15 20:15 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, linux-kernel,
Scott J Norton, Don Zickus, Jiri Olsa, Adrian Hunter
Em Mon, Sep 15, 2014 at 12:43:21PM -0400, Waiman Long escreveu:
> With workload that spawns and destroys many threads and processes,
> it was found that perf-mem could took a long time to post-process
> the perf data after the target workload had completed its operation.
> The performance bottleneck was found to be searching and insertion
> of the new DSO structures (thousands of them in this case).
>
> In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread),
> the perf profile below shows what perf was doing after the profiled
> AIM7 shared workload completed:
>
> - 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
> - __strcmp_sse42
> - 99.82% map__new
> machine__process_mmap_event
> perf_session_deliver_event
> perf_session__process_event
> __perf_session__process_events
> cmd_record
> cmd_mem
> run_builtin
> main
> __libc_start_main
> - 13.17% perf perf [.] __dsos__findnew
> __dsos__findnew
> map__new
> machine__process_mmap_event
> perf_session_deliver_event
> perf_session__process_event
> __perf_session__process_events
> cmd_record
> cmd_mem
> run_builtin
> main
> __libc_start_main
>
> So about 97% of CPU times were spent in the map__new() function
> trying to insert new DSO entry into the DSO linked list. The whole
> post-processing step took about 9 minutes.
>
> The DSO structures are currently searched linearly. So the total
> processing time will be proportional to n^2.
>
> To overcome this performance problem, the DSO code is modified to
> put the DSO structures in a RB tree sorted by its long name. With
> this change, the processing time will become proportional to n*log(n)
> which will be much quicker for large n. However, the short name will
> still be searched using the old linear searching method which is slow.
> With that patch in place, the same perf-mem post-processing step took
> less than 30 seconds to complete.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
> tools/perf/util/dso.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
> tools/perf/util/dso.h | 1 +
> 2 files changed, 74 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index 819f104..bd92564 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -611,17 +611,83 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
> return dso;
> }
>
> +/*
> + * RB root of DSOs sorted by the long name
> + */
> +static struct rb_root dso__long_name_root = { NULL };
> +
> +/*
> + * Either one of the dso or name parameter must be non-NULL or the
> + * function will not work.
> + */
> +static struct dso *
> +dso__long_name_findadd_node(struct dso *dso, const char *name)
> +{
> + struct rb_node **p = &dso__long_name_root.rb_node;
> + struct rb_node *parent = NULL;
> + int warned = false;
> +
> + if (!name)
> + name = dso->long_name;
> + /*
> + * Find node with the matching name
> + */
> + while (*p) {
> + struct dso *this = rb_entry(*p, struct dso, long_name_rb_node);
> + long rc = (long)strcmp(name, this->long_name);
> +
> + parent = *p;
> + if (rc == 0) {
> + /*
> + * In case the new DSO is a duplicate of an existing
> + * one, print an one-time warning & sort the entry
> + * by its DSO address.
> + */
> + if (!dso || (dso == this))
> + return this; /* Find matching dso */
> + if (!warned) {
> + pr_warning("Duplicated dso long name: %s\n",
> + name);
> + warned = true;
> + }
> + rc = (long)dso - (long)this;
> + }
> + if (rc < 0)
> + p = &parent->rb_left;
> + else
> + p = &parent->rb_right;
> + }
> + if (dso) {
> + /* Add new node and rebalance tree */
> + rb_link_node(&dso->long_name_rb_node, parent, p);
> + rb_insert_color(&dso->long_name_rb_node, &dso__long_name_root);
> + }
> + return NULL;
> +}
> +
> +static inline void dso__long_name_remove_node(struct dso *dso)
> +{
> + rb_erase(&dso->long_name_rb_node, &dso__long_name_root);
> +}
> +
> void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
> {
> if (name == NULL)
> return;
>
> + if (dso->long_name) {
> + if (!strcmp(dso->long_name, name))
> + return;
> + dso__long_name_remove_node(dso);
> + }
> +
> if (dso->long_name_allocated)
> free((char *)dso->long_name);
>
> dso->long_name = name;
> dso->long_name_len = strlen(name);
> dso->long_name_allocated = name_allocated;
> + (void)dso__long_name_findadd_node(dso, name);
> }
>
> void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated)
> @@ -695,6 +761,8 @@ struct dso *dso__new(const char *name)
> if (dso != NULL) {
> int i;
> strcpy(dso->name, name);
> + RB_CLEAR_NODE(&dso->long_name_rb_node);
> + dso->long_name = NULL;
> dso__set_long_name(dso, dso->name, false);
> dso__set_short_name(dso, dso->name, false);
> for (i = 0; i < MAP__NR_TYPES; ++i)
> @@ -733,6 +801,10 @@ void dso__delete(struct dso *dso)
> zfree((char **)&dso->long_name);
> dso->long_name_allocated = false;
> }
> + if (dso->long_name) {
> + dso__long_name_remove_node(dso);
> + dso->long_name = NULL;
> + }
>
> dso__data_close(dso);
> dso_cache__free(&dso->data.cache);
> @@ -822,10 +894,7 @@ struct dso *dsos__find(const struct list_head *head, const char *name, bool cmp_
> return pos;
> return NULL;
> }
> - list_for_each_entry(pos, head, node)
> - if (strcmp(pos->long_name, name) == 0)
> - return pos;
> - return NULL;
> + return dso__long_name_findadd_node(NULL, name);
By its name, dsos__find() should not add anything to any data structure,
it is about just finding something, or it would be named
dsos__findnew().
Also would we want to add something if we don't even have a DSO here?
I think the right thing is to call it dsos__find_by_longname() and have
a dsos__findnew_by_longname().
If you want to share code behind that api, probably there are
opportunities for that, but doing it at this level makes the code
unecessarily hard to follow :-\
- Arnaldo
> struct dso *__dsos__findnew(struct list_head *head, const char *name)
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index ad553ba..ed949e4 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -76,6 +76,7 @@ struct dso {
> struct list_head node;
> struct rb_root symbols[MAP__NR_TYPES];
> struct rb_root symbol_names[MAP__NR_TYPES];
> + struct rb_node long_name_rb_node;
> void *a2l;
> char *symsrc_filename;
> unsigned int a2l_fails;
> --
> 1.7.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] perf mem: improves DSO long names search speed with RB tree
2014-09-15 20:15 ` Arnaldo Carvalho de Melo
@ 2014-09-16 18:39 ` Waiman Long
0 siblings, 0 replies; 3+ messages in thread
From: Waiman Long @ 2014-09-16 18:39 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, linux-kernel,
Scott J Norton, Don Zickus, Jiri Olsa, Adrian Hunter
On 09/15/2014 04:15 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 15, 2014 at 12:43:21PM -0400, Waiman Long escreveu:
>> With workload that spawns and destroys many threads and processes,
>> it was found that perf-mem could took a long time to post-process
>> the perf data after the target workload had completed its operation.
>> The performance bottleneck was found to be searching and insertion
>> of the new DSO structures (thousands of them in this case).
>>
>> In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread),
>> the perf profile below shows what perf was doing after the profiled
>> AIM7 shared workload completed:
>>
>> - 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
>> - __strcmp_sse42
>> - 99.82% map__new
>> machine__process_mmap_event
>> perf_session_deliver_event
>> perf_session__process_event
>> __perf_session__process_events
>> cmd_record
>> cmd_mem
>> run_builtin
>> main
>> __libc_start_main
>> - 13.17% perf perf [.] __dsos__findnew
>> __dsos__findnew
>> map__new
>> machine__process_mmap_event
>> perf_session_deliver_event
>> perf_session__process_event
>> __perf_session__process_events
>> cmd_record
>> cmd_mem
>> run_builtin
>> main
>> __libc_start_main
>>
>> So about 97% of CPU times were spent in the map__new() function
>> trying to insert new DSO entry into the DSO linked list. The whole
>> post-processing step took about 9 minutes.
>>
>> The DSO structures are currently searched linearly. So the total
>> processing time will be proportional to n^2.
>>
>> To overcome this performance problem, the DSO code is modified to
>> put the DSO structures in a RB tree sorted by its long name. With
>> this change, the processing time will become proportional to n*log(n)
>> which will be much quicker for large n. However, the short name will
>> still be searched using the old linear searching method which is slow.
>> With that patch in place, the same perf-mem post-processing step took
>> less than 30 seconds to complete.
>>
>> Signed-off-by: Waiman Long<Waiman.Long@hp.com>
>> ---
>> tools/perf/util/dso.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
>> tools/perf/util/dso.h | 1 +
>> 2 files changed, 74 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
>> index 819f104..bd92564 100644
>> --- a/tools/perf/util/dso.c
>> +++ b/tools/perf/util/dso.c
>> @@ -611,17 +611,83 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
>> return dso;
>> }
>>
>> +/*
>> + * RB root of DSOs sorted by the long name
>> + */
>> +static struct rb_root dso__long_name_root = { NULL };
>> +
>> +/*
>> + * Either one of the dso or name parameter must be non-NULL or the
>> + * function will not work.
>> + */
>> +static struct dso *
>> +dso__long_name_findadd_node(struct dso *dso, const char *name)
>> +{
>> + struct rb_node **p =&dso__long_name_root.rb_node;
>> + struct rb_node *parent = NULL;
>> + int warned = false;
>> +
>> + if (!name)
>> + name = dso->long_name;
>> + /*
>> + * Find node with the matching name
>> + */
>> + while (*p) {
>> + struct dso *this = rb_entry(*p, struct dso, long_name_rb_node);
>> + long rc = (long)strcmp(name, this->long_name);
>> +
>> + parent = *p;
>> + if (rc == 0) {
>> + /*
>> + * In case the new DSO is a duplicate of an existing
>> + * one, print an one-time warning& sort the entry
>> + * by its DSO address.
>> + */
>> + if (!dso || (dso == this))
>> + return this; /* Find matching dso */
>> + if (!warned) {
>> + pr_warning("Duplicated dso long name: %s\n",
>> + name);
>> + warned = true;
>> + }
>> + rc = (long)dso - (long)this;
>> + }
>> + if (rc< 0)
>> + p =&parent->rb_left;
>> + else
>> + p =&parent->rb_right;
>> + }
>> + if (dso) {
>> + /* Add new node and rebalance tree */
>> + rb_link_node(&dso->long_name_rb_node, parent, p);
>> + rb_insert_color(&dso->long_name_rb_node,&dso__long_name_root);
>> + }
>> + return NULL;
>> +}
>> +
>> +static inline void dso__long_name_remove_node(struct dso *dso)
>> +{
>> + rb_erase(&dso->long_name_rb_node,&dso__long_name_root);
>> +}
>> +
>> void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
>> {
>> if (name == NULL)
>> return;
>>
>> + if (dso->long_name) {
>> + if (!strcmp(dso->long_name, name))
>> + return;
>> + dso__long_name_remove_node(dso);
>> + }
>> +
>> if (dso->long_name_allocated)
>> free((char *)dso->long_name);
>>
>> dso->long_name = name;
>> dso->long_name_len = strlen(name);
>> dso->long_name_allocated = name_allocated;
>> + (void)dso__long_name_findadd_node(dso, name);
>> }
>>
>> void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated)
>> @@ -695,6 +761,8 @@ struct dso *dso__new(const char *name)
>> if (dso != NULL) {
>> int i;
>> strcpy(dso->name, name);
>> + RB_CLEAR_NODE(&dso->long_name_rb_node);
>> + dso->long_name = NULL;
>> dso__set_long_name(dso, dso->name, false);
>> dso__set_short_name(dso, dso->name, false);
>> for (i = 0; i< MAP__NR_TYPES; ++i)
>> @@ -733,6 +801,10 @@ void dso__delete(struct dso *dso)
>> zfree((char **)&dso->long_name);
>> dso->long_name_allocated = false;
>> }
>> + if (dso->long_name) {
>> + dso__long_name_remove_node(dso);
>> + dso->long_name = NULL;
>> + }
>>
>> dso__data_close(dso);
>> dso_cache__free(&dso->data.cache);
>> @@ -822,10 +894,7 @@ struct dso *dsos__find(const struct list_head *head, const char *name, bool cmp_
>> return pos;
>> return NULL;
>> }
>> - list_for_each_entry(pos, head, node)
>> - if (strcmp(pos->long_name, name) == 0)
>> - return pos;
>> - return NULL;
>> + return dso__long_name_findadd_node(NULL, name);
> By its name, dsos__find() should not add anything to any data structure,
> it is about just finding something, or it would be named
> dsos__findnew().
You are right. I am a bit sloppy with the function name. The
dsos__find() function will not add anything to any data structure with
this patch. I will separate the two different use of the
dso__long_name_findadd_node() function. The first use case is to find a
matching entry when DSO is NULL. The second use case is to link the DSO
structure to the appropriate place in the RB tree when DSO is not NULL.
> Also would we want to add something if we don't even have a DSO here?
Nothing will be added if DSO isn't there.
>
> I think the right thing is to call it dsos__find_by_longname() and have
> a dsos__findnew_by_longname().
>
> If you want to share code behind that api, probably there are
> opportunities for that, but doing it at this level makes the code
> unecessarily hard to follow :-\
>
> - Arnaldo
>
I will change the name to dsos__find_by_longname() as suggesed and
dsos__findlink_by_longname(). When DSO is defined, it will link it into
appropriate place in the tree, but allocation a new DSO structure. That
is why I am planning to use link instead of new.
Thanks for the comment.
Longman
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-09-16 18:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-15 16:43 [PATCH] perf mem: improves DSO long names search speed with RB tree Waiman Long
2014-09-15 20:15 ` Arnaldo Carvalho de Melo
2014-09-16 18:39 ` Waiman Long
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).