All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <waiman.long@hp.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	Scott J Norton <scott.norton@hp.com>,
	Douglas Hatch <doug.hatch@hp.com>,
	Don Zickus <dzickus@redhat.com>, Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Namhyung Kim <namhyung@kernel.org>
Subject: Re: [PATCH v5 2/2] perf tool: improves DSO long names lookup speed with rbtree
Date: Tue, 30 Sep 2014 13:38:54 -0400	[thread overview]
Message-ID: <542AEAAE.5040002@hp.com> (raw)
In-Reply-To: <542ADF03.7070609@hp.com>

On 09/30/2014 12:49 PM, Waiman Long wrote:
> On 09/30/2014 11:21 AM, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Sep 29, 2014 at 04:07:29PM -0400, Waiman Long escreveu:
>>> With workload that spawns and destroys many threads and processes,
>>> it was found that perf-mem could took a long time to post-process
>>> the perf data after the target workload had completed its operation.
>>> The performance bottleneck was found to be the lookup and insertion
>>> of the new DSO structures (thousands of them in this case).
>>>
>>> In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread),
>>> the perf profile below shows what perf was doing after the profiled
>>> AIM7 shared workload completed:
>>>
>>> -     83.94%  perf  libc-2.11.3.so     [.] __strcmp_sse42
>>>     - __strcmp_sse42
>>>        - 99.82% map__new
>>>             machine__process_mmap_event
>>>             perf_session_deliver_event
>>>             perf_session__process_event
>>>             __perf_session__process_events
>>>             cmd_record
>>>             cmd_mem
>>>             run_builtin
>>>             main
>>>             __libc_start_main
>>> -     13.17%  perf  perf               [.] __dsos__findnew
>>>       __dsos__findnew
>>>       map__new
>>>       machine__process_mmap_event
>>>       perf_session_deliver_event
>>>       perf_session__process_event
>>>       __perf_session__process_events
>>>       cmd_record
>>>       cmd_mem
>>>       run_builtin
>>>       main
>>>       __libc_start_main
>>>
>>> So about 97% of CPU times were spent in the map__new() function
>>> trying to insert new DSO entry into the DSO linked list. The whole
>>> post-processing step took about 9 minutes.
>>>
>>> The DSO structures are currently searched linearly. So the total
>>> processing time will be proportional to n^2.
>>>
>>> To overcome this performance problem, the DSO code is modified to
>>> also put the DSO structures in a RB tree sorted by its long name
>>> in additional to being in a simple linked list. With this change,
>>> the processing time will become proportional to n*log(n) which will
>>> be much quicker for large n. However, the short name will still be
>>> searched using the old linear searching method.  With that patch
>>> in place, the same perf-mem post-processing step took less than 30
>>> seconds to complete.
>>>
>>> Signed-off-by: Waiman Long<Waiman.Long@hp.com>
>>> ---
>>>   tools/perf/util/dso.c     |   72 
>>> ++++++++++++++++++++++++++++++++++++++++++--
>>>   tools/perf/util/dso.h     |    1 +
>>>   tools/perf/util/machine.c |    1 +
>>>   tools/perf/util/machine.h |    4 ++-
>>>   4 files changed, 73 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
>>> index 901a58f..9a81c03 100644
>>> --- a/tools/perf/util/dso.c
>>> +++ b/tools/perf/util/dso.c
>>> @@ -653,6 +653,67 @@ struct dso *dso__kernel_findnew(struct machine 
>>> *machine, const char *name,
>>>       return dso;
>>>   }
>>>
>>> +/*
>>> + * Find a matching entry and/or link current entry to RB tree.
>>> + * Either one of the dso or name parameter must be non-NULL or the
>>> + * function will not work.
>>> + */
>>> +static struct dso *dso__findlink_by_longname(struct rb_root *root,
>>> +                         struct dso *dso, const char *name)
>>> +{
>>> +    struct rb_node **p =&root->rb_node;
>>> +    struct rb_node  *parent = NULL;
>>> +    int warned = false;
>>> +
>>> +    if (!name)
>>> +        name = dso->long_name;
>>> +    /*
>>> +     * Find node with the matching name
>>> +     */
>>> +    while (*p) {
>>> +        struct dso *this = rb_entry(*p, struct dso, rb_node);
>>> +        int rc = strcmp(name, this->long_name);
>>> +
>>> +        parent = *p;
>>> +        if (rc == 0) {
>>> +            /*
>>> +             * In case the new DSO is a duplicate of an existing
>>> +             * one, print an one-time warning&  put the new entry
>>> +             * at the end of the list of duplicates.
>>> +             */
>>> +            if (!dso || (dso == this))
>>> +                return this;    /* Find matching dso */
>>> +            /*
>>> +             * The core kernel DSOs may have duplicated long name.
>>> +             * (See dso__load_sym()). Don't print warning for them.
>>> +             */
>>> +            if (!warned&&  !strstr(name, "kernel.kallsyms")
>>> + &&  !strstr(name, "/vmlinux")) {
>>> +                pr_warning("Duplicated dso long name: %s\n",
>>> +                       name);
>>> +                warned = true;
>> I still wonder if in this case we should just return, i.e. why would we
>> want to have multiple entries with the same name here? Anyway, I guess
>> it doesn't hurt, right?
>>
>> Something to be further investigated to find a better solution, but I
>> guess that the patch as-is now should provide that speedup without
>> introducing any new oddities. Will apply.
>
> If I don't add the kernel name check, I will get a warning every time 
> I run mem recording with the workloads that I am using. So it is 
> happening in the current code. I think the short name may be 
> different. I will do more test to find out. If that is the case, an 
> alternative is to do a short name comparison if the long name match.
>

The short names are indeed different when the long names match. I have 
just sent out the v6 patch with the change. Hopefully that will address 
your remaining concern about this patch.

-Longman

      reply	other threads:[~2014-09-30 17:38 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-29 20:07 [PATCH v5 0/2] perf tool: improves DSO long names search speed with rbtree Waiman Long
2014-09-29 20:07 ` [PATCH v5 1/2] perf tool: encapsulate dsos list head into struct dsos Waiman Long
2014-10-03  5:26   ` [tip:perf/core] perf symbols: Encapsulate " tip-bot for Waiman Long
2014-09-29 20:07 ` [PATCH v5 2/2] perf tool: improves DSO long names lookup speed with rbtree Waiman Long
2014-09-30 15:21   ` Arnaldo Carvalho de Melo
2014-09-30 16:49     ` Waiman Long
2014-09-30 17:38       ` Waiman Long [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=542AEAAE.5040002@hp.com \
    --to=waiman.long@hp.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=doug.hatch@hp.com \
    --cc=dzickus@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.