All of lore.kernel.org
 help / color / mirror / Atom feed
From: Adrian Hunter <adrian.hunter@intel.com>
To: Waiman Long <Waiman.Long@hp.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	Scott J Norton <scott.norton@hp.com>,
	Don Zickus <dzickus@redhat.com>, Jiri Olsa <jolsa@kernel.org>
Subject: Re: [PATCH v2] perf mem: improves DSO long names search speed with RB tree
Date: Wed, 17 Sep 2014 09:22:04 +0300	[thread overview]
Message-ID: <5419288C.5010403@intel.com> (raw)
In-Reply-To: <1410894499-27609-2-git-send-email-Waiman.Long@hp.com>

On 09/16/2014 10:08 PM, Waiman Long wrote:
> With workload that spawns and destroys many threads and processes,
> it was found that perf-mem could took a long time to post-process
> the perf data after the target workload had completed its operation.
> The performance bottleneck was found to be searching and insertion
> of the new DSO structures (thousands of them in this case).
> 
> In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread),
> the perf profile below shows what perf was doing after the profiled
> AIM7 shared workload completed:
> 
> -     83.94%  perf  libc-2.11.3.so     [.] __strcmp_sse42
>    - __strcmp_sse42
>       - 99.82% map__new
>            machine__process_mmap_event
>            perf_session_deliver_event
>            perf_session__process_event
>            __perf_session__process_events
>            cmd_record
>            cmd_mem
>            run_builtin
>            main
>            __libc_start_main
> -     13.17%  perf  perf               [.] __dsos__findnew
>      __dsos__findnew
>      map__new
>      machine__process_mmap_event
>      perf_session_deliver_event
>      perf_session__process_event
>      __perf_session__process_events
>      cmd_record
>      cmd_mem
>      run_builtin
>      main
>      __libc_start_main
> 
> So about 97% of CPU times were spent in the map__new() function
> trying to insert new DSO entry into the DSO linked list. The whole
> post-processing step took about 9 minutes.
> 
> The DSO structures are currently searched linearly. So the total
> processing time will be proportional to n^2.
> 
> To overcome this performance problem, the DSO code is modified to
> also put the DSO structures in a RB tree sorted by its long name
> in additional to being in a simple linked list. With this change,
> the processing time will become proportional to n*log(n) which will
> be much quicker for large n. However, the short name will still
> be searched using the old linear searching method which is slow.
> With that patch in place, the same perf-mem post-processing step took
> less than 30 seconds to complete.
> 
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
>  tools/perf/util/dso.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++--
>  tools/perf/util/dso.h |    1 +
>  2 files changed, 84 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index 819f104..fccb2f0 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -611,17 +611,93 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
>  	return dso;
>  }
>  
> +/*
> + * RB root of DSOs sorted by the long name
> + */
> +static struct rb_root dso__longname_root = { NULL };

Global variable!

Why not just change the lists to rbtrees i.e.

diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h                                                                                          
index 6a6bcc1..fa30780 100644                                                                                                                               
--- a/tools/perf/util/machine.h                                                                                                                             
+++ b/tools/perf/util/machine.h                                                                                                                             
@@ -32,8 +32,8 @@ struct machine {                                                                                                                          
        struct list_head  dead_threads;                                                                                                                     
        struct thread     *last_match;                                                                                                                      
        struct vdso_info  *vdso_info;                                                                                                                       
-       struct list_head  user_dsos;                                                                                                                        
-       struct list_head  kernel_dsos;                                                                                                                      
+       struct rb_root    user_dsos;                                                                                                                        
+       struct rb_root    kernel_dsos;                                                                                                                      
        struct map_groups kmaps;                                                                                                                            
        struct map        *vmlinux_maps[MAP__NR_TYPES];                                                                                                     
        u64               kernel_start;

And make all the resulting adjustments.


  reply	other threads:[~2014-09-17  6:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-16 19:08 [PATCH v2 0/1] perf mem: improves DSO long names search speed with RB tree Waiman Long
2014-09-16 19:08 ` [PATCH v2] " Waiman Long
2014-09-17  6:22   ` Adrian Hunter [this message]
2014-09-17 14:08     ` Waiman Long
2014-09-17 14:51   ` Arnaldo Carvalho de Melo
2014-09-17 17:55     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5419288C.5010403@intel.com \
    --to=adrian.hunter@intel.com \
    --cc=Waiman.Long@hp.com \
    --cc=acme@kernel.org \
    --cc=dzickus@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.