All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <Waiman.Long@hp.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Stephane Eranian <eranian@google.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, "Chandramouleeswaran,
	Aswin" <aswin@hp.com>, "Norton, Scott J" <scott.norton@hp.com>
Subject: Re: [PATCH] perf: fix symbol processing bug and greatly improve performance
Date: Tue, 07 May 2013 10:40:22 -0400	[thread overview]
Message-ID: <51891256.2040402@hp.com> (raw)
In-Reply-To: <20130507093031.GA1076@krava.brq.redhat.com>

On 05/07/2013 05:30 AM, Jiri Olsa wrote:
> On Mon, May 06, 2013 at 09:43:53AM -0400, Waiman Long wrote:
>> When "perf record" was used on a large machine with a lot of CPUs,
>> the perf post-processing time could take a lot of minutes and even
>> hours depending on how large the resulting perf.data file was.
>>
>> While running AIM7 1500-user high_systime workload on a 80-core x86-64
>> system with a 3.9 kernel, the workload itself took about 2 minutes
>> to run and the perf.data file had a size of 1108.746 MB. However,
>> the post-processing step took more than 10 minutes.
>>
>> With a gprof-profiled perf binary, the time spent by perf was as
>> follows:
>>
>>    %   cumulative   self              self     total
>>   time   seconds   seconds    calls   s/call   s/call  name
>>   96.90    822.10   822.10   192156     0.00     0.00  dsos__find
>>    0.81    828.96     6.86 172089958     0.00     0.00  rb_next
>>    0.41    832.44     3.48 48539289     0.00     0.00  rb_erase
>>
>> So 97% (822 seconds) of the time was spent in a single dsos_find()
>> function. After analyzing the call-graph data below:
>>
>> -----------------------------------------------
>>                  0.00  822.12  192156/192156      map__new [6]
>> [7]     96.9    0.00  822.12  192156         vdso__dso_findnew [7]
>>                822.10    0.00  192156/192156      dsos__find [8]
>>                  0.01    0.00  192156/192156      dsos__add [62]
>>                  0.01    0.00  192156/192366      dso__new [61]
>>                  0.00    0.00       1/45282525     memdup [31]
>>                  0.00    0.00  192156/192230      dso__set_long_name [91]
>> -----------------------------------------------
>>                822.10    0.00  192156/192156      vdso__dso_findnew [7]
>> [8]     96.9  822.10    0.00  192156         dsos__find [8]
>> -----------------------------------------------
>>
>> It was found that the vdso__dso_findnew() function failed to locate
>> VDSO__MAP_NAME ("[vdso]") in the dso list and have to insert a new
>> entry at the end for 192156 times. This problem is due to the fact that
>> there are 2 types of name in the dso entry - short name and long name.
>> The initial dso__new() adds "[vdso]" to both the short and long names.
>> After that, vdso__dso_findnew() modifies the long name to something
>> like /tmp/perf-vdso.so-NoXkDj. The dsos__find() function only compares
>> the long name. As a result, the same vdso entry is duplicated many
>> time in the dso list. This bug increases memory consumption as well
>> as slows the symbol processing time to a crawl.
> hi,
> the issue is there and fix looks ok, thanks!
>
> though I'm not able to get vdso callchains to pop out
> even by investigating report with vdso heavy workload.
>
> I'll have a closer look..

The test machine that I used have RHEL 6.4 installed in it with a 
upstream 3.9 kernel layered on top. The kernel config is based on the 
6.4 configuration file with modification to enable the X2APIC option 
needed by the machine. Other than that, I didn't make too much 
modification to the base configuration. I used the "-a -s" option when 
running perf-record.

I don't think the vdso callchains were major part of the workload that I 
tested. I think it is the high number of CPU cores plus the high number 
of users (1500) that cause the performance bottleneck to surface. In a 
smaller machine, those bottlenecks may be much less noticeable. The vdso 
call-chain dominates the post-processsing time because of the need to 
search through the while DSO list for the vdso library which can grow to 
2M+ in my test case.

Regards,
Longman

  reply	other threads:[~2013-05-07 14:40 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-06 13:43 [PATCH] perf: fix symbol processing bug and greatly improve performance Waiman Long
2013-05-07  7:01 ` Ingo Molnar
2013-05-07 14:19   ` Waiman Long
2013-05-07 14:52     ` Waiman Long
2013-05-07 17:15     ` Ingo Molnar
2013-05-07  9:30 ` Jiri Olsa
2013-05-07 14:40   ` Waiman Long [this message]
2013-05-08 15:44   ` Waiman Long
2013-05-09 10:19     ` Jiri Olsa
2013-05-09 12:52       ` Ingo Molnar
2013-05-09 14:46       ` Waiman Long
2013-05-09 15:05         ` David Ahern
2013-05-09 15:31           ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51891256.2040402@hp.com \
    --to=waiman.long@hp.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@ghostprotocols.net \
    --cc=aswin@hp.com \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=paulus@samba.org \
    --cc=scott.norton@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.