Re: [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads

From: Ingo Molnar <mingo@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Kan Liang <kan.liang@intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andi Kleen <ak@linux.intel.com>, He Kuang <hekuang@huawei.com>,
	Lukasz Odzioba <lukasz.odzioba@intel.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Wang Nan <wangnan0@huawei.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads
Date: Tue, 3 Oct 2017 19:37:30 +0200
Message-ID: <20171003173729.k5xobn24qerr2tuw@gmail.com>
In-Reply-To: <20171003125540.331-7-acme@kernel.org>


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> From: Kan Liang <kan.liang@intel.com>
> 
> The proc files, which are sorted in alphabetical order, are evenly
> assigned to several synthesize threads to be processed in parallel.
> 
> For 'perf top', the number of threads is hard-coded to the number of
> online CPUs. The following patch will introduce an option to set it.
> 
> For other perf tools, the thread number is 1, because the process
> function, e.g. process_synthesized_event, is not ready for
> multithreading.
> 
> This patch series only supports event synthesize multithreading for
> 'perf top'. For other tools, it can be done separately later.
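
As a rough illustration of the "evenly assigned" part, here is a minimal
sketch of splitting a sorted list of /proc entries into contiguous chunks, one
per synthesize thread. The function name split_evenly and the sample entries
are made up for this example, and this is not claimed to be what the patch
itself does:

```python
def split_evenly(entries, nr_threads):
    """Split sorted entries into nr_threads contiguous chunks.

    Chunk sizes differ by at most one; the first `extra` chunks get
    one additional entry each.
    """
    entries = sorted(entries)  # alphabetical order, as in the description
    base, extra = divmod(len(entries), nr_threads)
    chunks = []
    start = 0
    for i in range(nr_threads):
        size = base + (1 if i < extra else 0)
        chunks.append(entries[start:start + size])
        start += size
    return chunks


# Hypothetical /proc directory names, compared as strings.
pids = ["1", "10", "1024", "2", "23", "305", "42"]
chunks = split_evenly(pids, 3)
for i, chunk in enumerate(chunks):
    print(f"thread {i}: {chunk}")
```

With 7 entries and 3 threads the chunk sizes come out as 3, 2 and 2.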

Just to give some quick feedback: this is really nice stuff!

Is anyone working on multi-threading 'perf record' (and the recording portion of 
'perf top' perhaps)?

Especially with complex, high-frequency profiling there's a lot of SMP overhead 
coming from a single recording thread. If there were one recording thread per CPU, 
and each truly only recorded the events from its own CPU, things would become a lot 
more scalable.
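
To make the idea a bit more concrete, here is a toy model of it. It is only a
sketch: plain Python threads stand in for per-CPU recording threads, in-memory
lists stand in for the per-CPU event buffers, and all names (record_cpu,
record_per_cpu, merge_by_time) are invented for this example. A real
implementation would read the per-CPU ring buffers and would likely pin each
thread to its CPU.

```python
import heapq
import threading


def record_cpu(cpu, events, out):
    # Each worker reads only its own CPU's events and writes only its own
    # output slot, so the recording threads share no lock.
    out[cpu] = [(ts, cpu, payload) for ts, payload in events]


def record_per_cpu(per_cpu_events):
    out = [None] * len(per_cpu_events)
    workers = [
        threading.Thread(target=record_cpu, args=(cpu, events, out))
        for cpu, events in enumerate(per_cpu_events)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out


def merge_by_time(per_cpu_records):
    # Each per-CPU stream is already ordered by timestamp, so a k-way
    # merge is enough to get one global, time-ordered stream afterwards.
    return list(heapq.merge(*per_cpu_records))


# Hypothetical (timestamp, payload) events for two CPUs.
per_cpu_events = [
    [(1, "a"), (4, "d")],  # CPU 0
    [(2, "b"), (3, "c")],  # CPU 1
]
merged = merge_by_time(record_per_cpu(per_cpu_events))
print(merged)
```

The point of the split is that the cross-CPU work (the merge) happens once,
after recording, instead of on every event.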

For example, if we measure the current overhead of perf record of a (limited) 
parallel kernel build:

  triton:~/tip> perf stat --no-inherit --pre "make clean >/dev/null 2>&1" perf record -F 10000 make -j kernel
  ...
  [ perf record: Captured and wrote 5.124 MB perf.data (108400 samples) ]

 Performance counter stats for 'perf record -F 10000 make -j kernel':

        183.582587      task-clock (msec)         #    0.039 CPUs utilized          
             2,496      context-switches          #    0.014 M/sec                  
               157      cpu-migrations            #    0.855 K/sec                  
             6,649      page-faults               #    0.036 M/sec                  
       817,478,151      cycles                    #    4.453 GHz                    
       416,641,913      stalled-cycles-frontend   #   50.97% frontend cycles idle   
     1,018,336,301      instructions              #    1.25  insn per cycle         
                                                  #    0.41  stalled cycles per insn
       217,255,137      branches                  # 1183.419 M/sec                  
         2,970,118      branch-misses             #    1.37% of all branches        

       4.710378510 seconds time elapsed

That's 1,018,336,301 instructions just to record 108,400 samples, i.e. every sample 
takes roughly 9,400 instructions to _record_. That's insanely high overhead for what 
is in essence a tracing utility.
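
The division behind that per-sample figure, using the instruction and sample
counts from the perf stat output above (the variable names are just for this
example):

```python
instructions = 1_018_336_301  # 'instructions' counter from perf stat
samples = 108_400             # samples reported by perf record

per_sample = instructions / samples
print(f"{per_sample:,.0f} instructions per sample")  # 9,394 instructions per sample
```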


Even if I add "-B -N" to disable buildid generation (which is the worst offender), 
it's still very high overhead:

 [ perf record: Captured and wrote 5.585 MB perf.data ]

 Performance counter stats for 'perf record -B -N -F 10000 make -j kernel':

         45.625321      task-clock (msec)         #    0.009 CPUs utilized          
             2,950      context-switches          #    0.065 M/sec                  
               204      cpu-migrations            #    0.004 M/sec                  
             1,992      page-faults               #    0.044 M/sec                  
       193,127,853      cycles                    #    4.233 GHz                    
       117,098,418      stalled-cycles-frontend   #   60.63% frontend cycles idle   
       197,899,633      instructions              #    1.02  insn per cycle         
                                                  #    0.59  stalled cycles per insn
        41,221,863      branches                  #  903.487 M/sec                  
           502,158      branch-misses             #    1.22% of all branches        

       4.858962925 seconds time elapsed

... that's still 1,800+ instructions per event!

As a comparison, ftrace has a tracing overhead of less than 100 instructions per 
event.
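
The same arithmetic for the "-B -N" run, as a sketch. Note the assumption: the
second perf record output above does not show a sample count, so this reuses
the 108,400 samples from the first run. The ftrace value is the "less than
100" bound mentioned above, treated here as an upper limit.

```python
instructions = 197_899_633  # 'instructions' counter from the -B -N run
samples = 108_400           # assumption: sample count taken from the first run

per_event = instructions / samples
ftrace_upper_bound = 100    # "less than 100 instructions per event"

print(f"{per_event:,.0f} instructions per event")
print(f"more than {per_event / ftrace_upper_bound:.0f}x the ftrace figure")
```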

Thanks,

	Ingo

Thread overview: 11+ messages
2017-10-03 12:55 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 1/8] perf tests attr: Fix task term values Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 2/8] perf test attr: Fix python error on empty result Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 3/8] perf test attr: Fix ignored test case result Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 4/8] perf tools: Lock to protect namespaces and comm list Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 5/8] perf tools: Lock to protect comm_str rb tree Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads Arnaldo Carvalho de Melo
2017-10-03 17:37   ` Ingo Molnar [this message]
2017-10-03 12:55 ` [PATCH 7/8] perf top: Add option to set the number of thread for event synthesize Arnaldo Carvalho de Melo
2017-10-03 12:55 ` [PATCH 8/8] perf tests attr: Fix group stat tests Arnaldo Carvalho de Melo
2017-10-03 16:38 ` [GIT PULL 0/8] perf/core improvements and fixes Ingo Molnar
