Re: [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads

From: Ingo Molnar <mingo@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Kan Liang <kan.liang@intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andi Kleen <ak@linux.intel.com>, He Kuang <hekuang@huawei.com>,
	Lukasz Odzioba <lukasz.odzioba@intel.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Wang Nan <wangnan0@huawei.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads
Date: Tue, 3 Oct 2017 19:37:30 +0200
Message-ID: <20171003173729.k5xobn24qerr2tuw@gmail.com>
In-Reply-To: <20171003125540.331-7-acme@kernel.org>


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> From: Kan Liang <kan.liang@intel.com>
> 
> The proc files which is sorted with alphabetical order are evenly
> assigned to several synthesize threads to be processed in parallel.
> 
> For 'perf top', the threads number hard code to online CPU number. The
> following patch will introduce an option to set it.
> 
> For other perf tools, the thread number is 1. Because the process
> function is not ready for multithreading, e.g.
> process_synthesized_event.
> 
> This patch series only support event synthesize multithreading for 'perf
> top'. For other tools, it can be done separately later.
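
To illustrate the even split the changelog describes, here is a minimal sketch
that divides a sorted list of /proc entries into contiguous ranges, one per
synthesize thread. It is not taken from the patch: the name split_evenly, the
sample entries and the choice to hand the remainder to the first threads are
all assumptions made for illustration.

```python
def split_evenly(entries, nr_threads):
    """Return contiguous (start, count) ranges that cover all entries.

    The first (len(entries) % nr_threads) ranges get one extra entry,
    so range sizes differ by at most one.
    """
    base, extra = divmod(len(entries), nr_threads)
    ranges = []
    start = 0
    for i in range(nr_threads):
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


# Hypothetical /proc entries, sorted as strings (alphabetical order).
entries = sorted(["1", "2", "17", "230", "231", "4096", "88"])

for start, count in split_evenly(entries, 3):
    # Each slice is the work one synthesize thread would process.
    print(entries[start:start + count])
```

With contiguous ranges each thread only needs a start index and a count, and
no entry is handled twice.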

Just to give some quick feedback: this is really nice stuff!

Is anyone working on multi-threading 'perf record' (and the recording portion of 
'perf top' perhaps)?

Especially with complex, high-frequency profiling there's a lot of SMP overhead 
coming from a single recording thread. If there was a single thread per CPU, and 
it truly only recorded the events from its own CPU, things would become a lot more 
scalable.

For example, if we measure the current overhead of perf record of a (limited) 
parallel kernel build:

  triton:~/tip> perf stat --no-inherit --pre "make clean >/dev/null 2>&1" perf record -F 10000 make -j kernel
  ...
  [ perf record: Captured and wrote 5.124 MB perf.data (108400 samples) ]

 Performance counter stats for 'perf record -F 10000 make -j kernel':

        183.582587      task-clock (msec)         #    0.039 CPUs utilized          
             2,496      context-switches          #    0.014 M/sec                  
               157      cpu-migrations            #    0.855 K/sec                  
             6,649      page-faults               #    0.036 M/sec                  
       817,478,151      cycles                    #    4.453 GHz                    
       416,641,913      stalled-cycles-frontend   #   50.97% frontend cycles idle   
     1,018,336,301      instructions              #    1.25  insn per cycle         
                                                  #    0.41  stalled cycles per insn
       217,255,137      branches                  # 1183.419 M/sec                  
         2,970,118      branch-misses             #    1.37% of all branches        

       4.710378510 seconds time elapsed

That's 1,018,336,301 instructions just to record 108,400 samples, i.e. every sample 
takes roughly 9,400 instructions to _record_. That's insanely high overhead for what 
is in essence a tracing utility.
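
The per-sample figure is a plain division of two numbers from the output
above: the 'instructions' counter and the sample count that perf record
reported. A small sketch of that arithmetic (the function name is only for
illustration):

```python
def instructions_per_sample(instructions, samples):
    """Average number of instructions spent per recorded sample."""
    return instructions / samples


# Values copied from the perf stat / perf record output above.
instructions = 1_018_336_301   # 'instructions' counter
samples = 108_400              # "(108400 samples)"

per_sample = instructions_per_sample(instructions, samples)
print(f"{per_sample:.0f} instructions per sample")
```

Note that the counter covers the whole 'perf record' run, including startup
and buildid processing, so this is an average cost and not the cost of the
per-sample fast path alone.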


Even if I add "-B -N" to disable buildid generation (which is the worst offender), 
it's still very high overhead:

 [ perf record: Captured and wrote 5.585 MB perf.data ]

 Performance counter stats for 'perf record -B -N -F 10000 make -j kernel':

         45.625321      task-clock (msec)         #    0.009 CPUs utilized          
             2,950      context-switches          #    0.065 M/sec                  
               204      cpu-migrations            #    0.004 M/sec                  
             1,992      page-faults               #    0.044 M/sec                  
       193,127,853      cycles                    #    4.233 GHz                    
       117,098,418      stalled-cycles-frontend   #   60.63% frontend cycles idle   
       197,899,633      instructions              #    1.02  insn per cycle         
                                                  #    0.59  stalled cycles per insn
        41,221,863      branches                  #  903.487 M/sec                  
           502,158      branch-misses             #    1.22% of all branches        

       4.858962925 seconds time elapsed

... that's still 1,800+ instructions per event!

As a comparison, ftrace has a tracing overhead of less than 100 instructions per 
event.

Thanks,

	Ingo
