From: Ingo Molnar <mingo@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Kan Liang <kan.liang@intel.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Alexei Starovoitov <ast@kernel.org>,
Andi Kleen <ak@linux.intel.com>, He Kuang <hekuang@huawei.com>,
Lukasz Odzioba <lukasz.odzioba@intel.com>,
Namhyung Kim <namhyung@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Wang Nan <wangnan0@huawei.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH 6/8] perf top: Implement multithreading for perf_event__synthesize_threads
Date: Tue, 3 Oct 2017 19:37:30 +0200
Message-ID: <20171003173729.k5xobn24qerr2tuw@gmail.com>
In-Reply-To: <20171003125540.331-7-acme@kernel.org>

* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> The proc files, which are sorted in alphabetical order, are evenly
> assigned to several synthesize threads to be processed in parallel.
>
> For 'perf top', the number of threads is hard-coded to the number of
> online CPUs. The following patch will introduce an option to set it.
>
> For other perf tools, the number of threads is 1, because the process
> function (e.g. process_synthesized_event) is not ready for
> multithreading.
>
> This patch series only supports multithreaded event synthesis for 'perf
> top'. For other tools, it can be done separately later.
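The work split described in the quoted changelog can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `proc_range()`, `synthesize_parallel()` and `struct worker` are hypothetical names, and each worker merely counts the numeric (PID) entries in its slice instead of synthesizing events. The directory is listed with `scandir()`/`alphasort()`, and each of `nthreads` workers gets a contiguous slice whose size differs from the others by at most one.

```c
#define _DEFAULT_SOURCE /* scandir() and alphasort() prototypes in glibc */
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper: the half-open range [*start, *end) of the n sorted
 * entries that worker 'idx' of 'nthreads' handles. The first n % nthreads
 * workers take one extra entry, so slice sizes differ by at most one. */
static void proc_range(int n, int nthreads, int idx, int *start, int *end)
{
	int base = n / nthreads, extra = n % nthreads;

	assert(nthreads > 0 && idx >= 0 && idx < nthreads);
	*start = idx * base + (idx < extra ? idx : extra);
	*end = *start + base + (idx < extra ? 1 : 0);
}

struct worker {			/* hypothetical per-thread state */
	struct dirent **entries;
	int start, end;
	long processed;		/* PID entries seen by this worker */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (int i = w->start; i < w->end; i++) {
		/* Numeric names are processes (PIDs); a real tool would
		 * synthesize events for /proc/<pid> here instead of counting. */
		if (isdigit((unsigned char)w->entries[i]->d_name[0]))
			w->processed++;
	}
	return NULL;
}

/* Scan 'dir' alphabetically and split the entries across 'nthreads'
 * threads. Returns the number of PID entries processed, or -1 on error. */
static long synthesize_parallel(const char *dir, int nthreads)
{
	struct dirent **entries;
	int n = scandir(dir, &entries, NULL, alphasort);
	long total = 0;
	int started = 0;

	if (n < 0)
		return -1;
	if (nthreads < 1)
		nthreads = 1;

	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct worker *ws = calloc(nthreads, sizeof(*ws));

	if (tids && ws) {
		for (; started < nthreads; started++) {
			ws[started].entries = entries;
			proc_range(n, nthreads, started,
				   &ws[started].start, &ws[started].end);
			if (pthread_create(&tids[started], NULL, worker_fn,
					   &ws[started]) != 0)
				break;
		}
		for (int t = 0; t < started; t++) {
			pthread_join(tids[t], NULL);
			total += ws[t].processed;
		}
	}
	if (!tids || !ws || started < nthreads)
		total = -1;	/* allocation or thread creation failed */

	for (int i = 0; i < n; i++)
		free(entries[i]);
	free(entries);
	free(tids);
	free(ws);
	return total;
}
```

To mimic the 'perf top' default one would pass the online CPU count, e.g. `synthesize_parallel("/proc", (int)sysconf(_SC_NPROCESSORS_ONLN))`; passing 1 corresponds to the single-threaded behaviour the changelog keeps for the other tools.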
Just to give some quick feedback: this is really nice stuff!

Is anyone working on multi-threading 'perf record' (and the recording portion of
'perf top' perhaps)?

Especially with complex, high-frequency profiling there's a lot of SMP overhead
coming from a single recording thread. If there were a single thread per CPU, and
it truly only recorded the events from its own CPU, things would become a lot more
scalable.
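Purely to illustrate the per-CPU design suggested above, here is a hedged C sketch; it is not how 'perf record' was structured at the time, and every name in it is hypothetical. It starts one thread per online CPU, pins each with the GNU extension `pthread_setaffinity_np()`, and has each thread write only into its own state, so the recording path takes no lock. It assumes online CPUs are numbered 0..n-1 and simulates draining the per-CPU ring buffer with a counter.

```c
#define _GNU_SOURCE /* pthread_setaffinity_np(), CPU_ZERO(), CPU_SET() */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

#define RECORDS_PER_CPU 1000	/* hypothetical simulated workload */

/* Hypothetical per-CPU recorder: each thread only writes its own entry,
 * so the hot path needs no locking. */
struct cpu_recorder {
	int cpu;
	pthread_t tid;
	long nrecords;	/* records appended to this CPU's private buffer */
	int pinned;	/* 1 if the affinity call succeeded */
};

static void *record_cpu(void *arg)
{
	struct cpu_recorder *r = arg;
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(r->cpu, &set);
	/* Pin to the CPU whose events this thread owns; this can fail in a
	 * restricted cpuset, in which case the sketch simply runs unpinned. */
	r->pinned = pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;

	/* A real recorder would drain this CPU's perf ring buffer and append
	 * to a per-CPU output file; here we only simulate the appends. */
	for (int i = 0; i < RECORDS_PER_CPU; i++)
		r->nrecords++;
	return NULL;
}

/* Start one recorder per online CPU (assumed numbered 0..n-1), wait for
 * all of them, and return the total record count, or -1 on failure. */
static long record_all_cpus(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	long started = 0, total = 0;

	if (ncpus < 1)
		ncpus = 1;

	struct cpu_recorder *rs = calloc(ncpus, sizeof(*rs));

	if (!rs)
		return -1;
	for (; started < ncpus; started++) {
		rs[started].cpu = (int)started;
		if (pthread_create(&rs[started].tid, NULL, record_cpu,
				   &rs[started]) != 0)
			break;
	}
	for (long c = 0; c < started; c++) {
		pthread_join(rs[c].tid, NULL);
		total += rs[c].nrecords;
	}
	assert(total == started * RECORDS_PER_CPU);
	free(rs);
	return started == ncpus ? total : -1;
}
```

The only cross-thread step left is the final join; a real tool would additionally have to merge the per-CPU outputs (for example by timestamp) at report time.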
For example, if we measure the current overhead of perf record of a (limited)
parallel kernel build:

triton:~/tip> perf stat --no-inherit --pre "make clean >/dev/null 2>&1" perf record -F 10000 make -j kernel
...
[ perf record: Captured and wrote 5.124 MB perf.data (108400 samples) ]

 Performance counter stats for 'perf record -F 10000 make -j kernel':

         183.582587      task-clock (msec)         #    0.039 CPUs utilized
              2,496      context-switches          #    0.014 M/sec
                157      cpu-migrations            #    0.855 K/sec
              6,649      page-faults               #    0.036 M/sec
        817,478,151      cycles                    #    4.453 GHz
        416,641,913      stalled-cycles-frontend   #    50.97% frontend cycles idle
      1,018,336,301      instructions              #     1.25 insn per cycle
                                                   #     0.41 stalled cycles per insn
        217,255,137      branches                  # 1183.419 M/sec
          2,970,118      branch-misses             #     1.37% of all branches

        4.710378510 seconds time elapsed
That's 1,018,336,301 instructions just to record 108,400 samples, i.e. every sample
takes roughly 9,400 instructions to _record_. That's insanely high overhead for what
is in essence a tracing utility.
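The per-sample figure is plain division of the `instructions` count from 'perf stat' by the sample count that 'perf record' reports. A tiny C helper (the name `insns_per_sample()` is hypothetical) makes the arithmetic explicit:

```c
#include <assert.h>

/* Average instructions retired by the recording process per captured
 * sample: total 'instructions' from perf stat divided by samples written. */
static double insns_per_sample(unsigned long long insns,
			       unsigned long long samples)
{
	assert(samples > 0);
	return (double)insns / (double)samples;
}
```

For the first run, `insns_per_sample(1018336301, 108400)` is about 9,394. The second run's output below does not print a sample count; if it captured a similar ~108,400 samples (an assumption), the same division of its 197,899,633 instructions gives roughly 1,826.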
Even if I add "-B -N" to disable buildid generation (which is the worst offender),
it's still very high overhead:

[ perf record: Captured and wrote 5.585 MB perf.data ]

 Performance counter stats for 'perf record -B -N -F 10000 make -j kernel':

          45.625321      task-clock (msec)         #    0.009 CPUs utilized
              2,950      context-switches          #    0.065 M/sec
                204      cpu-migrations            #    0.004 M/sec
              1,992      page-faults               #    0.044 M/sec
        193,127,853      cycles                    #    4.233 GHz
        117,098,418      stalled-cycles-frontend   #    60.63% frontend cycles idle
        197,899,633      instructions              #     1.02 insn per cycle
                                                   #     0.59 stalled cycles per insn
         41,221,863      branches                  #  903.487 M/sec
            502,158      branch-misses             #     1.22% of all branches

        4.858962925 seconds time elapsed

... that's still 1,800+ instructions per event!
As a comparison, ftrace has a tracing overhead of less than 100 instructions per
event.

Thanks,

	Ingo