From: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
To: kan.liang@linux.intel.com
Cc: jolsa@redhat.com, peterz@infradead.org, mingo@redhat.com,
linux-kernel@vger.kernel.org, namhyung@kernel.org,
adrian.hunter@intel.com, mathieu.poirier@linaro.org,
ravi.bangoria@linux.ibm.com, alexey.budankov@linux.intel.com,
vitaly.slobodskoy@intel.com, pavel.gerasimov@intel.com,
mpe@ellerman.id.au, eranian@google.com, ak@linux.intel.com
Subject: Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
Date: Wed, 4 Mar 2020 10:33:02 -0300
Message-ID: <20200304133302.GA12612@kernel.org>
In-Reply-To: <20200228163011.19358-1-kan.liang@linux.intel.com>
On Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> The kernel patches have been merged into linux-next.
> commit bbfd5e4fab63 ("perf/core: Add new branch sample type for HW
> index of raw branch records")
> commit db278b90c326 ("perf/x86/intel: Output LBR TOS information
> correctly")
I saw it landed in tip/perf/core; going through this patchset now.
Thanks,
- Arnaldo
> Starting with Haswell, Linux perf can use the existing Last Branch
> Record (LBR) facility to record call stacks. However, the depth of the
> reconstructed LBR call stack is limited to the number of LBR registers.
> E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32.
> That's because the HW overwrites the oldest LBR registers when the LBR
> stack is full.
>
> However, the overwritten LBRs may still be retrievable from the
> previous sample, which was taken before the HW overwrote those LBR
> registers. Perf tools can stitch those overwritten LBRs onto the
> current call stack to get a more complete call stack.
>
> To determine whether LBRs can be stitched, the physical index of the
> LBR registers is required. A new branch sample type is introduced to
> dump the physical index of the most recent LBR, aka the Top-of-Stack
> (TOS) information, for perf tools.
> Patches 1 & 2 extend struct branch_stack to support the new branch
> sample type, PERF_SAMPLE_BRANCH_HW_INDEX.
>
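To make the role of the physical index concrete, here is a minimal sketch of the stitching decision. It is a simplification for illustration, not the code from this patchset: the struct layout, MAX_LBR, and the helper names (lbr_stitch, lbr_fill_chain) are all assumptions, and it only compares the base-of-stack entry rather than every overlapping entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LBR 32	/* assumed number of LBR registers (e.g. Skylake) */

struct lbr_entry { uint64_t from, to; };

/* entries[0] is the most recent branch; hw_idx is the physical
 * register that holds entries[0] (the TOS). Hypothetical layout. */
struct lbr_sample {
	unsigned int hw_idx;
	unsigned int nr;
	struct lbr_entry entries[MAX_LBR];
};

/* Copy into out[] the entries of prev that are older than cur's
 * base-of-stack (newest first). Returns how many were copied, or 0
 * if the two samples cannot be matched. */
static size_t lbr_stitch(const struct lbr_sample *prev,
			 const struct lbr_sample *cur,
			 struct lbr_entry *out)
{
	const struct lbr_entry *base;
	unsigned int base_phys, j, i;
	size_t n = 0;

	assert(prev->nr <= MAX_LBR && cur->nr <= MAX_LBR);
	assert(prev->hw_idx < MAX_LBR && cur->hw_idx < MAX_LBR);
	if (!prev->nr || !cur->nr)
		return 0;

	/* Physical register holding cur's oldest entry. */
	base_phys = (cur->hw_idx + MAX_LBR - (cur->nr - 1)) % MAX_LBR;
	/* Position of that same register in prev, counted from prev's TOS. */
	j = (prev->hw_idx + MAX_LBR - base_phys) % MAX_LBR;
	if (j >= prev->nr)
		return 0;	/* prev has no valid entry in that register */

	/* The same register must hold the same branch in both samples. */
	base = &cur->entries[cur->nr - 1];
	if (prev->entries[j].from != base->from ||
	    prev->entries[j].to != base->to)
		return 0;

	/* Everything older than j in prev is missing from cur. */
	for (i = j + 1; i < prev->nr; i++)
		out[n++] = prev->entries[i];
	return n;
}

/* Demo helper: fill a sample as if f1 -> f2 -> ... had been called
 * to the given depth, with call d stored in register (d - 1) % MAX_LBR. */
static void lbr_fill_chain(struct lbr_sample *s, unsigned int depth)
{
	unsigned int i;

	assert(depth >= 1);
	s->nr = depth < MAX_LBR ? depth : MAX_LBR;
	s->hw_idx = (depth - 1) % MAX_LBR;
	for (i = 0; i < s->nr; i++) {
		s->entries[i].from = depth - i;		/* caller */
		s->entries[i].to = depth - i + 1;	/* callee */
	}
}
```

With a previous sample taken at depth 30 and a current sample taken at depth 43, the current sample holds only the 32 newest calls, and the sketch recovers the 11 oldest ones from the previous sample. Without the TOS index there would be no way to tell which register of the previous sample lines up with the base of the current one.
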
> Since the output format of PERF_SAMPLE_BRANCH_STACK changes when the
> new branch sample type is set, an older version of the perf tool may
> parse the perf.data incorrectly. Furthermore, there is no warning when
> this happens, because the current perf header code never checks for
> unknown input bits in attr. Patch 3 adds a check for the event attr.
> (Can be merged independently.)
>
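The kind of check described above can be sketched as a mask test: reject any attr whose branch sample type has bits set beyond what this tool version knows. The DEMO_* flag values and the function name below are hypothetical, chosen only for illustration; the real flags are defined in linux/perf_event.h and the actual check in the patch may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout, for illustration only. */
enum {
	DEMO_BRANCH_USER	= 1U << 0,
	DEMO_BRANCH_CALL_STACK	= 1U << 1,
	DEMO_BRANCH_HW_INDEX	= 1U << 2,
	DEMO_BRANCH_MAX		= 1U << 3,	/* first bit this tool does not know */
};

/* Returns 0 when every set bit is known to this tool, -1 otherwise. */
static int demo_check_branch_sample_type(uint64_t branch_sample_type)
{
	uint64_t known;

	/* The mask derivation relies on DEMO_BRANCH_MAX being a power of two. */
	assert((DEMO_BRANCH_MAX & (DEMO_BRANCH_MAX - 1)) == 0);
	known = (uint64_t)DEMO_BRANCH_MAX - 1;

	return (branch_sample_type & ~known) ? -1 : 0;
}
```

A tool built before a new flag existed would then fail loudly on a perf.data that uses it, instead of silently misparsing the branch stack.
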
> Besides the physical index, the maximum number of LBRs is required as
> well. Patches 4 & 5 retrieve the capabilities information from sysfs
> and save it in the perf header.
>
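For illustration, reading one numeric capability from a sysfs-style file could look like the sketch below. The helper name demo_read_cap is hypothetical, and the path in the usage note is an assumption about where the kernel exposes the LBR depth; the patches themselves add generic PMU capability support rather than a single-file reader like this.

```c
#include <assert.h>
#include <stdio.h>

/* Read one non-negative decimal integer from a sysfs-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static long demo_read_cap(const char *path)
{
	FILE *fp;
	long val;

	assert(path != NULL);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (fscanf(fp, "%ld", &val) != 1 || val < 0)
		val = -1;
	fclose(fp);
	return val;
}
```

On an Intel machine this might be called as demo_read_cap("/sys/bus/event_source/devices/cpu/caps/branches"), assuming that file exists on the running kernel; a return of -1 means the capability is unavailable and stitching should not be attempted.
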
> Patches 6 & 7 implement the LBR stitching approach.
>
> Users can use the options introduced in patches 8-11 to enable the LBR
> stitching approach for perf report, script, top and c2c.
>
> Patch 12 adds a fast path for the duplicate entries check. It benefits
> all call stack parsing, not just the stitched LBR call stack. It can be
> merged independently.
>
>
> The stitching approach is based on LBR call stack technology. The known
> limitations of LBR call stack technology still apply to the approach,
> e.g. exception handling such as setjmp/longjmp will leave calls/returns
> unmatched.
> This approach is not foolproof. There can be cases where it creates
> incorrect call stacks from incorrect matches. There is no attempt
> to validate any matches in another way, so it is not enabled by default.
> However, in many common cases with call stack overflows it can recreate
> better call stacks than the default LBR call stack output. So if there
> are problems with LBR overflows, this is a possible workaround.
>
> Regression:
> Users may collect LBR call stacks on a machine with a new perf tool and
> a new kernel (with LBR TOS support), but then parse the perf.data with
> an old perf tool (without LBR TOS support). The old tool doesn't check
> attr.branch_sample_type, so users will probably get incorrect
> information without any warning.
>
> Performance impact:
> The processing time may increase with the LBR stitching approach
> enabled. The impact depends on the increased depth of call stacks.
>
> For a simple test case, tchain_edit, with a call stack depth of 43:
>   perf record --call-graph lbr -- ./tchain_edit
>   perf report --stitch-lbr
>
> Without --stitch-lbr, perf report only displays call stacks up to a
> depth of 32.
> With --stitch-lbr, perf report can display the full depth of 43.
> The depth of the call stacks increases by about 34%.
>
> Correspondingly, the processing time of perf report increases by 39%:
>   Without --stitch-lbr: 11.0 sec
>   With --stitch-lbr:    15.3 sec
>
> The source code of tchain_edit.c is similar to the following.
> noinline void f43(void)
> {
> 	int i;
> 	for (i = 0; i < 10000;) {
> 		if(i%2)
> 			i++;
> 		else
> 			i++;
> 	}
> }
>
> noinline void f42(void)
> {
> 	int i;
> 	for (i = 0; i < 100; i++) {
> 		f43();
> 		f43();
> 		f43();
> 	}
> }
>
> noinline void f41(void)
> {
> 	int i;
> 	for (i = 0; i < 100; i++) {
> 		f42();
> 		f42();
> 		f42();
> 	}
> }
>
> noinline void f40(void)
> {
> 	f41();
> }
>
> ... ...
>
> noinline void f32(void)
> {
> 	f33();
> }
>
> noinline void f31(void)
> {
> 	int i;
>
> 	for (i = 0; i < 10000; i++) {
> 		if(i%2)
> 			i++;
> 		else
> 			i++;
> 	}
>
> 	f32();
> }
>
> noinline void f30(void)
> {
> 	f31();
> }
>
> ... ...
>
> noinline void f1(void)
> {
> 	f2();
> }
>
> int main()
> {
> 	f1();
> }
>
> Kan Liang (12):
>   perf tools: Add hw_idx in struct branch_stack
>   perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
>   perf header: Add check for event attr
>   perf pmu: Add support for PMU capabilities
>   perf header: Support CPU PMU capabilities
>   perf machine: Refine the function for LBR call stack reconstruction
>   perf tools: Stitch LBR call stack
>   perf report: Add option to enable the LBR stitching approach
>   perf script: Add option to enable the LBR stitching approach
>   perf top: Add option to enable the LBR stitching approach
>   perf c2c: Add option to enable the LBR stitching approach
>   perf hist: Add fast path for duplicate entries check approach
>
> tools/include/uapi/linux/perf_event.h | 8 +-
> tools/perf/Documentation/perf-c2c.txt | 11 +
> tools/perf/Documentation/perf-report.txt | 11 +
> tools/perf/Documentation/perf-script.txt | 11 +
> tools/perf/Documentation/perf-top.txt | 9 +
> .../Documentation/perf.data-file-format.txt | 16 +
> tools/perf/builtin-c2c.c | 6 +
> tools/perf/builtin-record.c | 3 +
> tools/perf/builtin-report.c | 6 +
> tools/perf/builtin-script.c | 76 ++--
> tools/perf/builtin-stat.c | 1 +
> tools/perf/builtin-top.c | 11 +
> tools/perf/tests/sample-parsing.c | 7 +-
> tools/perf/util/branch.h | 27 +-
> tools/perf/util/callchain.h | 12 +-
> tools/perf/util/cs-etm.c | 1 +
> tools/perf/util/env.h | 3 +
> tools/perf/util/event.h | 1 +
> tools/perf/util/evsel.c | 20 +-
> tools/perf/util/evsel.h | 6 +
> tools/perf/util/header.c | 147 ++++++
> tools/perf/util/header.h | 1 +
> tools/perf/util/hist.c | 26 +-
> tools/perf/util/intel-pt.c | 2 +
> tools/perf/util/machine.c | 424 +++++++++++++++---
> tools/perf/util/perf_event_attr_fprintf.c | 1 +
> tools/perf/util/pmu.c | 87 ++++
> tools/perf/util/pmu.h | 12 +
> .../scripting-engines/trace-event-python.c | 30 +-
> tools/perf/util/session.c | 8 +-
> tools/perf/util/sort.c | 2 +-
> tools/perf/util/sort.h | 2 +
> tools/perf/util/synthetic-events.c | 6 +-
> tools/perf/util/thread.c | 2 +
> tools/perf/util/thread.h | 34 ++
> tools/perf/util/top.h | 1 +
> 36 files changed, 900 insertions(+), 131 deletions(-)
>
> --
> 2.17.1
>
--
- Arnaldo
Thread overview: 31+ messages
2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
2020-03-04 13:49 ` Arnaldo Carvalho de Melo
2020-03-04 15:45 ` Arnaldo Carvalho de Melo
2020-03-04 16:07 ` Liang, Kan
2020-03-19 14:10 ` [tip: perf/core] tools headers UAPI: Update tools's copy of linux/perf_event.h tip-bot2 for Arnaldo Carvalho de Melo
2020-03-10 0:42 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack Arnaldo Carvalho de Melo
2020-03-10 12:53 ` Liang, Kan
2020-03-19 14:10 ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-02-28 16:30 ` [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX kan.liang
2020-03-05 20:25 ` Arnaldo Carvalho de Melo
2020-03-05 21:02 ` Liang, Kan
2020-03-05 23:17 ` Arnaldo Carvalho de Melo
2020-03-19 14:10 ` [tip: perf/core] perf evsel: " tip-bot2 for Kan Liang
2020-02-28 16:30 ` [PATCH 03/12] perf header: Add check for event attr kan.liang
2020-03-19 14:10 ` [tip: perf/core] perf header: Add check for unexpected use of reserved membrs in " tip-bot2 for Kan Liang
2020-02-28 16:30 ` [PATCH 04/12] perf pmu: Add support for PMU capabilities kan.liang
2020-02-28 16:30 ` [PATCH 05/12] perf header: Support CPU " kan.liang
2020-02-28 16:30 ` [PATCH 06/12] perf machine: Refine the function for LBR call stack reconstruction kan.liang
2020-02-28 16:30 ` [PATCH 07/12] perf tools: Stitch LBR call stack kan.liang
2020-02-28 16:30 ` [PATCH 08/12] perf report: Add option to enable the LBR stitching approach kan.liang
2020-02-28 16:30 ` [PATCH 09/12] perf script: " kan.liang
2020-02-28 16:30 ` [PATCH 10/12] perf top: " kan.liang
2020-02-28 16:30 ` [PATCH 11/12] perf c2c: " kan.liang
2020-02-28 16:30 ` [PATCH 12/12] perf hist: Add fast path for duplicate entries check approach kan.liang
2020-03-04 13:33 ` Arnaldo Carvalho de Melo [this message]
2020-03-06 9:39 ` [PATCH 00/12] Stitch LBR call stack (Perf Tools) Jiri Olsa
2020-03-06 19:13 ` Liang, Kan
2020-03-06 20:06 ` Arnaldo Carvalho de Melo
2020-03-09 13:27 ` Arnaldo Carvalho de Melo
2020-03-09 13:42 ` Liang, Kan