Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs


From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Ian Rogers <irogers@google.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ahmad Yasin <ahmad.yasin@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Andi Kleen <ak@linux.intel.com>,
	Perry Taylor <perry.taylor@intel.com>,
	Samantha Alt <samantha.alt@intel.com>,
	Caleb Biggers <caleb.biggers@intel.com>,
	Weilin Wang <weilin.wang@intel.com>,
	Edward Baker <edward.baker@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Florian Fischer <florian.fischer@muhq.space>,
	Rob Herring <robh@kernel.org>,
	Zhengjun Xing <zhengjun.xing@linux.intel.com>,
	John Garry <john.g.garry@oracle.com>,
	Kajol Jain <kjain@linux.ibm.com>,
	Sumanth Korikkar <sumanthk@linux.ibm.com>,
	Thomas Richter <tmricht@linux.ibm.com>,
	Tiezhu Yang <yangtiezhu@loongson.cn>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Leo Yan <leo.yan@linaro.org>,
	Yang Jihong <yangjihong1@huawei.com>,
	James Clark <james.clark@arm.com>,
	Suzuki Poulouse <suzuki.poulose@arm.com>,
	Kang Minchul <tegongkang@gmail.com>,
	Athira Rajeev <atrajeev@linux.vnet.ibm.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs
Date: Wed, 26 Apr 2023 09:53:45 -0400
Message-ID: <bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com>
In-Reply-To: <20230426070050.1315519-1-irogers@google.com>



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
> 
> The first patches were previously posted to improve metrics here:
> "perf stat: Introduce skippable evsels"
> https://lore.kernel.org/all/20230414051922.3625666-1-irogers@google.com/
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> 
> Next are some general test improvements.
> 
> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
> 
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
> 
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code. They also bring in a new
> json option to not group events when matching a metricgroup, this
> helps reduce counter pressure for TopdownL1 and TopdownL2 metric
> groups. The updates to the script that updates the json are posted in:
> https://github.com/intel/perfmon/pull/73
> 
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
>  20 files changed, 631 insertions(+), 951 deletions(-)
> 
> There's specific detail with each patch, but for now here is the 6.3
> output followed by that from perf-tools-next with the patch series
> applied. The tool is running on an Alderlake CPU on an elderly 5.15
> kernel:
> 
> Events on hybrid that parse and pass tests:
> '''
> $ perf-6.3 version
> perf version 6.3.rc7.gb7bc77e2f2c7
> $ perf-6.3 test
> ...
>   6.1: Test event parsing                       : FAILED!
> ...
> $ perf test
> ...
>   6: Parse event definition strings             :
>   6.1: Test event parsing                       : Ok
>   6.2: Parsing of all PMU events from sysfs     : Ok
>   6.3: Parsing of given PMU events from sysfs   : Ok
>   6.4: Parsing of aliased events from sysfs     : Skip (no aliases in sysfs)
>   6.5: Parsing of aliased events                : Ok
>   6.6: Parsing of terms (event modifiers)       : Ok
> ...
> '''
> 
> No event/metric running with json metrics and TopdownL1 on both PMUs:
> '''
> $ perf-6.3 stat -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>          24,073.58 msec cpu-clock                        #   23.975 CPUs utilized             
>                350      context-switches                 #   14.539 /sec                      
>                 25      cpu-migrations                   #    1.038 /sec                      
>                 66      page-faults                      #    2.742 /sec                      
>         21,257,199      cpu_core/cycles/                 #  883.009 K/sec                     
>          2,162,192      cpu_atom/cycles/                 #   89.816 K/sec                     
>          6,679,379      cpu_core/instructions/           #  277.457 K/sec                     
>            753,197      cpu_atom/instructions/           #   31.287 K/sec                     
>          1,300,647      cpu_core/branches/               #   54.028 K/sec                     
>            148,652      cpu_atom/branches/               #    6.175 K/sec                     
>            117,429      cpu_core/branch-misses/          #    4.878 K/sec                     
>             14,396      cpu_atom/branch-misses/          #  598.000 /sec                      
>        123,097,644      cpu_core/slots/                  #    5.113 M/sec                     
>          9,241,207      cpu_core/topdown-retiring/       #      7.5% Retiring                 
>          8,903,288      cpu_core/topdown-bad-spec/       #      7.2% Bad Speculation          
>         66,590,029      cpu_core/topdown-fe-bound/       #     54.1% Frontend Bound           
>         38,397,500      cpu_core/topdown-be-bound/       #     31.2% Backend Bound            
>          3,294,283      cpu_core/topdown-heavy-ops/      #      2.7% Heavy Operations          #      4.8% Light Operations         
>          8,855,769      cpu_core/topdown-br-mispredict/  #      7.2% Branch Mispredict         #      0.0% Machine Clears           
>         57,695,714      cpu_core/topdown-fetch-lat/      #     46.9% Fetch Latency             #      7.2% Fetch Bandwidth          
>         12,823,926      cpu_core/topdown-mem-bound/      #     10.4% Memory Bound              #     20.8% Core Bound               
> 
>        1.004093622 seconds time elapsed
> 
> $ perf stat -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>          24,064.65 msec cpu-clock                        #   23.973 CPUs utilized             
>                384      context-switches                 #   15.957 /sec                      
>                 24      cpu-migrations                   #    0.997 /sec                      
>                 71      page-faults                      #    2.950 /sec                      
>         19,737,646      cpu_core/cycles/                 #  820.192 K/sec                     
>        122,018,505      cpu_atom/cycles/                 #    5.070 M/sec                       (63.32%)
>          7,636,653      cpu_core/instructions/           #  317.339 K/sec                     
>         16,266,629      cpu_atom/instructions/           #  675.955 K/sec                       (72.50%)
>          1,552,995      cpu_core/branches/               #   64.534 K/sec                     
>          3,208,143      cpu_atom/branches/               #  133.314 K/sec                       (72.50%)
>            132,151      cpu_core/branch-misses/          #    5.491 K/sec                     
>            547,285      cpu_atom/branch-misses/          #   22.742 K/sec                       (72.49%)
>         32,110,597      cpu_atom/TOPDOWN_RETIRING.ALL/   #    1.334 M/sec                     
>                                                   #     18.4 %  tma_bad_speculation      (72.48%)
>        228,006,765      cpu_atom/TOPDOWN_FE_BOUND.ALL/   #    9.475 M/sec                     
>                                                   #     38.1 %  tma_frontend_bound       (72.47%)
>        225,866,251      cpu_atom/TOPDOWN_BE_BOUND.ALL/   #    9.386 M/sec                     
>                                                   #     37.7 %  tma_backend_bound      
>                                                   #     37.7 %  tma_backend_bound_aux    (72.73%)
>        119,748,254      cpu_atom/CPU_CLK_UNHALTED.CORE/  #    4.976 M/sec                     
>                                                   #      5.2 %  tma_retiring             (73.14%)
>         31,363,579      cpu_atom/TOPDOWN_RETIRING.ALL/   #    1.303 M/sec                       (73.37%)
>        227,907,321      cpu_atom/TOPDOWN_FE_BOUND.ALL/   #    9.471 M/sec                       (63.95%)
>        228,803,268      cpu_atom/TOPDOWN_BE_BOUND.ALL/   #    9.508 M/sec                       (63.55%)
>        113,357,334      cpu_core/TOPDOWN.SLOTS/          #     30.5 %  tma_backend_bound      
>                                                   #      9.2 %  tma_retiring           
>                                                   #      8.7 %  tma_bad_speculation    
>                                                   #     51.6 %  tma_frontend_bound     
>         10,451,044      cpu_core/topdown-retiring/                                            
>          9,687,449      cpu_core/topdown-bad-spec/                                            
>         58,703,214      cpu_core/topdown-fe-bound/                                            
>         34,540,660      cpu_core/topdown-be-bound/                                            
>            154,902      cpu_core/INT_MISC.UOP_DROPPING/  #    6.437 K/sec                     
> 
>        1.003818397 seconds time elapsed
> '''
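For reference, the generic annotations in the quoted 6.3 output can be reproduced from the raw cpu_core counts. The sketch below is illustrative only. It assumes that each Level 1 figure is the event count divided by cpu_core/slots, and that each unmeasured Level 2 node (Light Operations, Machine Clears, Fetch Bandwidth, Core Bound) is its parent minus the measured sibling. Under those assumptions it reproduces the rounded percentages shown, but perf's exact formula may differ.

```python
# Raw cpu_core counts copied from the perf-6.3 output quoted above.
counts = {
    "slots": 123_097_644,
    "topdown-retiring": 9_241_207,
    "topdown-bad-spec": 8_903_288,
    "topdown-fe-bound": 66_590_029,
    "topdown-be-bound": 38_397_500,
    "topdown-heavy-ops": 3_294_283,
    "topdown-br-mispredict": 8_855_769,
    "topdown-fetch-lat": 57_695_714,
    "topdown-mem-bound": 12_823_926,
}


def topdown_breakdown(c):
    """Return {generic name: percent of slots}.

    ASSUMPTION: every metric is its event count divided by cpu_core/slots;
    this matches the rounded 6.3 output but is not taken from perf's code.
    """
    def pct(n):
        return 100.0 * n / c["slots"]

    level1 = {
        "Retiring": pct(c["topdown-retiring"]),
        "Bad Speculation": pct(c["topdown-bad-spec"]),
        "Frontend Bound": pct(c["topdown-fe-bound"]),
        "Backend Bound": pct(c["topdown-be-bound"]),
    }
    # ASSUMPTION: each Level 1 category splits into one measured child and
    # a remainder computed as parent minus that child.
    level2 = {
        "Heavy Operations": pct(c["topdown-heavy-ops"]),
        "Light Operations": pct(c["topdown-retiring"] - c["topdown-heavy-ops"]),
        "Branch Mispredict": pct(c["topdown-br-mispredict"]),
        "Machine Clears": pct(c["topdown-bad-spec"] - c["topdown-br-mispredict"]),
        "Fetch Latency": pct(c["topdown-fetch-lat"]),
        "Fetch Bandwidth": pct(c["topdown-fe-bound"] - c["topdown-fetch-lat"]),
        "Memory Bound": pct(c["topdown-mem-bound"]),
        "Core Bound": pct(c["topdown-be-bound"] - c["topdown-mem-bound"]),
    }
    return {**level1, **level2}


for name, value in topdown_breakdown(counts).items():
    print(f"{value:5.1f}% {name}")
```

The point of the exercise is that the default output only needs a handful of generic names ("Retiring", "Frontend Bound", ...) on top of the raw counts.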

Thanks for the fixes. They should work for the -M and --topdown options.
But I don't think the above output is better than 6.3's for the
*default* perf stat output.

- The multiplexing on the atom core messes up the other events.
- The "M/sec" rate seems useless for the Topdown events.
- tma_* are not generic names.
  "Retiring" is much better than "tma_retiring" as a generic annotation.
  It should work for both x86 and Arm.
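The percentages in parentheses on the cpu_atom lines, such as (63.32%), indicate that those counters were multiplexed and only counted for part of the run, so their values are extrapolated. A minimal sketch of that extrapolation, assuming the usual count * time_enabled / time_running estimate; scale_count() is a hypothetical helper, not perf's code.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate the full-interval value of a multiplexed counter.

    Hypothetical helper. ASSUMPTION: the event rate while the counter was
    scheduled is representative of the whole enabled interval.
    Returns (scaled_count, percent_of_time_running).
    """
    if time_running == 0:
        # Never scheduled on the PMU: there is nothing to extrapolate from.
        return None, 0.0
    scaled = raw * time_enabled / time_running
    running_pct = 100.0 * time_running / time_enabled
    return round(scaled), running_pct


# A counter that ran for a quarter of the enabled time is multiplied by 4.
print(scale_count(1_000, 2_000_000, 500_000))  # → (4000, 25.0)
```

The estimate is only as good as the assumption that the event rate stayed steady while the counter was not scheduled.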

For the default, it's better to provide clean and generic output for
end users.

If users want more details, they can use the -M or --topdown
options. The events/formats are expected to differ across ARCHs.

Also, there seems to be a bug with all the atom Topdown events: they are
displayed twice (e.g. cpu_atom/TOPDOWN_RETIRING.ALL/ appears twice above).
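A regression test for the duplicate display could compare the printed event names. A hypothetical sketch, using the cpu_atom names in the order they appear in the perf-tools-next output quoted above; in practice the names would have to be parsed from perf stat output, which is not shown here.

```python
from collections import Counter

# cpu_atom event names, in the order printed in the quoted output.
printed = [
    "cpu_atom/cycles/",
    "cpu_atom/instructions/",
    "cpu_atom/branches/",
    "cpu_atom/branch-misses/",
    "cpu_atom/TOPDOWN_RETIRING.ALL/",
    "cpu_atom/TOPDOWN_FE_BOUND.ALL/",
    "cpu_atom/TOPDOWN_BE_BOUND.ALL/",
    "cpu_atom/CPU_CLK_UNHALTED.CORE/",
    "cpu_atom/TOPDOWN_RETIRING.ALL/",
    "cpu_atom/TOPDOWN_FE_BOUND.ALL/",
    "cpu_atom/TOPDOWN_BE_BOUND.ALL/",
]


def duplicated_events(names):
    """Return event names printed more than once, in first-seen order."""
    seen = Counter(names)
    # dict.fromkeys() de-duplicates while preserving first-seen order.
    return [n for n in dict.fromkeys(names) if seen[n] > 1]


print(duplicated_events(printed))
# → ['cpu_atom/TOPDOWN_RETIRING.ALL/', 'cpu_atom/TOPDOWN_FE_BOUND.ALL/', 'cpu_atom/TOPDOWN_BE_BOUND.ALL/']
```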

Thanks,
Kan


Thread overview: 71+ messages
2023-04-26  7:00 [PATCH v1 00/40] Fix perf on Intel hybrid CPUs Ian Rogers
2023-04-26  7:00 ` [PATCH v1 01/40] perf stat: Introduce skippable evsels Ian Rogers
2023-04-26 23:26   ` Yasin, Ahmad
2023-04-27  0:37     ` Ian Rogers
2023-04-27  2:03       ` Ian Rogers
2023-04-27 18:52   ` Liang, Kan
2023-04-27 20:21     ` Ian Rogers
2023-04-27 21:00       ` Namhyung Kim
2023-04-27 21:09         ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 02/40] perf vendor events intel: Add alderlake metric constraints Ian Rogers
2023-04-26  7:00 ` [PATCH v1 03/40] perf vendor events intel: Add icelake " Ian Rogers
2023-04-27 19:06   ` Liang, Kan
2023-04-27 20:22     ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 04/40] perf vendor events intel: Add icelakex " Ian Rogers
2023-04-26  7:00 ` [PATCH v1 05/40] perf vendor events intel: Add sapphirerapids " Ian Rogers
2023-04-26  7:00 ` [PATCH v1 06/40] perf vendor events intel: Add tigerlake " Ian Rogers
2023-04-26  7:00 ` [PATCH v1 07/40] perf stat: Avoid segv on counter->name Ian Rogers
2023-04-27 19:11   ` Liang, Kan
2023-04-27 19:34     ` Arnaldo Carvalho de Melo
2023-04-26  7:00 ` [PATCH v1 08/40] perf test: Test more sysfs events Ian Rogers
2023-04-27 19:38   ` Liang, Kan
2023-04-27 20:23     ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 09/40] perf test: Use valid for PMU tests Ian Rogers
2023-04-27 19:39   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 10/40] perf test: Mask config then test Ian Rogers
2023-04-27 19:39   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 11/40] perf test: Test more with config_cache Ian Rogers
2023-04-27 19:40   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 12/40] perf test: Roundtrip name, don't assume 1 event per name Ian Rogers
2023-04-27 19:44   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 13/40] perf parse-events: Set attr.type to PMU type early Ian Rogers
2023-04-27 20:00   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 14/40] perf print-events: Avoid unnecessary strlist Ian Rogers
2023-04-27 20:01   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 15/40] perf parse-events: Avoid scanning PMUs before parsing Ian Rogers
2023-04-27 20:06   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 16/40] perf test: Validate events with hyphens in Ian Rogers
2023-04-27 20:08   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 17/40] perf evsel: Modify group pmu name for software events Ian Rogers
2023-04-27 20:12   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 18/40] perf test: Move x86 hybrid tests to arch/x86 Ian Rogers
2023-04-27 21:42   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 19/40] perf test x86 hybrid: Don't assume evlist order Ian Rogers
2023-04-26  7:00 ` [PATCH v1 20/40] perf parse-events: Support PMUs for legacy cache events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 21/40] perf parse-events: Wildcard " Ian Rogers
2023-04-26 10:11   ` James Clark
2023-04-27  5:50     ` Ian Rogers
2023-04-27 21:02       ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 22/40] perf print-events: Print legacy cache events for each PMU Ian Rogers
2023-04-26  7:00 ` [PATCH v1 23/40] perf parse-events: Support wildcards on raw events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 24/40] perf parse-events: Remove now unused hybrid logic Ian Rogers
2023-04-26  7:00 ` [PATCH v1 25/40] perf parse-events: Minor type safety cleanup Ian Rogers
2023-04-26  7:00 ` [PATCH v1 26/40] perf parse-events: Add pmu filter Ian Rogers
2023-04-26  7:00 ` [PATCH v1 27/40] perf stat: Make cputype filter generic Ian Rogers
2023-04-26  7:00 ` [PATCH v1 28/40] perf test: Add cputype testing to perf stat Ian Rogers
2023-04-26  7:00 ` [PATCH v1 29/40] perf test: Fix parse-events tests for >1 core PMU Ian Rogers
2023-04-26  7:00 ` [PATCH v1 30/40] perf parse-events: Support hardware events as terms Ian Rogers
2023-04-26  7:00 ` [PATCH v1 31/40] perf parse-events: Avoid error when assigning a term Ian Rogers
2023-04-26  7:00 ` [PATCH v1 32/40] perf parse-events: Avoid error when assigning a legacy cache term Ian Rogers
2023-04-26  7:00 ` [PATCH v1 33/40] perf parse-events: Don't auto merge hybrid wildcard events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 34/40] perf parse-events: Don't reorder atom cpu events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 35/40] perf metrics: Be PMU specific for referenced metrics Ian Rogers
2023-04-26  7:00 ` [PATCH v1 36/40] perf metric: Json flag to not group events if gathering a metric group Ian Rogers
2023-04-26  7:00 ` [PATCH v1 37/40] perf stat: Command line PMU metric filtering Ian Rogers
2023-04-26  7:00 ` [PATCH v1 38/40] perf vendor events intel: Correct alderlake metrics Ian Rogers
2023-04-26  7:00 ` [PATCH v1 39/40] perf jevents: Don't rewrite metrics across PMUs Ian Rogers
2023-04-26  7:00 ` [PATCH v1 40/40] perf metrics: Be PMU specific in event match Ian Rogers
2023-04-26 13:53 ` Liang, Kan [this message]
2023-04-26 21:09 ` [PATCH v1 00/40] Fix perf on Intel hybrid CPUs Arnaldo Carvalho de Melo
2023-04-26 21:33   ` Arnaldo Carvalho de Melo
2023-04-26 22:07     ` Liang, Kan
