From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>,
mingo@redhat.com, peterz@infradead.org, namhyung@kernel.org,
jolsa@kernel.org, adrian.hunter@intel.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com
Subject: Re: [PATCH V4 0/5] New metricgroup output in perf stat default mode
Date: Fri, 16 Jun 2023 10:39:39 -0300
Message-ID: <ZIxmG1fCRDwn6mHw@kernel.org>
In-Reply-To: <4802536a-b2ae-d90b-7beb-49abd4db43fe@linux.intel.com>
On Fri, Jun 16, 2023 at 09:26:26AM -0400, Liang, Kan wrote:
>
>
> On 2023-06-16 1:59 a.m., Ian Rogers wrote:
> > On Thu, Jun 15, 2023 at 8:14 PM <kan.liang@linux.intel.com> wrote:
> >>
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> Changes since V3:
> >> - Move the full name (PMU + metricgroup name) generation from the metric
> >> code to the output code. (Ian)
> >> - Add default tags for Hisi hip08 L1 metrics (John)
> >> - Some patches have been merged. Drop them from the V4.
> >>
> >> Changes since V2:
> >> - Fix memory leak (Ian)
> >> (Ian, I cannot reproduce the memory leak on all my machines. Please
> >> check whether the fix works on your side. Thanks.)
> >> - Add Reviewed-by tags for several patches.
> >>
> >> Changes since V1:
> >> - Remove EVSEL_EVENT_MASK and use the __evsel__match which is suggested
> >> by Ian.
> >> - Support TopdownL1 on both e-core and p-core of ADL in the default
> >> mode. (Ian)
> >> - Have separate patches for the modifications of metricgroup and output.
> >> (Ian)
> >> - Do a second sort for the Default metricgroup. Remove the logic of
> >> changing the associated metric event. (Ian)
> >> - Move all the metric related code to stat-shadow (Ian)
> >> - Move the common functions between stat+csv_output and stat+std_output
> >> to the lib directory (Ian)
> >>
> >> In the default mode, the current metricgroup output includes both
> >> events and metrics, which is unnecessary and makes the output hard to
> >> read. Also, different ARCHs (even different generations of the same
> >> ARCH) may have a different output format because the events in a
> >> metric differ.
> >>
> >> The series proposes a new output format that prints only the value
> >> of each metric and the metricgroup name. It brings a clean and
> >> consistent output format across ARCHs and generations.
> >>
> >> Patches 1-2 introduce the new metricgroup output.
> >>
> >> Patches 3-4 improve the tests to cover the default mode.
> >>
> >> Patch 5 updates the event list for Hisi hip08.
> >>
> >> Here are some examples of the new output.
> >>
> >> STD output:
> >>
> >> On SPR
> >>
> >> perf stat -a sleep 1
> >>
> >> Performance counter stats for 'system wide':
> >>
> >>     226,054.13 msec cpu-clock           # 224.588 CPUs utilized
> >>            932      context-switches    #   4.123 /sec
> >>            224      cpu-migrations      #   0.991 /sec
> >>             76      page-faults         #   0.336 /sec
> >>     45,940,682      cycles              #   0.000 GHz
> >>     36,676,047      instructions        #    0.80 insn per cycle
> >>      7,044,516      branches            #  31.163 K/sec
> >>         62,169      branch-misses       #   0.88% of all branches
> >>                     TopdownL1           #    68.7 % tma_backend_bound
> >>                                         #     3.1 % tma_bad_speculation
> >>                                         #    13.0 % tma_frontend_bound
> >>                                         #    15.2 % tma_retiring
> >>                     TopdownL2           #     2.7 % tma_branch_mispredicts
> >>                                         #    19.6 % tma_core_bound
> >>                                         #     4.8 % tma_fetch_bandwidth
> >>                                         #     8.3 % tma_fetch_latency
> >>                                         #     2.9 % tma_heavy_operations
> >>                                         #    12.3 % tma_light_operations
> >>                                         #     0.4 % tma_machine_clears
> >>                                         #    49.1 % tma_memory_bound
> >>
> >> 1.006529767 seconds time elapsed
> >>
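As a sanity check for anyone reading the right-hand column: the derived values in the SPR example are consistent with simple ratios of the raw counts, with the per-second rates taken against the cpu-clock time rather than the elapsed time. A minimal sketch in Python, using only the numbers quoted above (the variable names are mine, and this illustrates the arithmetic, it is not the perf implementation):

```python
# Raw counts copied from the SPR example above.
cpu_clock_msec = 226_054.13
elapsed_sec = 1.006529767
context_switches = 932
cycles = 45_940_682
instructions = 36_676_047
branches = 7_044_516
branch_misses = 62_169

# Rates in the example are relative to the summed cpu-clock time.
cpu_clock_sec = cpu_clock_msec / 1000.0

cpus_utilized = cpu_clock_sec / elapsed_sec
ctx_per_sec = context_switches / cpu_clock_sec
insn_per_cycle = instructions / cycles
branch_k_per_sec = branches / cpu_clock_sec / 1000.0
branch_miss_pct = 100.0 * branch_misses / branches

print(f"{cpus_utilized:.3f} CPUs utilized")
print(f"{ctx_per_sec:.3f} /sec")
print(f"{insn_per_cycle:.2f} insn per cycle")
print(f"{branch_k_per_sec:.3f} K/sec")
print(f"{branch_miss_pct:.2f}% of all branches")
```
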
> >> On Hybrid
> >>
> >> perf stat -a sleep 1
> >>
> >> Performance counter stats for 'system wide':
> >>
> >>      32,127.99 msec cpu-clock               #  31.992 CPUs utilized
> >>            240      context-switches        #   7.470 /sec
> >>             32      cpu-migrations          #   0.996 /sec
> >>             74      page-faults             #   2.303 /sec
> >>      6,313,960      cpu_core/cycles/        #   0.000 GHz
> >>    257,711,907      cpu_atom/cycles/        #   0.008 GHz  (54.18%)
> >>      4,477,162      cpu_core/instructions/  #    0.71 insn per cycle
> >>     37,721,481      cpu_atom/instructions/  #    5.97 insn per cycle  (63.33%)
> >>        809,747      cpu_core/branches/      #  25.204 K/sec
> >>      6,621,226      cpu_atom/branches/      # 206.089 K/sec  (63.32%)
> >>         39,667      cpu_core/branch-misses/ #   4.90% of all branches
> >>      1,032,146      cpu_atom/branch-misses/ # 127.47% of all branches  (63.33%)
> >>                     TopdownL1 (cpu_core)    #     nan % tma_backend_bound
> >>                                             #     0.0 % tma_bad_speculation
> >>                                             #     nan % tma_frontend_bound
> >>                                             #     nan % tma_retiring
> >>                     TopdownL1 (cpu_atom)    #    13.6 % tma_bad_speculation  (63.36%)
> >>                                             #    41.1 % tma_frontend_bound  (63.54%)
> >>                                             #    39.2 % tma_backend_bound
> >>                                             #    39.2 % tma_backend_bound_aux  (63.93%)
> >>                                             #     5.4 % tma_retiring  (64.15%)
> >>
> >> 1.004244114 seconds time elapsed
> >>
> >> JSON output:
> >>
> >> On SPR
> >>
> >> perf stat --json -a sleep 1
> >> {"counter-value" : "225904.823297", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 225904323425, "pcnt-running" : 100.00, "metric-value" : "224.456872", "metric-unit" : "CPUs utilized"}
> >> {"counter-value" : "986.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 225904108985, "pcnt-running" : 100.00, "metric-value" : "4.364670", "metric-unit" : "/sec"}
> >> {"counter-value" : "224.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 225904016141, "pcnt-running" : 100.00, "metric-value" : "0.991568", "metric-unit" : "/sec"}
> >> {"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"}
> >> {"counter-value" : "48433482.000000", "unit" : "", "event" : "cycles", "event-runtime" : 225903792732, "pcnt-running" : 100.00, "metric-value" : "0.000214", "metric-unit" : "GHz"}
> >> {"counter-value" : "38620409.000000", "unit" : "", "event" : "instructions", "event-runtime" : 225903657830, "pcnt-running" : 100.00, "metric-value" : "0.797391", "metric-unit" : "insn per cycle"}
> >> {"counter-value" : "7369473.000000", "unit" : "", "event" : "branches", "event-runtime" : 225903464328, "pcnt-running" : 100.00, "metric-value" : "32.622026", "metric-unit" : "K/sec"}
> >> {"counter-value" : "54747.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 225903234523, "pcnt-running" : 100.00, "metric-value" : "0.742889", "metric-unit" : "of all branches"}
> >> {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"}
> >> {"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"}
> >> {"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"}
> >> {"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"}
> >> {"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"}
> >> {"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL2"}
> >> {"metric-value" : "2.351757", "metric-unit" : "% tma_branch_mispredicts"}
> >> {"metric-value" : "19.729771", "metric-unit" : "% tma_core_bound"}
> >> {"metric-value" : "4.555207", "metric-unit" : "% tma_fetch_bandwidth"}
> >> {"metric-value" : "7.470867", "metric-unit" : "% tma_fetch_latency"}
> >> {"metric-value" : "2.938808", "metric-unit" : "% tma_heavy_operations"}
> >> {"metric-value" : "12.312705", "metric-unit" : "% tma_light_operations"}
> >> {"metric-value" : "0.420026", "metric-unit" : "% tma_machine_clears"}
> >> {"metric-value" : "50.220860", "metric-unit" : "% tma_memory_bound"}
> >>
> >> On Hybrid
> >>
> >> perf stat --json -a sleep 1
> >> {"counter-value" : "32131.530625", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 32131536951, "pcnt-running" : 100.00, "metric-value" : "31.992642", "metric-unit" : "CPUs utilized"}
> >> {"counter-value" : "328.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 32131525778, "pcnt-running" : 100.00, "metric-value" : "10.208042", "metric-unit" : "/sec"}
> >> {"counter-value" : "32.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 32131515104, "pcnt-running" : 100.00, "metric-value" : "0.995906", "metric-unit" : "/sec"}
> >> {"counter-value" : "353.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 32131501396, "pcnt-running" : 100.00, "metric-value" : "10.986094", "metric-unit" : "/sec"}
> >> {"counter-value" : "18685492.000000", "unit" : "", "event" : "cpu_core/cycles/", "event-runtime" : 16061585292, "pcnt-running" : 100.00, "metric-value" : "0.000582", "metric-unit" : "GHz"}
> >> {"counter-value" : "255620352.000000", "unit" : "", "event" : "cpu_atom/cycles/", "event-runtime" : 8690268422, "pcnt-running" : 54.00, "metric-value" : "0.007955", "metric-unit" : "GHz"}
> >> {"counter-value" : "15489913.000000", "unit" : "", "event" : "cpu_core/instructions/", "event-runtime" : 16061582200, "pcnt-running" : 100.00, "metric-value" : "0.828981", "metric-unit" : "insn per cycle"}
> >> {"counter-value" : "38790161.000000", "unit" : "", "event" : "cpu_atom/instructions/", "event-runtime" : 10163133324, "pcnt-running" : 63.00, "metric-value" : "2.075951", "metric-unit" : "insn per cycle"}
> >> {"counter-value" : "2908031.000000", "unit" : "", "event" : "cpu_core/branches/", "event-runtime" : 16061563416, "pcnt-running" : 100.00, "metric-value" : "90.503967", "metric-unit" : "K/sec"}
> >> {"counter-value" : "6814948.000000", "unit" : "", "event" : "cpu_atom/branches/", "event-runtime" : 10161711336, "pcnt-running" : 63.00, "metric-value" : "212.095343", "metric-unit" : "K/sec"}
> >> {"counter-value" : "97638.000000", "unit" : "", "event" : "cpu_core/branch-misses/", "event-runtime" : 16061535261, "pcnt-running" : 100.00, "metric-value" : "3.357530", "metric-unit" : "of all branches"}
> >> {"counter-value" : "1017066.000000", "unit" : "", "event" : "cpu_atom/branch-misses/", "event-runtime" : 10159971797, "pcnt-running" : 63.00, "metric-value" : "34.974386", "metric-unit" : "of all branches"}
> >> {"event-runtime" : 16061513607, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1 (cpu_core)"}
> >> {"metric-value" : "nan", "metric-unit" : "% tma_backend_bound"}
> >> {"metric-value" : "0.000000", "metric-unit" : "% tma_bad_speculation"}
> >> {"metric-value" : "nan", "metric-unit" : "% tma_frontend_bound"}
> >> {"metric-value" : "nan", "metric-unit" : "% tma_retiring"}
> >> {"event-runtime" : 10157398501, "pcnt-running" : 63.00, "metricgroup" : "TopdownL1 (cpu_atom)"}
> >> {"metric-value" : "13.719821", "metric-unit" : "% tma_bad_speculation"}
> >> {"event-runtime" : 10178698656, "pcnt-running" : 63.00, "metric-value" : "41.016738", "metric-unit" : "% tma_frontend_bound"}
> >> {"event-runtime" : 10240582902, "pcnt-running" : 63.00, "metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound"}
> >> {"metric-value" : "39.327764", "metric-unit" : "% tma_backend_bound_aux"}
> >> {"event-runtime" : 10284284920, "pcnt-running" : 64.00, "metric-value" : "5.374638", "metric-unit" : "% tma_retiring"}
> >>
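For scripts that consume the JSON form: in the examples above each line is a standalone JSON object, a "metricgroup" line opens a group, and the metric lines that follow carry only "metric-value" and "metric-unit" (plus runtime fields on some hybrid lines). A rough sketch of regrouping them in Python, assuming that layout holds; `group_metrics` is a hypothetical helper of mine, not part of perf:

```python
import json


def group_metrics(lines):
    """Collect metric values under the most recent "metricgroup" line."""
    groups = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "metricgroup" in record:
            current = record["metricgroup"]
            groups[current] = {}
        elif current is not None and "event" not in record:
            # Drop the leading '%' and space: "% tma_retiring" -> "tma_retiring".
            name = record["metric-unit"].lstrip("% ")
            # float() also accepts the "nan" strings seen on hybrid.
            groups[current][name] = float(record["metric-value"])
    return groups


# Lines copied from the SPR example above.
sample = [
    '{"counter-value" : "76.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 225903913270, "pcnt-running" : 100.00, "metric-value" : "0.336425", "metric-unit" : "/sec"}',
    '{"event-runtime" : 225902840555, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"}',
    '{"metric-value" : "69.950631", "metric-unit" : "% tma_backend_bound"}',
    '{"metric-value" : "2.771783", "metric-unit" : "% tma_bad_speculation"}',
    '{"metric-value" : "12.026074", "metric-unit" : "% tma_frontend_bound"}',
    '{"metric-value" : "15.251513", "metric-unit" : "% tma_retiring"}',
]

groups = group_metrics(sample)
print(groups)
```

Ordinary event lines are skipped here because they carry an "event" key; a consumer that also wants the counters would handle those separately.
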
> >> CSV output:
> >>
> >> On SPR
> >>
> >> perf stat -x, -a sleep 1
> >> 225851.20,msec,cpu-clock,225850700108,100.00,224.431,CPUs utilized
> >> 976,,context-switches,225850504803,100.00,4.321,/sec
> >> 224,,cpu-migrations,225850410336,100.00,0.992,/sec
> >> 76,,page-faults,225850304155,100.00,0.337,/sec
> >> 52288305,,cycles,225850188531,100.00,0.000,GHz
> >> 37977214,,instructions,225850071251,100.00,0.73,insn per cycle
> >> 7299859,,branches,225849890722,100.00,32.322,K/sec
> >> 51102,,branch-misses,225849672536,100.00,0.70,of all branches
> >> ,225849327050,100.00,,,,TopdownL1
> >> ,,,,,70.1,% tma_backend_bound
> >> ,,,,,2.7,% tma_bad_speculation
> >> ,,,,,12.5,% tma_frontend_bound
> >> ,,,,,14.6,% tma_retiring
> >> ,225849327050,100.00,,,,TopdownL2
> >> ,,,,,2.3,% tma_branch_mispredicts
> >> ,,,,,19.6,% tma_core_bound
> >> ,,,,,4.6,% tma_fetch_bandwidth
> >> ,,,,,7.9,% tma_fetch_latency
> >> ,,,,,2.9,% tma_heavy_operations
> >> ,,,,,11.7,% tma_light_operations
> >> ,,,,,0.5,% tma_machine_clears
> >> ,,,,,50.5,% tma_memory_bound
> >>
> >> On Hybrid
> >>
> >> perf stat -x, -a sleep 1
> >> 32139.34,msec,cpu-clock,32139351409,100.00,32.001,CPUs utilized
> >> 225,,context-switches,32139342672,100.00,7.001,/sec
> >> 32,,cpu-migrations,32139337772,100.00,0.996,/sec
> >> 72,,page-faults,32139328384,100.00,2.240,/sec
> >> 6766433,,cpu_core/cycles/,16067551558,100.00,0.000,GHz
> >> 256500230,,cpu_atom/cycles/,8695757391,54.00,0.008,GHz
> >> 4688595,,cpu_core/instructions/,16067558976,100.00,0.69,insn per cycle
> >> 37487490,,cpu_atom/instructions/,10165193856,63.00,5.54,insn per cycle
> >> 845211,,cpu_core/branches/,16067540225,100.00,26.298,K/sec
> >> 6571193,,cpu_atom/branches/,10155940853,63.00,204.459,K/sec
> >> 41359,,cpu_core/branch-misses/,16067516493,100.00,4.89,of all branches
> >> 1020231,,cpu_atom/branch-misses/,10159363620,63.00,120.71,of all branches
> >> ,16067494476,100.00,,,,TopdownL1 (cpu_core)
> >> ,,,,,,% tma_backend_bound
> >> ,,,,,0.0,% tma_bad_speculation
> >> ,,,,,,% tma_frontend_bound
> >> ,,,,,,% tma_retiring
> >> ,10160989992,63.00,,,,TopdownL1 (cpu_atom)
> >> ,,,,,13.8,% tma_bad_speculation
> >> ,10188319019,63.00,,,41.3,% tma_frontend_bound
> >> ,10258326591,63.00,,,38.6,% tma_backend_bound
> >> ,,,,,38.6,% tma_backend_bound_aux
> >> ,10282689488,64.00,,,5.4,% tma_retiring
> >>
> >> Kan Liang (5):
> >> perf metrics: Sort the Default metricgroup
> >> perf stat: New metricgroup output for the default mode
> >> perf test: Move all the check functions of stat csv output to lib
> >> perf test: Add test case for the standard perf stat output
> >> perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics
> >
> > Just to be clear, I'm happy for this to be submitted, having put my
> > reviewed/acked-by on it.
> >
>
> Thanks Ian. Appreciate all your feedback and comments.
Applied,
- Arnaldo