* AMD -missing perf stat metricgroup "pipeline"
@ 2022-11-21 14:03 Jirka Hladky
2022-11-22 4:48 ` Sandipan Das
0 siblings, 1 reply; 3+ messages in thread
From: Jirka Hladky @ 2022-11-21 14:03 UTC (permalink / raw)
To: linux-perf-users; +Cc: ravi.bangoria
Hi,
I'm testing AVX-512 packed double performance on the AMD Zen4
platform, and I need help identifying the backend-bound workloads. On
Intel systems, I use the metricgroup pipeline:
perf stat -M pipeline binary
which gives me exactly what I need.
Are there plans to add a similar metric group for AMD systems?
Thanks a lot!
Jirka
* Re: AMD -missing perf stat metricgroup "pipeline"
2022-11-21 14:03 AMD -missing perf stat metricgroup "pipeline" Jirka Hladky
@ 2022-11-22 4:48 ` Sandipan Das
2022-11-22 17:41 ` Jirka Hladky
0 siblings, 1 reply; 3+ messages in thread
From: Sandipan Das @ 2022-11-22 4:48 UTC (permalink / raw)
To: Jirka Hladky; +Cc: linux-perf-users, ravi.bangoria, ananth.narayan
Hi,
On 11/21/2022 7:33 PM, Jirka Hladky wrote:
>
> I'm testing AVX-512 packed double performance on the AMD Zen4
> platform, and I need help identifying the backend-bound workloads. On
> Intel systems, I use the metricgroup pipeline:
>
> perf stat -M pipeline binary
>
> which gives me exactly what I need.
>
> Are there plans to add a similar metric group for AMD systems?
>
For determining if a workload is backend-bound, the recommended
method on Zen 4 is to use the pipeline utilization metrics. We are
in the process of providing similar metrics and metric groups through
the perf JSON event files for Zen 4 and they will be out very soon.
The Processor Programming Reference (PPR) for Zen 4 based parts
has a table titled "Guidance for Pipeline Utilization Statistics"
which has the formulae for different Level 1 and 2 pipeline
utilization metrics.
The PPR for Genoa processors is available here:
https://www.amd.com/system/files/TechDocs/55901_0.25.zip
In this specific document, the table is on page 235 under section
2.1.15 titled "Performance Monitor Counters".
It may not be convenient to find out if a workload is backend-bound
without the use of a metric but one can still do it by programming
the raw events that make up that metric.
E.g. the formula for determining backend boundedness is:
Event[100431EA0] / (6 * Event[430076])
Running perf with the raw events gives the counts, which can then be
used to calculate the metric.
E.g.
$ perf stat -e r100431EA0,r430076 ./test
Performance counter stats for './test':
750,372 r100431EA0:u
7,500,728,022 r430076:u
2.894204814 seconds time elapsed
2.894060000 seconds user
0.000000000 seconds sys
The backend boundedness is then 750372 / (6 * 7500728022)
which is roughly 0.001667%.
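The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of perf: the function name backend_bound and the constant DISPATCH_WIDTH are made up for this example, and the dispatch width of 6 is taken from the formula above.

```python
# Illustrative sketch: backend-bound fraction from two raw Zen 4 counters.
# DISPATCH_WIDTH and backend_bound are hypothetical names, not perf APIs.
DISPATCH_WIDTH = 6  # dispatch slots per cycle, per the formula above


def backend_bound(stall_slots: int, cycles: int, width: int = DISPATCH_WIDTH) -> float:
    """Return Event[100431EA0] / (width * Event[430076]) as a fraction."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return stall_slots / (width * cycles)


# Counts copied from the perf stat output above.
fraction = backend_bound(750_372, 7_500_728_022)
print(f"backend bound: {fraction:.6%}")
```

Multiplying the cycle count by 6 before dividing matters: dividing by 6 first and then multiplying by the cycle count would give a wildly different number.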
- Sandipan
* Re: AMD -missing perf stat metricgroup "pipeline"
2022-11-22 4:48 ` Sandipan Das
@ 2022-11-22 17:41 ` Jirka Hladky
0 siblings, 0 replies; 3+ messages in thread
From: Jirka Hladky @ 2022-11-22 17:41 UTC (permalink / raw)
To: Sandipan Das; +Cc: linux-perf-users, ravi.bangoria, ananth.narayan
(forgot to send the message in plain text mode - resending)
Hi Sandipan,
> For determining if a workload is backend-bound, the recommended
> method on Zen 4 is to use the pipeline utilization metrics. We are
> in the process of providing similar metrics and metric groups through
> the perf JSON event files for Zen 4 and they will be out very soon.
This is great news - I'm looking forward to having it released! :-)
> The PPR for Genoa processors is available here:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip
Thanks for sharing it!
I can confirm that my workload is heavily backend-bound, at 86%.
See [1]. That is exactly what I was looking for. It will be awesome
once it becomes easily accessible via the pipeline metricgroup.
Thanks a lot!
Jirka
[1]
perf stat -e r100431EA0,r1004360A0,r4300C1,r430076 ./harmonic_series 0 1e9
Time elapsed: 2.44143 s
AVX512:
Sum 23.3799
Difference Sum - Formula -1.91847e-13
Time elapsed: 2.44154 s
Performance counter stats for './harmonic_series 0 1e9':
46,731,488,713 r100431EA0
22,145 r1004360A0
5,015,015,425 r4300C1
9,021,144,392 r430076
2.442290274 seconds time elapsed
2.437987000 seconds user
0.000000000 seconds sys
Total Dispatch Slots: Up to 6 instructions can be dispatched in one cycle.
    6 * Event[430076]

Retiring: Fraction of dispatch slots used by ops that retired.
    Event[4300C1] / Total Dispatch Slots
    5/(9*6)*100 = 9%

Backend Bound: Fraction of dispatch slots that remained unused because of backend stalls.
    Event[100431EA0] / Total Dispatch Slots
    46.7/(9*6)*100 = 86%
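For anyone who wants to repeat this calculation without doing it by hand, here is a rough Python sketch that pulls the counts out of the plain perf stat text and applies the two formulas above. The helper names (parse_counts, level1) are made up for this example, and the regular expression assumes count lines look exactly like the ones shown in [1], with no extra columns such as multiplexing percentages.

```python
import re

DISPATCH_WIDTH = 6  # dispatch slots per cycle, as stated above

# Matches lines such as "46,731,488,713 r100431EA0" or "750,372 r100431EA0:u".
# Assumption: nothing follows the event name on the line.
COUNT_LINE = re.compile(r"^\s*([\d,]+)\s+(r[0-9A-Fa-f]+)(?::\w+)?\s*$")


def parse_counts(text: str) -> dict:
    """Map raw event names to integer counts found in perf stat text output."""
    counts = {}
    for line in text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def level1(counts: dict) -> dict:
    """Compute the Retiring and Backend Bound fractions from raw counts."""
    slots = DISPATCH_WIDTH * counts["r430076"]  # Total Dispatch Slots
    return {
        "retiring": counts["r4300C1"] / slots,
        "backend_bound": counts["r100431EA0"] / slots,
    }


# Output copied from the run in [1].
SAMPLE = """
 46,731,488,713 r100431EA0
 22,145 r1004360A0
 5,015,015,425 r4300C1
 9,021,144,392 r430076
 2.442290274 seconds time elapsed
"""

metrics = level1(parse_counts(SAMPLE))
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Parsing the human-readable output is fragile; it is used here only because that is the format shown in this thread.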
--
-Jirka