* Re: AMD -missing perf stat metricgroup "pipeline"
From: Jirka Hladky @ 2022-11-22 17:41 UTC
To: Sandipan Das; +Cc: linux-perf-users, ravi.bangoria, ananth.narayan
(forgot to send the message in plain text mode - resending)
Hi Sandipan,
> For determining if a workload is backend-bound, the recommended
> method on Zen 4 is to use the pipeline utilization metrics. We are in
> the process of providing similar metrics and metric groups through
> the perf JSON event files for Zen 4 and they will be out very soon.
This is great news - I'm looking forward to having it released! :-)
> The PPR for Genoa processors is available here:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip
Thanks for sharing it!
I was able to confirm that my workload is heavily backend bound, at 86%.
See [1]. That is exactly what I was looking for. It will be awesome
once this becomes easily accessible via the pipeline metricgroup.
Thanks a lot!
Jirka
[1]
perf stat -e r100431EA0,r1004360A0,r4300C1,r430076 ./harmonic_series 0 1e9
Time elapsed: 2.44143 s
AVX512:
Sum 23.3799
Difference Sum - Formula -1.91847e-13
Time elapsed: 2.44154 s
Performance counter stats for './harmonic_series 0 1e9':
46,731,488,713 r100431EA0
22,145 r1004360A0
5,015,015,425 r4300C1
9,021,144,392 r430076
2.442290274 seconds time elapsed
2.437987000 seconds user
0.000000000 seconds sys
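A minimal sketch, assuming the human-readable perf stat format shown above, of pulling the raw counts out of the output with Python. The function name parse_counts and the regular expression are illustrative choices for this sketch and are not part of perf.

```python
import re

# Counter lines copied from the perf stat output above.
PERF_OUTPUT = """\
46,731,488,713 r100431EA0
22,145 r1004360A0
5,015,015,425 r4300C1
9,021,144,392 r430076
2.442290274 seconds time elapsed
"""

def parse_counts(text):
    """Map raw event names (e.g. 'r430076') to integer counts.

    Assumes lines of the form '<count with commas> <rHEX>[:modifier]';
    anything else, such as the timing lines, is skipped.
    """
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d,]+)\s+(r[0-9A-Fa-f]+)(?::\w+)?\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts

counts = parse_counts(PERF_OUTPUT)
for name, value in counts.items():
    print(name, value)
```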
Total Dispatch Slots: Up to 6 instructions can be dispatched in one cycle.
6 * Event[430076]
Retiring: Fraction of dispatch slots used by ops that retired.
Event[4300C1] / Total Dispatch Slots
5/(9*6)*100 = 9%
Backend Bound: Fraction of dispatch slots that remained unused because of backend stalls.
Event[100431EA0] / Total Dispatch Slots
46.7/(9*6)*100 = 86%
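As a cross-check, the arithmetic above can be reproduced with a short Python sketch. The counts are copied from the perf stat output in [1]. The function name pipeline_fractions and its slots_per_cycle default are illustrative choices for this sketch, not something provided by perf or the PPR.

```python
def pipeline_fractions(backend_stall_slots, retired_ops, cycles, slots_per_cycle=6):
    """Return (retiring, backend_bound) as fractions of total dispatch slots.

    Total dispatch slots = slots_per_cycle * cycles, following the
    formulas quoted above (6 dispatch slots per cycle is assumed).
    """
    total_slots = slots_per_cycle * cycles
    return retired_ops / total_slots, backend_stall_slots / total_slots

retiring, backend_bound = pipeline_fractions(
    backend_stall_slots=46_731_488_713,  # r100431EA0
    retired_ops=5_015_015_425,           # r4300C1
    cycles=9_021_144_392,                # r430076
)
print(f"Retiring:      {retiring:.1%}")
print(f"Backend bound: {backend_bound:.1%}")
```

With these counts the two fractions come out at roughly 9% and 86%, matching the hand calculation.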
On Tue, Nov 22, 2022 at 5:48 AM Sandipan Das <sandipan.das@amd.com> wrote:
>
> Hi,
>
> On 11/21/2022 7:33 PM, Jirka Hladky wrote:
> >
> > I'm testing AVX-512 packed double performance on the AMD Zen4
> > platform, and I need help identifying the backend-bound workloads. On
> > Intel systems, I use the metricgroup pipeline:
> >
> > perf stat -M pipeline binary
> >
> > which gives me exactly what I need.
> >
> > Are there plans to add a similar metric group for AMD systems?
> >
>
> For determining if a workload is backend-bound, the recommended
> method on Zen 4 is to use the pipeline utilization metrics. We are in
> the process of providing similar metrics and metric groups through
> the perf JSON event files for Zen 4 and they will be out very soon.
>
> The Processor Programming Reference (PPR) for Zen 4 based parts
> has a table titled "Guidance for Pipeline Utilization Statistics"
> which has the formulae for different Level 1 and 2 pipeline
> utilization metrics.
>
> The PPR for Genoa processors is available here:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip
>
> In this specific document, the table is on page 235 under section
> 2.1.15 titled "Performance Monitor Counters".
>
> It may not be convenient to find out if a workload is backend-bound
> without the use of a metric but one can still do it by programming
> the raw events that make up that metric.
>
> E.g. the formula for determining backend boundedness is:
> Event[100431EA0] / (6 * Event[430076])
>
> Running perf with the raw events gives the counts which can then be
> used to calculate the metric.
>
> E.g.
>
> $ perf stat -e r100431EA0,r430076 ./test
>
> Performance counter stats for './test':
>
> 750,372 r100431EA0:u
> 7,500,728,022 r430076:u
>
> 2.894204814 seconds time elapsed
>
> 2.894060000 seconds user
> 0.000000000 seconds sys
>
> The backend boundedness is then 750372 / (6 * 7500728022)
> which is roughly 0.001667%.
>
> - Sandipan
>
--
-Jirka