AMD -missing perf stat metricgroup "pipeline"

linux-perf-users.vger.kernel.org archive mirror

* AMD -missing perf stat metricgroup "pipeline"
@ 2022-11-21 14:03 Jirka Hladky
  2022-11-22  4:48 ` Sandipan Das
  0 siblings, 1 reply; 3+ messages in thread
From: Jirka Hladky @ 2022-11-21 14:03 UTC (permalink / raw)
  To: linux-perf-users; +Cc: ravi.bangoria

Hi,

I'm testing AVX-512 packed double performance on the AMD Zen4
platform, and I need help identifying the backend-bound workloads. On
Intel systems, I use the metricgroup pipeline:

perf stat -M pipeline binary

which gives me exactly what I need.

Are there plans to add a similar metric group for AMD systems?

Thanks a lot!
Jirka


* Re: AMD -missing perf stat metricgroup "pipeline"
  2022-11-21 14:03 AMD -missing perf stat metricgroup "pipeline" Jirka Hladky
@ 2022-11-22  4:48 ` Sandipan Das
  2022-11-22 17:41   ` Jirka Hladky
  0 siblings, 1 reply; 3+ messages in thread
From: Sandipan Das @ 2022-11-22  4:48 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-perf-users, ravi.bangoria, ananth.narayan

Hi,

On 11/21/2022 7:33 PM, Jirka Hladky wrote:
> 
> I'm testing AVX-512 packed double performance on the AMD Zen4
> platform, and I need help identifying the backend-bound workloads. On
> Intel systems, I use the metricgroup pipeline:
> 
> perf stat -M pipeline binary
> 
> which gives me exactly what I need.
> 
> Are there plans to add a similar metric group for AMD systems?
> 

For determining if a workload is backend-bound, the recommended
method on Zen 4 is to use the pipeline utilization metrics. We are
in the process of providing similar metrics and metric groups through
the perf JSON event files for Zen 4, and they will be out very soon.

The Processor Programming Reference (PPR) for Zen 4 based parts
has a table titled "Guidance for Pipeline Utilization Statistics"
which has the formulae for different Level 1 and 2 pipeline
utilization metrics.

The PPR for Genoa processors is available here:
https://www.amd.com/system/files/TechDocs/55901_0.25.zip

In this specific document, the table is on page 235 under section
2.1.15, titled "Performance Monitor Counters".

It may not be convenient to find out if a workload is backend-bound
without the use of a metric but one can still do it by programming
the raw events that make up that metric.

E.g. the formula for determining backend boundedness is:
Event[100431EA0] / (6 * Event[430076])

Running perf with the raw events gives the counts, which can then be
used to calculate the metric.

E.g.

$ perf stat -e r100431EA0,r430076 ./test

 Performance counter stats for './test':

           750,372      r100431EA0:u
     7,500,728,022      r430076:u

       2.894204814 seconds time elapsed

       2.894060000 seconds user
       0.000000000 seconds sys

The backend boundedness is then 750372 / (6 * 7500728022)
which is roughly 0.001667%.
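
As a minimal sketch of the arithmetic above (not part of perf itself),
the same calculation can be written in Python. The function name and
the dispatch_width parameter are illustrative assumptions; the factor
of 6 and the two counts are taken from the example output above.

```python
def backend_bound_fraction(unused_slots: int, cycles: int,
                           dispatch_width: int = 6) -> float:
    """Return Event[100431EA0] / (dispatch_width * Event[430076])."""
    # Total dispatch slots: dispatch_width slots per counted cycle.
    total_slots = dispatch_width * cycles
    if total_slots == 0:
        raise ValueError("cycle count must be non-zero")
    return unused_slots / total_slots


# Counts copied from the perf stat output for ./test above.
fraction = backend_bound_fraction(750_372, 7_500_728_022)
print(f"{fraction * 100:.6f}%")  # prints 0.001667%
```

Note the parentheses around the denominator: dividing by 6 and then
multiplying by the second count would give a very different number.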

- Sandipan


* Re: AMD -missing perf stat metricgroup "pipeline"
  2022-11-22  4:48 ` Sandipan Das
@ 2022-11-22 17:41   ` Jirka Hladky
  0 siblings, 0 replies; 3+ messages in thread
From: Jirka Hladky @ 2022-11-22 17:41 UTC (permalink / raw)
  To: Sandipan Das; +Cc: linux-perf-users, ravi.bangoria, ananth.narayan

(forgot to send the message in plain text mode - resending)

Hi Sandipan,

> For determining if a workload is backend-bound, the recommended
> method on Zen 4 is to use the pipeline utilization metrics. We are
> in the process of providing similar metrics and metric groups through
> the perf JSON event files for Zen 4 and they will be out very soon.


This is great news - I'm looking forward to having it released! :-)

> The PPR for Genoa processors is available here:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip

Thanks for sharing it!

I could confirm that my workload is heavily backend bound, at about 86%.
See [1]. That is exactly what I was looking for. It will be awesome
once this is easily accessible via the pipeline metricgroup.

Thanks a lot!
Jirka

[1]
perf stat -e r100431EA0,r1004360A0,r4300C1,r430076  ./harmonic_series 0 1e9
Time elapsed: 2.44143 s
AVX512:
Sum 23.3799
Difference Sum - Formula -1.91847e-13
Time elapsed: 2.44154 s

Performance counter stats for './harmonic_series 0 1e9':

   46,731,488,713      r100431EA0
           22,145      r1004360A0
    5,015,015,425      r4300C1
    9,021,144,392      r430076

      2.442290274 seconds time elapsed

      2.437987000 seconds user
      0.000000000 seconds sys

Total Dispatch Slots: Up to 6 instructions can be dispatched in one
cycle.
  6 * Event[430076]

Retiring: Fraction of dispatch slots used by ops that retired.
  Event[4300C1] / Total Dispatch Slots
  5/(9*6)*100 = 9%

Backend Bound: Fraction of dispatch slots that remained unused
because of backend stalls.
  Event[100431EA0] / Total Dispatch Slots
  46.7/(9*6)*100 = 86%
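
The calculation above can also be scripted. The following is a rough
sketch, assuming the perf stat counter lines are pasted in as text;
parse_counts and pipeline_metrics are hypothetical helper names, not
perf features, and only the two Level 1 formulas quoted above are
implemented (r1004360A0 is parsed but not used).

```python
import re

# Counter lines copied from the perf stat output in [1].
PERF_OUTPUT = """\
   46,731,488,713      r100431EA0
           22,145      r1004360A0
    5,015,015,425      r4300C1
    9,021,144,392      r430076
"""


def parse_counts(text: str) -> dict:
    """Map raw event names (e.g. 'r430076') to integer counts."""
    counts = {}
    for line in text.splitlines():
        # A count with thousands separators, then a raw event name.
        match = re.match(r"\s*([\d,]+)\s+(r[0-9A-Fa-f]+)", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def pipeline_metrics(counts: dict, dispatch_width: int = 6) -> dict:
    """Compute Retiring and Backend Bound as fractions of dispatch slots."""
    total_slots = dispatch_width * counts["r430076"]
    return {
        "retiring": counts["r4300C1"] / total_slots,
        "backend_bound": counts["r100431EA0"] / total_slots,
    }


metrics = pipeline_metrics(parse_counts(PERF_OUTPUT))
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

With the counts from [1] this reproduces the rounded figures above
(about 9% retiring and about 86% backend bound).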



-- 
-Jirka


