linux-perf-users.vger.kernel.org archive mirror
* [PATCH v1 00/31] Add generated latest Intel events and metrics
@ 2022-07-22 22:32 Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 01/31] perf test: Avoid sysfs state affecting fake events Ian Rogers
                   ` (18 more replies)
  0 siblings, 19 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

The goal of this patch series is to align the json events for Intel
platforms with those generated by:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
This script takes the latest event json and TMA metrics from:
https://download.01.org/perfmon/ and augments them with additional
metrics, in particular uncore ones, from:
https://github.com/intel/perfmon-metrics
The cpu_operating_frequency metric assumes the presence of the
system_tsc_freq literal posted/reviewed in:
https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/

Some fixes were needed to the script for generating the json and are
contained in this pull request:
https://github.com/intel/event-converter-for-linux-perf/pull/15

The json files were first downloaded and then used locally to generate
the perf json files; this two-step process fixes non-ASCII characters
for (R) and (TM) in the source json files. This can be reproduced with:
$ download_and_gen.py --hermetic-download --outdir data
$ download_and_gen.py --url=file://`pwd`/data/01 --metrics-url=file://`pwd`/data/github

A minor correction is made in the generated json of:
tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
changing "\\Inbound\\" to just "Inbound" to avoid compilation errors
caused by \I.

The elkhartlake metrics file is basic and not generated by scripts. It
is retained here although it causes a difference from the generated
files.

The mapfile.csv is the third and final difference from the generated
version due to a bug in 01.org's models for icelake. The existing
models are preferred and retained.

In addition to requiring the #system_tsc_freq change, a test fix is
included here addressing an issue with fake PMU testing exposed by the
new and updated metrics.

Compared to the previous json, additional changes are the inclusion of
basic meteorlake events and the renaming of tremontx to
snowridgex. The new metrics contribute to the size, but the largest
contribution is the inclusion of previously ungenerated and
experimental uncore events.

Ian Rogers (31):
  perf test: Avoid sysfs state affecting fake events
  perf vendor events: Update Intel broadwellx
  perf vendor events: Update Intel broadwell
  perf vendor events: Update Intel broadwellde
  perf vendor events: Update Intel alderlake
  perf vendor events: Update bonnell mapfile.csv
  perf vendor events: Update Intel cascadelakex
  perf vendor events: Update Intel elkhartlake
  perf vendor events: Update goldmont mapfile.csv
  perf vendor events: Update goldmontplus mapfile.csv
  perf vendor events: Update Intel haswell
  perf vendor events: Update Intel haswellx
  perf vendor events: Update Intel icelake
  perf vendor events: Update Intel icelakex
  perf vendor events: Update Intel ivybridge
  perf vendor events: Update Intel ivytown
  perf vendor events: Update Intel jaketown
  perf vendor events: Update Intel knightslanding
  perf vendor events: Add Intel meteorlake
  perf vendor events: Update Intel nehalemep
  perf vendor events: Update Intel nehalemex
  perf vendor events: Update Intel sandybridge
  perf vendor events: Update Intel sapphirerapids
  perf vendor events: Update Intel silvermont
  perf vendor events: Update Intel skylake
  perf vendor events: Update Intel skylakex
  perf vendor events: Update Intel snowridgex
  perf vendor events: Update Intel tigerlake
  perf vendor events: Update Intel westmereep-dp
  perf vendor events: Update Intel westmereep-sp
  perf vendor events: Update Intel westmereex

 .../arch/x86/alderlake/adl-metrics.json       |     4 +-
 .../pmu-events/arch/x86/alderlake/cache.json  |   178 +-
 .../arch/x86/alderlake/floating-point.json    |    19 +-
 .../arch/x86/alderlake/frontend.json          |    38 +-
 .../pmu-events/arch/x86/alderlake/memory.json |    40 +-
 .../pmu-events/arch/x86/alderlake/other.json  |    97 +-
 .../arch/x86/alderlake/pipeline.json          |   507 +-
 .../arch/x86/alderlake/uncore-other.json      |     2 +-
 .../arch/x86/alderlake/virtual-memory.json    |    63 +-
 .../pmu-events/arch/x86/bonnell/cache.json    |     2 +-
 .../arch/x86/bonnell/floating-point.json      |     2 +-
 .../pmu-events/arch/x86/bonnell/frontend.json |     2 +-
 .../pmu-events/arch/x86/bonnell/memory.json   |     2 +-
 .../pmu-events/arch/x86/bonnell/other.json    |     2 +-
 .../pmu-events/arch/x86/bonnell/pipeline.json |     2 +-
 .../arch/x86/bonnell/virtual-memory.json      |     2 +-
 .../arch/x86/broadwell/bdw-metrics.json       |   130 +-
 .../pmu-events/arch/x86/broadwell/cache.json  |     2 +-
 .../arch/x86/broadwell/floating-point.json    |     2 +-
 .../arch/x86/broadwell/frontend.json          |     2 +-
 .../pmu-events/arch/x86/broadwell/memory.json |     2 +-
 .../pmu-events/arch/x86/broadwell/other.json  |     2 +-
 .../arch/x86/broadwell/pipeline.json          |     2 +-
 .../arch/x86/broadwell/uncore-cache.json      |   152 +
 .../arch/x86/broadwell/uncore-other.json      |    82 +
 .../pmu-events/arch/x86/broadwell/uncore.json |   278 -
 .../arch/x86/broadwell/virtual-memory.json    |     2 +-
 .../arch/x86/broadwellde/bdwde-metrics.json   |   136 +-
 .../arch/x86/broadwellde/cache.json           |     2 +-
 .../arch/x86/broadwellde/floating-point.json  |     2 +-
 .../arch/x86/broadwellde/frontend.json        |     2 +-
 .../arch/x86/broadwellde/memory.json          |     2 +-
 .../arch/x86/broadwellde/other.json           |     2 +-
 .../arch/x86/broadwellde/pipeline.json        |     2 +-
 .../arch/x86/broadwellde/uncore-cache.json    |  3818 ++-
 .../arch/x86/broadwellde/uncore-memory.json   |  2867 +-
 .../arch/x86/broadwellde/uncore-other.json    |  1246 +
 .../arch/x86/broadwellde/uncore-power.json    |   492 +-
 .../arch/x86/broadwellde/virtual-memory.json  |     2 +-
 .../arch/x86/broadwellx/bdx-metrics.json      |   570 +-
 .../pmu-events/arch/x86/broadwellx/cache.json |    22 +-
 .../arch/x86/broadwellx/floating-point.json   |     9 +-
 .../arch/x86/broadwellx/frontend.json         |     2 +-
 .../arch/x86/broadwellx/memory.json           |    39 +-
 .../pmu-events/arch/x86/broadwellx/other.json |     2 +-
 .../arch/x86/broadwellx/pipeline.json         |     4 +-
 .../arch/x86/broadwellx/uncore-cache.json     |  3788 ++-
 .../x86/broadwellx/uncore-interconnect.json   |  1438 +-
 .../arch/x86/broadwellx/uncore-memory.json    |  2849 +-
 .../arch/x86/broadwellx/uncore-other.json     |  3252 ++
 .../arch/x86/broadwellx/uncore-power.json     |   437 +-
 .../arch/x86/broadwellx/virtual-memory.json   |     2 +-
 .../arch/x86/cascadelakex/cache.json          |     8 +-
 .../arch/x86/cascadelakex/clx-metrics.json    |   724 +-
 .../arch/x86/cascadelakex/floating-point.json |     2 +-
 .../arch/x86/cascadelakex/frontend.json       |     2 +-
 .../arch/x86/cascadelakex/other.json          |    63 +
 .../arch/x86/cascadelakex/pipeline.json       |    11 +
 .../arch/x86/cascadelakex/uncore-memory.json  |     9 +
 .../arch/x86/cascadelakex/uncore-other.json   |   697 +-
 .../arch/x86/cascadelakex/virtual-memory.json |     2 +-
 .../arch/x86/elkhartlake/cache.json           |   956 +-
 .../arch/x86/elkhartlake/floating-point.json  |    19 +-
 .../arch/x86/elkhartlake/frontend.json        |    34 +-
 .../arch/x86/elkhartlake/memory.json          |   388 +-
 .../arch/x86/elkhartlake/other.json           |   527 +-
 .../arch/x86/elkhartlake/pipeline.json        |   203 +-
 .../arch/x86/elkhartlake/virtual-memory.json  |   151 +-
 .../pmu-events/arch/x86/goldmont/cache.json   |     2 +-
 .../arch/x86/goldmont/floating-point.json     |     2 +-
 .../arch/x86/goldmont/frontend.json           |     2 +-
 .../pmu-events/arch/x86/goldmont/memory.json  |     2 +-
 .../arch/x86/goldmont/pipeline.json           |     2 +-
 .../arch/x86/goldmont/virtual-memory.json     |     2 +-
 .../arch/x86/goldmontplus/cache.json          |     2 +-
 .../arch/x86/goldmontplus/floating-point.json |     2 +-
 .../arch/x86/goldmontplus/frontend.json       |     2 +-
 .../arch/x86/goldmontplus/memory.json         |     2 +-
 .../arch/x86/goldmontplus/pipeline.json       |     2 +-
 .../arch/x86/goldmontplus/virtual-memory.json |     2 +-
 .../pmu-events/arch/x86/haswell/cache.json    |    78 +-
 .../arch/x86/haswell/floating-point.json      |     2 +-
 .../pmu-events/arch/x86/haswell/frontend.json |     2 +-
 .../arch/x86/haswell/hsw-metrics.json         |    85 +-
 .../pmu-events/arch/x86/haswell/memory.json   |    75 +-
 .../pmu-events/arch/x86/haswell/other.json    |     2 +-
 .../pmu-events/arch/x86/haswell/pipeline.json |     9 +-
 .../arch/x86/haswell/uncore-other.json        |     7 +-
 .../arch/x86/haswell/virtual-memory.json      |     2 +-
 .../pmu-events/arch/x86/haswellx/cache.json   |    44 +-
 .../arch/x86/haswellx/floating-point.json     |     2 +-
 .../arch/x86/haswellx/frontend.json           |     2 +-
 .../arch/x86/haswellx/hsx-metrics.json        |    85 +-
 .../pmu-events/arch/x86/haswellx/memory.json  |    52 +-
 .../pmu-events/arch/x86/haswellx/other.json   |     2 +-
 .../arch/x86/haswellx/pipeline.json           |     9 +-
 .../arch/x86/haswellx/uncore-cache.json       |  3779 ++-
 .../x86/haswellx/uncore-interconnect.json     |  1430 +-
 .../arch/x86/haswellx/uncore-memory.json      |  2839 +-
 .../arch/x86/haswellx/uncore-other.json       |  3170 ++
 .../arch/x86/haswellx/uncore-power.json       |   477 +-
 .../arch/x86/haswellx/virtual-memory.json     |     2 +-
 .../pmu-events/arch/x86/icelake/cache.json    |     8 +-
 .../arch/x86/icelake/floating-point.json      |     2 +-
 .../pmu-events/arch/x86/icelake/frontend.json |     2 +-
 .../arch/x86/icelake/icl-metrics.json         |   126 +-
 .../arch/x86/icelake/uncore-other.json        |    31 +
 .../arch/x86/icelake/virtual-memory.json      |     2 +-
 .../pmu-events/arch/x86/icelakex/cache.json   |    28 +-
 .../arch/x86/icelakex/floating-point.json     |     2 +-
 .../arch/x86/icelakex/frontend.json           |     2 +-
 .../arch/x86/icelakex/icx-metrics.json        |   691 +-
 .../pmu-events/arch/x86/icelakex/memory.json  |     6 +-
 .../pmu-events/arch/x86/icelakex/other.json   |    51 +-
 .../arch/x86/icelakex/pipeline.json           |    12 +
 .../arch/x86/icelakex/virtual-memory.json     |     2 +-
 .../pmu-events/arch/x86/ivybridge/cache.json  |     2 +-
 .../arch/x86/ivybridge/floating-point.json    |     2 +-
 .../arch/x86/ivybridge/frontend.json          |     2 +-
 .../arch/x86/ivybridge/ivb-metrics.json       |    94 +-
 .../pmu-events/arch/x86/ivybridge/memory.json |     2 +-
 .../pmu-events/arch/x86/ivybridge/other.json  |     2 +-
 .../arch/x86/ivybridge/pipeline.json          |     4 +-
 .../arch/x86/ivybridge/uncore-other.json      |     2 +-
 .../arch/x86/ivybridge/virtual-memory.json    |     2 +-
 .../pmu-events/arch/x86/ivytown/cache.json    |     2 +-
 .../arch/x86/ivytown/floating-point.json      |     2 +-
 .../pmu-events/arch/x86/ivytown/frontend.json |     2 +-
 .../arch/x86/ivytown/ivt-metrics.json         |    94 +-
 .../pmu-events/arch/x86/ivytown/memory.json   |     2 +-
 .../pmu-events/arch/x86/ivytown/other.json    |     2 +-
 .../arch/x86/ivytown/uncore-cache.json        |  3495 ++-
 .../arch/x86/ivytown/uncore-interconnect.json |  1750 +-
 .../arch/x86/ivytown/uncore-memory.json       |  1775 +-
 .../arch/x86/ivytown/uncore-other.json        |  2411 ++
 .../arch/x86/ivytown/uncore-power.json        |   696 +-
 .../arch/x86/ivytown/virtual-memory.json      |     2 +-
 .../pmu-events/arch/x86/jaketown/cache.json   |     2 +-
 .../arch/x86/jaketown/floating-point.json     |     2 +-
 .../arch/x86/jaketown/frontend.json           |     2 +-
 .../arch/x86/jaketown/jkt-metrics.json        |    11 +-
 .../pmu-events/arch/x86/jaketown/memory.json  |     2 +-
 .../pmu-events/arch/x86/jaketown/other.json   |     2 +-
 .../arch/x86/jaketown/pipeline.json           |    16 +-
 .../arch/x86/jaketown/uncore-cache.json       |  1960 +-
 .../x86/jaketown/uncore-interconnect.json     |   824 +-
 .../arch/x86/jaketown/uncore-memory.json      |   445 +-
 .../arch/x86/jaketown/uncore-other.json       |  1551 +
 .../arch/x86/jaketown/uncore-power.json       |   362 +-
 .../arch/x86/jaketown/virtual-memory.json     |     2 +-
 .../arch/x86/knightslanding/cache.json        |     2 +-
 .../x86/knightslanding/floating-point.json    |     2 +-
 .../arch/x86/knightslanding/frontend.json     |     2 +-
 .../arch/x86/knightslanding/memory.json       |     2 +-
 .../arch/x86/knightslanding/pipeline.json     |     2 +-
 .../x86/knightslanding/uncore-memory.json     |    42 -
 .../arch/x86/knightslanding/uncore-other.json |  3890 +++
 .../x86/knightslanding/virtual-memory.json    |     2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv    |    74 +-
 .../pmu-events/arch/x86/meteorlake/cache.json |   262 +
 .../arch/x86/meteorlake/frontend.json         |    24 +
 .../arch/x86/meteorlake/memory.json           |   185 +
 .../pmu-events/arch/x86/meteorlake/other.json |    46 +
 .../arch/x86/meteorlake/pipeline.json         |   254 +
 .../arch/x86/meteorlake/virtual-memory.json   |    46 +
 .../pmu-events/arch/x86/nehalemep/cache.json  |    14 +-
 .../arch/x86/nehalemep/floating-point.json    |     2 +-
 .../arch/x86/nehalemep/frontend.json          |     2 +-
 .../pmu-events/arch/x86/nehalemep/memory.json |     6 +-
 .../arch/x86/nehalemep/virtual-memory.json    |     2 +-
 .../pmu-events/arch/x86/nehalemex/cache.json  |  2974 +-
 .../arch/x86/nehalemex/floating-point.json    |   182 +-
 .../arch/x86/nehalemex/frontend.json          |    20 +-
 .../pmu-events/arch/x86/nehalemex/memory.json |   672 +-
 .../pmu-events/arch/x86/nehalemex/other.json  |   170 +-
 .../arch/x86/nehalemex/pipeline.json          |   830 +-
 .../arch/x86/nehalemex/virtual-memory.json    |    92 +-
 .../arch/x86/sandybridge/cache.json           |     2 +-
 .../arch/x86/sandybridge/floating-point.json  |     2 +-
 .../arch/x86/sandybridge/frontend.json        |     4 +-
 .../arch/x86/sandybridge/memory.json          |     2 +-
 .../arch/x86/sandybridge/other.json           |     2 +-
 .../arch/x86/sandybridge/pipeline.json        |    10 +-
 .../arch/x86/sandybridge/snb-metrics.json     |    11 +-
 .../arch/x86/sandybridge/uncore-other.json    |     2 +-
 .../arch/x86/sandybridge/virtual-memory.json  |     2 +-
 .../arch/x86/sapphirerapids/cache.json        |   135 +-
 .../x86/sapphirerapids/floating-point.json    |     6 +
 .../arch/x86/sapphirerapids/frontend.json     |    16 +
 .../arch/x86/sapphirerapids/memory.json       |    23 +-
 .../arch/x86/sapphirerapids/other.json        |    68 +-
 .../arch/x86/sapphirerapids/pipeline.json     |    99 +-
 .../arch/x86/sapphirerapids/spr-metrics.json  |   566 +-
 .../arch/x86/sapphirerapids/uncore-other.json |     9 -
 .../x86/sapphirerapids/virtual-memory.json    |    20 +
 .../pmu-events/arch/x86/silvermont/cache.json |     2 +-
 .../arch/x86/silvermont/floating-point.json   |     2 +-
 .../arch/x86/silvermont/frontend.json         |     2 +-
 .../arch/x86/silvermont/memory.json           |     2 +-
 .../pmu-events/arch/x86/silvermont/other.json |     2 +-
 .../arch/x86/silvermont/pipeline.json         |     2 +-
 .../arch/x86/silvermont/virtual-memory.json   |     2 +-
 .../arch/x86/skylake/floating-point.json      |     2 +-
 .../pmu-events/arch/x86/skylake/frontend.json |     2 +-
 .../pmu-events/arch/x86/skylake/other.json    |     2 +-
 .../arch/x86/skylake/skl-metrics.json         |   178 +-
 .../arch/x86/skylake/uncore-cache.json        |   142 +
 .../arch/x86/skylake/uncore-other.json        |    79 +
 .../pmu-events/arch/x86/skylake/uncore.json   |   254 -
 .../arch/x86/skylake/virtual-memory.json      |     2 +-
 .../arch/x86/skylakex/floating-point.json     |     2 +-
 .../arch/x86/skylakex/frontend.json           |     2 +-
 .../pmu-events/arch/x86/skylakex/other.json   |    66 +-
 .../arch/x86/skylakex/pipeline.json           |    11 +
 .../arch/x86/skylakex/skx-metrics.json        |   667 +-
 .../arch/x86/skylakex/uncore-memory.json      |     9 +
 .../arch/x86/skylakex/uncore-other.json       |   730 +-
 .../arch/x86/skylakex/virtual-memory.json     |     2 +-
 .../x86/{tremontx => snowridgex}/cache.json   |    60 +-
 .../floating-point.json                       |     9 +-
 .../{tremontx => snowridgex}/frontend.json    |    20 +-
 .../x86/{tremontx => snowridgex}/memory.json  |     4 +-
 .../x86/{tremontx => snowridgex}/other.json   |    18 +-
 .../{tremontx => snowridgex}/pipeline.json    |    98 +-
 .../arch/x86/snowridgex/uncore-memory.json    |   619 +
 .../arch/x86/snowridgex/uncore-other.json     | 25249 ++++++++++++++++
 .../arch/x86/snowridgex/uncore-power.json     |   235 +
 .../virtual-memory.json                       |    69 +-
 .../pmu-events/arch/x86/tigerlake/cache.json  |    48 +-
 .../arch/x86/tigerlake/floating-point.json    |     2 +-
 .../arch/x86/tigerlake/frontend.json          |     2 +-
 .../pmu-events/arch/x86/tigerlake/memory.json |     2 +-
 .../pmu-events/arch/x86/tigerlake/other.json  |     1 -
 .../arch/x86/tigerlake/pipeline.json          |     4 +-
 .../arch/x86/tigerlake/tgl-metrics.json       |   378 +-
 .../arch/x86/tigerlake/uncore-other.json      |    65 +
 .../arch/x86/tigerlake/virtual-memory.json    |     2 +-
 .../arch/x86/tremontx/uncore-memory.json      |   245 -
 .../arch/x86/tremontx/uncore-other.json       |  2395 --
 .../arch/x86/tremontx/uncore-power.json       |    11 -
 .../arch/x86/westmereep-dp/cache.json         |     2 +-
 .../x86/westmereep-dp/floating-point.json     |     2 +-
 .../arch/x86/westmereep-dp/frontend.json      |     2 +-
 .../arch/x86/westmereep-dp/memory.json        |     2 +-
 .../x86/westmereep-dp/virtual-memory.json     |     2 +-
 .../x86/westmereep-sp/floating-point.json     |     2 +-
 .../arch/x86/westmereep-sp/frontend.json      |     2 +-
 .../x86/westmereep-sp/virtual-memory.json     |     2 +-
 .../arch/x86/westmereex/floating-point.json   |     2 +-
 .../arch/x86/westmereex/frontend.json         |     2 +-
 .../arch/x86/westmereex/virtual-memory.json   |     2 +-
 tools/perf/tests/pmu-events.c                 |     9 +
 252 files changed, 89144 insertions(+), 8438 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-other.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore.json
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-other.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore.json
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/cache.json (95%)
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/floating-point.json (84%)
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/frontend.json (94%)
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/memory.json (99%)
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/other.json (98%)
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/pipeline.json (89%)
 create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
 rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/virtual-memory.json (91%)
 create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json

-- 
2.37.1.359.gd136c6c3e2-goog



* [PATCH v1 01/31] perf test: Avoid sysfs state affecting fake events
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 06/31] perf vendor events: Update bonnell mapfile.csv Ian Rogers
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Icelake has a slots event, while on my Skylakex there are CPU events in
sysfs named topdown-slots-issued and topdown-total-slots. Legacy event
parsing tries to use '-' to separate parts of an event, and so
perf_pmu__parse_init sets 'slots' to be a PMU_EVENT_SYMBOL_SUFFIX2. As
such, parsing the slots event for a fake PMU fails, as a
PMU_EVENT_SYMBOL_SUFFIX2 isn't made into the PE_PMU_EVENT_FAKE token.
Resolve this issue by initializing the test PMU parsing state before
every parse. This must be done for every parse as the state is removed
after each call to parse_events.
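
The failure mode can be sketched outside of perf. The following is an
illustrative Python model, not perf's actual C parser: splitting sysfs
event names on '-' records 'slots' as a suffix-only symbol, so a
standalone 'slots' event no longer parses cleanly for a fake PMU.

```python
# Illustrative model of the legacy '-' splitting (not perf's real code):
# sysfs event names are broken on '-' and trailing parts are recorded as
# suffix symbols, shadowing 'slots' as a standalone event name.
sysfs_events = ["topdown-slots-issued", "topdown-total-slots"]

suffix_symbols = set()
for name in sysfs_events:
    for part in name.split("-")[1:]:
        suffix_symbols.add(part)

# 'slots' is now classified as a suffix (PMU_EVENT_SYMBOL_SUFFIX2 in
# perf), so a lone 'slots' event for a fake PMU is not turned into the
# PE_PMU_EVENT_FAKE token.
is_suffix = "slots" in suffix_symbols
```

Resetting the parse state to the test state before each parse, as the
patch below does, avoids picking up these sysfs-derived suffixes.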

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/pmu-events.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 478b33825790..263cbb67c861 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -812,6 +812,15 @@ static int check_parse_id(const char *id, struct parse_events_error *error,
 	for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@'))
 		*cur = '/';
 
+	if (fake_pmu) {
+		/*
+		 * Every call to __parse_events will try to initialize the PMU
+		 * state from sysfs and then clean it up at the end. Reset the
+		 * PMU events to the test state so that we don't pick up
+		 * erroneous prefixes and suffixes.
+		 */
+		perf_pmu__test_parse_init();
+	}
 	ret = __parse_events(evlist, dup, error, fake_pmu);
 	free(dup);
 
-- 
2.37.1.359.gd136c6c3e2-goog



* [PATCH v1 06/31] perf vendor events: Update bonnell mapfile.csv
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 01/31] perf test: Avoid sysfs state affecting fake events Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 09/31] perf vendor events: Update goldmont mapfile.csv Ian Rogers
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Align end of file whitespace with what is generated by:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

Fold the mapfile.csv entries together with a more complex regular
expression. This will reduce the pmu-events.c table size.
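
As a quick sanity check (a hypothetical snippet, not part of the
patch), the folded pattern matches exactly the five bonnell cpuid
strings that the separate mapfile.csv rows used to list:

```python
import re

# The folded bonnell family-model pattern from the mapfile.csv change.
pattern = re.compile(r"GenuineIntel-6-(1C|26|27|35|36)")

# The five cpuids previously listed as individual rows.
old_cpuids = [
    "GenuineIntel-6-1C",
    "GenuineIntel-6-26",
    "GenuineIntel-6-27",
    "GenuineIntel-6-36",
    "GenuineIntel-6-35",
]

all_match = all(pattern.fullmatch(c) for c in old_cpuids)
# An unrelated model (e.g. sandybridge's 2A) must not match.
no_false_match = pattern.fullmatch("GenuineIntel-6-2A") is None
```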

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/bonnell/cache.json          | 2 +-
 tools/perf/pmu-events/arch/x86/bonnell/floating-point.json | 2 +-
 tools/perf/pmu-events/arch/x86/bonnell/frontend.json       | 2 +-
 tools/perf/pmu-events/arch/x86/bonnell/memory.json         | 2 +-
 tools/perf/pmu-events/arch/x86/bonnell/other.json          | 2 +-
 tools/perf/pmu-events/arch/x86/bonnell/pipeline.json       | 2 +-
 tools/perf/pmu-events/arch/x86/bonnell/virtual-memory.json | 2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv                 | 6 +-----
 8 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/bonnell/cache.json b/tools/perf/pmu-events/arch/x86/bonnell/cache.json
index 71653bfe7093..86582bb8aa39 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/cache.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/cache.json
@@ -743,4 +743,4 @@
         "SampleAfterValue": "10000",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/bonnell/floating-point.json b/tools/perf/pmu-events/arch/x86/bonnell/floating-point.json
index f8055ff47f19..1fa347d07c98 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/floating-point.json
@@ -258,4 +258,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/bonnell/frontend.json b/tools/perf/pmu-events/arch/x86/bonnell/frontend.json
index e852eb2cc878..21fe5fe229aa 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/frontend.json
@@ -88,4 +88,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/bonnell/memory.json b/tools/perf/pmu-events/arch/x86/bonnell/memory.json
index 2aa4c41f528e..f8b45b6fb4d3 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/memory.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/memory.json
@@ -151,4 +151,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x86"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/bonnell/other.json b/tools/perf/pmu-events/arch/x86/bonnell/other.json
index 114c062e7e96..e0bdcfbfa9dc 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/other.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/other.json
@@ -447,4 +447,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0xc0"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/bonnell/pipeline.json b/tools/perf/pmu-events/arch/x86/bonnell/pipeline.json
index 896b738e59b6..f5123c99a7ba 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/pipeline.json
@@ -353,4 +353,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x10"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/bonnell/virtual-memory.json b/tools/perf/pmu-events/arch/x86/bonnell/virtual-memory.json
index c2363b8e61b4..e8512c585572 100644
--- a/tools/perf/pmu-events/arch/x86/bonnell/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/bonnell/virtual-memory.json
@@ -121,4 +121,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x3"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 1fe376b3624d..d3c0cc6eff8c 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -1,13 +1,9 @@
 Family-model,Version,Filename,EventType
 GenuineIntel-6-9[7A],v1.13,alderlake,core
+GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core
 GenuineIntel-6-(3D|47),v26,broadwell,core
 GenuineIntel-6-56,v23,broadwellde,core
 GenuineIntel-6-4F,v19,broadwellx,core
-GenuineIntel-6-1C,v4,bonnell,core
-GenuineIntel-6-26,v4,bonnell,core
-GenuineIntel-6-27,v4,bonnell,core
-GenuineIntel-6-36,v4,bonnell,core
-GenuineIntel-6-35,v4,bonnell,core
 GenuineIntel-6-5C,v8,goldmont,core
 GenuineIntel-6-7A,v1,goldmontplus,core
 GenuineIntel-6-3C,v24,haswell,core
-- 
2.37.1.359.gd136c6c3e2-goog



* [PATCH v1 09/31] perf vendor events: Update goldmont mapfile.csv
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 01/31] perf test: Avoid sysfs state affecting fake events Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 06/31] perf vendor events: Update bonnell mapfile.csv Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 10/31] perf vendor events: Update goldmontplus mapfile.csv Ian Rogers
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Align end of file whitespace with what is generated by:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

Add a missing goldmont cpuid to mapfile.csv.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/goldmont/cache.json          | 2 +-
 tools/perf/pmu-events/arch/x86/goldmont/floating-point.json | 2 +-
 tools/perf/pmu-events/arch/x86/goldmont/frontend.json       | 2 +-
 tools/perf/pmu-events/arch/x86/goldmont/memory.json         | 2 +-
 tools/perf/pmu-events/arch/x86/goldmont/pipeline.json       | 2 +-
 tools/perf/pmu-events/arch/x86/goldmont/virtual-memory.json | 2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv                  | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/goldmont/cache.json b/tools/perf/pmu-events/arch/x86/goldmont/cache.json
index 0b887d73b7f3..ed957d4f9c6d 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/cache.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/cache.json
@@ -1300,4 +1300,4 @@
         "SampleAfterValue": "100007",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmont/floating-point.json b/tools/perf/pmu-events/arch/x86/goldmont/floating-point.json
index bb364a04a75f..37174392a510 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/floating-point.json
@@ -30,4 +30,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmont/frontend.json b/tools/perf/pmu-events/arch/x86/goldmont/frontend.json
index 120ff65897c0..216da6e121c8 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/frontend.json
@@ -79,4 +79,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmont/memory.json b/tools/perf/pmu-events/arch/x86/goldmont/memory.json
index 6252503f68a1..9f6f0328249e 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/memory.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/memory.json
@@ -31,4 +31,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x4"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
index 5dba4313013f..42ff0b134aeb 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json
@@ -354,7 +354,7 @@
         "Counter": "0,1,2,3",
         "EventCode": "0xC3",
         "EventName": "MACHINE_CLEARS.SMC",
-        "PublicDescription": "Counts the number of times that the processor detects that a program is writing to a code section and has to perform a machine clear because of that modification.  Self-modifying code (SMC) causes a severe penalty in all Intel architecture processors.",
+        "PublicDescription": "Counts the number of times that the processor detects that a program is writing to a code section and has to perform a machine clear because of that modification.  Self-modifying code (SMC) causes a severe penalty in all Intel(R) architecture processors.",
         "SampleAfterValue": "200003",
         "UMask": "0x1"
     },
diff --git a/tools/perf/pmu-events/arch/x86/goldmont/virtual-memory.json b/tools/perf/pmu-events/arch/x86/goldmont/virtual-memory.json
index d5e89c74a9be..2e17e02e1463 100644
--- a/tools/perf/pmu-events/arch/x86/goldmont/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/goldmont/virtual-memory.json
@@ -75,4 +75,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index b105d80d2b7d..ef0beab68a90 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -6,7 +6,7 @@ GenuineIntel-6-56,v23,broadwellde,core
 GenuineIntel-6-4F,v19,broadwellx,core
 GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
 GenuineIntel-6-96,v1.03,elkhartlake,core
-GenuineIntel-6-5C,v8,goldmont,core
+GenuineIntel-6-5[CF],v13,goldmont,core
 GenuineIntel-6-7A,v1,goldmontplus,core
 GenuineIntel-6-3C,v24,haswell,core
 GenuineIntel-6-45,v24,haswell,core
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 10/31] perf vendor events: Update goldmontplus mapfile.csv
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (2 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 09/31] perf vendor events: Update goldmont mapfile.csv Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 11/31] perf vendor events: Update Intel haswell Ian Rogers
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Align end of file whitespace with what is generated by:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

Correct the goldmontplus version in mapfile.csv (v1 to v1.01).
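
The first mapfile.csv column is matched against the CPU identifier as a
regular expression, which is how a bracket pattern like the goldmont
5[CF] entry from the previous patch covers two model numbers with one
row. A hedged sketch of that lookup, using rows taken from this series
(an approximation of perf's pmu-events matching, not its actual code):

```python
import re

# Rows as they appear in mapfile.csv after this series: (cpuid regex,
# event version, model directory).
MAPFILE_ROWS = [
    ("GenuineIntel-6-5[CF]", "v13", "goldmont"),
    ("GenuineIntel-6-7A", "v1.01", "goldmontplus"),
]

def lookup(cpuid: str):
    # Return the (model, version) of the first row whose regex matches
    # the full CPU identifier string, or None if no row matches.
    for pattern, version, model in MAPFILE_ROWS:
        if re.fullmatch(pattern, cpuid):
            return model, version
    return None

print(lookup("GenuineIntel-6-5F"))  # -> ('goldmont', 'v13'), via 5[CF]
```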

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/goldmontplus/cache.json          | 2 +-
 tools/perf/pmu-events/arch/x86/goldmontplus/floating-point.json | 2 +-
 tools/perf/pmu-events/arch/x86/goldmontplus/frontend.json       | 2 +-
 tools/perf/pmu-events/arch/x86/goldmontplus/memory.json         | 2 +-
 tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json       | 2 +-
 tools/perf/pmu-events/arch/x86/goldmontplus/virtual-memory.json | 2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv                      | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/cache.json b/tools/perf/pmu-events/arch/x86/goldmontplus/cache.json
index 59c039169eb8..16e8913c0434 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/cache.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/cache.json
@@ -1462,4 +1462,4 @@
         "SampleAfterValue": "100007",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/floating-point.json b/tools/perf/pmu-events/arch/x86/goldmontplus/floating-point.json
index c1f00c9470f4..9c3d22439530 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/floating-point.json
@@ -35,4 +35,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/frontend.json b/tools/perf/pmu-events/arch/x86/goldmontplus/frontend.json
index 3fdc788a2b20..4c2abfbac8f8 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/frontend.json
@@ -95,4 +95,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/memory.json b/tools/perf/pmu-events/arch/x86/goldmontplus/memory.json
index e26763d16d52..ae0cb3451866 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/memory.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/memory.json
@@ -35,4 +35,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x4"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
index 4d7e3129e5ac..2b712b12cc1f 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json
@@ -428,7 +428,7 @@
         "EventName": "MACHINE_CLEARS.SMC",
         "PDIR_COUNTER": "na",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Counts the number of times that the processor detects that a program is writing to a code section and has to perform a machine clear because of that modification.  Self-modifying code (SMC) causes a severe penalty in all Intel architecture processors.",
+        "PublicDescription": "Counts the number of times that the processor detects that a program is writing to a code section and has to perform a machine clear because of that modification.  Self-modifying code (SMC) causes a severe penalty in all Intel(R) architecture processors.",
         "SampleAfterValue": "20003",
         "UMask": "0x1"
     },
diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/virtual-memory.json b/tools/perf/pmu-events/arch/x86/goldmontplus/virtual-memory.json
index 36eaec87eead..1f7db22c15e6 100644
--- a/tools/perf/pmu-events/arch/x86/goldmontplus/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/goldmontplus/virtual-memory.json
@@ -218,4 +218,4 @@
         "SampleAfterValue": "20003",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index ef0beab68a90..a2991906b51c 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -7,7 +7,7 @@ GenuineIntel-6-4F,v19,broadwellx,core
 GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
 GenuineIntel-6-96,v1.03,elkhartlake,core
 GenuineIntel-6-5[CF],v13,goldmont,core
-GenuineIntel-6-7A,v1,goldmontplus,core
+GenuineIntel-6-7A,v1.01,goldmontplus,core
 GenuineIntel-6-3C,v24,haswell,core
 GenuineIntel-6-45,v24,haswell,core
 GenuineIntel-6-46,v24,haswell,core
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 11/31] perf vendor events: Update Intel haswell
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (3 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 10/31] perf vendor events: Update goldmontplus mapfile.csv Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 13/31] perf vendor events: Update Intel icelake Ian Rogers
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the haswell files into perf and update mapfile.csv.

Tested on a non-haswell machine with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 .../pmu-events/arch/x86/haswell/cache.json    | 78 +++++++----------
 .../arch/x86/haswell/floating-point.json      |  2 +-
 .../pmu-events/arch/x86/haswell/frontend.json |  2 +-
 .../arch/x86/haswell/hsw-metrics.json         | 85 +++++++++++++------
 .../pmu-events/arch/x86/haswell/memory.json   | 75 ++++++----------
 .../pmu-events/arch/x86/haswell/other.json    |  2 +-
 .../pmu-events/arch/x86/haswell/pipeline.json |  9 +-
 .../arch/x86/haswell/uncore-other.json        |  7 +-
 .../arch/x86/haswell/virtual-memory.json      |  2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv    |  4 +-
 10 files changed, 125 insertions(+), 141 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/haswell/cache.json b/tools/perf/pmu-events/arch/x86/haswell/cache.json
index 91464cfb9615..3b0f3a264246 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/cache.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/cache.json
@@ -556,7 +556,7 @@
         "UMask": "0x20"
     },
     {
-        "BriefDescription": "All retired load uops.",
+        "BriefDescription": "Retired load uops.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "Data_LA": "1",
@@ -564,11 +564,12 @@
         "EventCode": "0xD0",
         "EventName": "MEM_UOPS_RETIRED.ALL_LOADS",
         "PEBS": "1",
+        "PublicDescription": "Counts all retired load uops. This event accounts for SW prefetch uops of PREFETCHNTA or PREFETCHT0/1/2 or PREFETCHW.",
         "SampleAfterValue": "2000003",
         "UMask": "0x81"
     },
     {
-        "BriefDescription": "All retired store uops.",
+        "BriefDescription": "Retired store uops.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "Data_LA": "1",
@@ -577,6 +578,7 @@
         "EventName": "MEM_UOPS_RETIRED.ALL_STORES",
         "L1_Hit_Indication": "1",
         "PEBS": "1",
+        "PublicDescription": "Counts all retired store uops.",
         "SampleAfterValue": "2000003",
         "UMask": "0x82"
     },
@@ -790,20 +792,19 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch code reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
+        "BriefDescription": "Counts all demand & prefetch code readshit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_CODE_RD.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C0244",
+        "MSRValue": "0x4003C0244",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch code reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch data reads hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
+        "BriefDescription": "Counts all demand & prefetch data readshit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -811,20 +812,18 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x10003C0091",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch data reads hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch data reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
+        "BriefDescription": "Counts all demand & prefetch data readshit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C0091",
+        "MSRValue": "0x4003C0091",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch data reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -837,7 +836,6 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x10003C07F7",
         "Offcore": "1",
-        "PublicDescription": "hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -848,14 +846,13 @@
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_READS.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C07F7",
+        "MSRValue": "0x4003C07F7",
         "Offcore": "1",
-        "PublicDescription": "hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all requests hit in the L3",
+        "BriefDescription": "Counts all requestshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -863,12 +860,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C8FFF",
         "Offcore": "1",
-        "PublicDescription": "Counts all requests hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch RFOs hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
+        "BriefDescription": "Counts all demand & prefetch RFOshit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -876,25 +872,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x10003C0122",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch RFOs hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch RFOs hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
+        "BriefDescription": "Counts all demand & prefetch RFOshit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C0122",
+        "MSRValue": "0x4003C0122",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch RFOs hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand code reads hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
+        "BriefDescription": "Counts all demand code readshit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -902,25 +896,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x10003C0004",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand code reads hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand code reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
+        "BriefDescription": "Counts all demand code readshit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C0004",
+        "MSRValue": "0x4003C0004",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand code reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts demand data reads hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
+        "BriefDescription": "Counts demand data readshit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -928,25 +920,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x10003C0001",
         "Offcore": "1",
-        "PublicDescription": "Counts demand data reads hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts demand data reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
+        "BriefDescription": "Counts demand data readshit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C0001",
+        "MSRValue": "0x4003C0001",
         "Offcore": "1",
-        "PublicDescription": "Counts demand data reads hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand data writes (RFOs) hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
+        "BriefDescription": "Counts all demand data writes (RFOs)hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -954,25 +944,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x10003C0002",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand data writes (RFOs) hit in the L3 and the snoop to one of the sibling cores hits the line in M state and the line is forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand data writes (RFOs) hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
+        "BriefDescription": "Counts all demand data writes (RFOs)hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_HIT.HIT_OTHER_CORE_NO_FWD",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x04003C0002",
+        "MSRValue": "0x4003C0002",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand data writes (RFOs) hit in the L3 and the snoops to sibling cores hit in either E/S state and the line is not forwarded",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to LLC only) code reads hit in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to LLC only) code readshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -980,12 +968,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C0040",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to LLC only) code reads hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts prefetch (that bring data to L2) data reads hit in the L3",
+        "BriefDescription": "Counts prefetch (that bring data to L2) data readshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -993,12 +980,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C0010",
         "Offcore": "1",
-        "PublicDescription": "Counts prefetch (that bring data to L2) data reads hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to L2) RFOs hit in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to L2) RFOshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -1006,12 +992,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C0020",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to L2) RFOs hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts prefetch (that bring data to LLC only) code reads hit in the L3",
+        "BriefDescription": "Counts prefetch (that bring data to LLC only) code readshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -1019,12 +1004,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C0200",
         "Offcore": "1",
-        "PublicDescription": "Counts prefetch (that bring data to LLC only) code reads hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to LLC only) data reads hit in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to LLC only) data readshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -1032,12 +1016,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C0080",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to LLC only) data reads hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to LLC only) RFOs hit in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to LLC only) RFOshit in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -1045,7 +1028,6 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3F803C0100",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to LLC only) RFOs hit in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -1058,4 +1040,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x10"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/floating-point.json b/tools/perf/pmu-events/arch/x86/haswell/floating-point.json
index 55cf5b96464e..7cf203a90a74 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/floating-point.json
@@ -100,4 +100,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x10"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/frontend.json b/tools/perf/pmu-events/arch/x86/haswell/frontend.json
index 0c8d5ccf1276..c45a09abe5d3 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/frontend.json
@@ -301,4 +301,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
index 3ade2c19533e..75dc6dd9a7bc 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
@@ -111,17 +111,11 @@
         "MetricName": "CoreIPC_SMT"
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
         "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else UOPS_EXECUTED.CORE / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "ILP"
     },
-    {
-        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BadSpec;BrMispredicts",
-        "MetricName": "IpMispredict"
-    },
     {
         "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
         "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
@@ -170,6 +164,12 @@
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
+        "MetricGroup": "Pipeline;Ret",
+        "MetricName": "Retire"
+    },
     {
         "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
         "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
@@ -177,11 +177,16 @@
         "MetricName": "DSB_Coverage"
     },
     {
-        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles)",
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BadSpec;BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
+    {
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
         "MetricGroup": "Mem;MemoryBound;MemoryLat",
-        "MetricName": "Load_Miss_Real_Latency",
-        "PublicDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles). Latency may be overestimated for multi-load instructions - e.g. repeat strings."
+        "MetricName": "Load_Miss_Real_Latency"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
@@ -189,24 +194,6 @@
         "MetricGroup": "Mem;MemoryBound;MemoryBW",
         "MetricName": "MLP"
     },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
-        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L1D_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
-        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L2_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L3_Cache_Fill_BW"
-    },
     {
         "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
         "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY",
@@ -238,6 +225,48 @@
         "MetricGroup": "Mem;MemoryTLB_SMT",
         "MetricName": "Page_Walks_Utilization_SMT"
     },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "0",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW_1T"
+    },
     {
         "BriefDescription": "Average CPU Utilization",
         "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
diff --git a/tools/perf/pmu-events/arch/x86/haswell/memory.json b/tools/perf/pmu-events/arch/x86/haswell/memory.json
index 8b69493e3726..9e5a1e0966d9 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/memory.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/memory.json
@@ -225,7 +225,7 @@
         "UMask": "0x2"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch code reads miss in the L3",
+        "BriefDescription": "Counts all demand & prefetch code readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -233,25 +233,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00244",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch code reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch code reads miss the L3 and the data is returned from local dram",
+        "BriefDescription": "Counts all demand & prefetch code readsmiss the L3 and the data is returned from local dram",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_CODE_RD.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x0100400244",
+        "MSRValue": "0x100400244",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch code reads miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch data reads miss in the L3",
+        "BriefDescription": "Counts all demand & prefetch data readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -259,20 +257,18 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00091",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch data reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch data reads miss the L3 and the data is returned from local dram",
+        "BriefDescription": "Counts all demand & prefetch data readsmiss the L3 and the data is returned from local dram",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x0100400091",
+        "MSRValue": "0x100400091",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch data reads miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -285,7 +281,6 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC007F7",
         "Offcore": "1",
-        "PublicDescription": "miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -296,14 +291,13 @@
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_READS.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x01004007F7",
+        "MSRValue": "0x1004007F7",
         "Offcore": "1",
-        "PublicDescription": "miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all requests miss in the L3",
+        "BriefDescription": "Counts all requestsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -311,12 +305,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC08FFF",
         "Offcore": "1",
-        "PublicDescription": "Counts all requests miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch RFOs miss in the L3",
+        "BriefDescription": "Counts all demand & prefetch RFOsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -324,25 +317,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00122",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch RFOs miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand & prefetch RFOs miss the L3 and the data is returned from local dram",
+        "BriefDescription": "Counts all demand & prefetch RFOsmiss the L3 and the data is returned from local dram",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x0100400122",
+        "MSRValue": "0x100400122",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand & prefetch RFOs miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand code reads miss in the L3",
+        "BriefDescription": "Counts all demand code readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -350,25 +341,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00004",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand code reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand code reads miss the L3 and the data is returned from local dram",
+        "BriefDescription": "Counts all demand code readsmiss the L3 and the data is returned from local dram",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x0100400004",
+        "MSRValue": "0x100400004",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand code reads miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts demand data reads miss in the L3",
+        "BriefDescription": "Counts demand data readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -376,25 +365,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00001",
         "Offcore": "1",
-        "PublicDescription": "Counts demand data reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts demand data reads miss the L3 and the data is returned from local dram",
+        "BriefDescription": "Counts demand data readsmiss the L3 and the data is returned from local dram",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x0100400001",
+        "MSRValue": "0x100400001",
         "Offcore": "1",
-        "PublicDescription": "Counts demand data reads miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand data writes (RFOs) miss in the L3",
+        "BriefDescription": "Counts all demand data writes (RFOs)miss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -402,25 +389,23 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00002",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand data writes (RFOs) miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all demand data writes (RFOs) miss the L3 and the data is returned from local dram",
+        "BriefDescription": "Counts all demand data writes (RFOs)miss the L3 and the data is returned from local dram",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_MISS.LOCAL_DRAM",
         "MSRIndex": "0x1a6,0x1a7",
-        "MSRValue": "0x0100400002",
+        "MSRValue": "0x100400002",
         "Offcore": "1",
-        "PublicDescription": "Counts all demand data writes (RFOs) miss the L3 and the data is returned from local dram",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to LLC only) code reads miss in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to LLC only) code readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -428,12 +413,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00040",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to LLC only) code reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts prefetch (that bring data to L2) data reads miss in the L3",
+        "BriefDescription": "Counts prefetch (that bring data to L2) data readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -441,12 +425,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00010",
         "Offcore": "1",
-        "PublicDescription": "Counts prefetch (that bring data to L2) data reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to L2) RFOs miss in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to L2) RFOsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -454,12 +437,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00020",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to L2) RFOs miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts prefetch (that bring data to LLC only) code reads miss in the L3",
+        "BriefDescription": "Counts prefetch (that bring data to LLC only) code readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -467,12 +449,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00200",
         "Offcore": "1",
-        "PublicDescription": "Counts prefetch (that bring data to LLC only) code reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to LLC only) data reads miss in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to LLC only) data readsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -480,12 +461,11 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00080",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to LLC only) data reads miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all prefetch (that bring data to LLC only) RFOs miss in the L3",
+        "BriefDescription": "Counts all prefetch (that bring data to LLC only) RFOsmiss in the L3",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
@@ -493,7 +473,6 @@
         "MSRIndex": "0x1a6,0x1a7",
         "MSRValue": "0x3FFFC00100",
         "Offcore": "1",
-        "PublicDescription": "Counts all prefetch (that bring data to LLC only) RFOs miss in the L3",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -681,4 +660,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x40"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/other.json b/tools/perf/pmu-events/arch/x86/haswell/other.json
index 4c6b9d34325a..7ca34f09b185 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/other.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/other.json
@@ -40,4 +40,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
index a53f28ec9270..42f6a8100661 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json
@@ -1035,7 +1035,6 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xA1",
         "EventName": "UOPS_EXECUTED_PORT.PORT_0_CORE",
-        "PublicDescription": "Cycles per core when uops are exectuted in port 0.",
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     },
@@ -1056,7 +1055,6 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xA1",
         "EventName": "UOPS_EXECUTED_PORT.PORT_1_CORE",
-        "PublicDescription": "Cycles per core when uops are exectuted in port 1.",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
     },
@@ -1117,7 +1115,6 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xA1",
         "EventName": "UOPS_EXECUTED_PORT.PORT_4_CORE",
-        "PublicDescription": "Cycles per core when uops are exectuted in port 4.",
         "SampleAfterValue": "2000003",
         "UMask": "0x10"
     },
@@ -1138,7 +1135,6 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xA1",
         "EventName": "UOPS_EXECUTED_PORT.PORT_5_CORE",
-        "PublicDescription": "Cycles per core when uops are exectuted in port 5.",
         "SampleAfterValue": "2000003",
         "UMask": "0x20"
     },
@@ -1159,7 +1155,6 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0xA1",
         "EventName": "UOPS_EXECUTED_PORT.PORT_6_CORE",
-        "PublicDescription": "Cycles per core when uops are exectuted in port 6.",
         "SampleAfterValue": "2000003",
         "UMask": "0x40"
     },
@@ -1295,11 +1290,11 @@
         "BriefDescription": "Cycles with less than 10 actually retired uops.",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3",
-        "CounterMask": "10",
+        "CounterMask": "16",
         "EventCode": "0xC2",
         "EventName": "UOPS_RETIRED.TOTAL_CYCLES",
         "Invert": "1",
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/uncore-other.json b/tools/perf/pmu-events/arch/x86/haswell/uncore-other.json
index 8f2ae2891042..56c4b380dc95 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/uncore-other.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/uncore-other.json
@@ -19,11 +19,11 @@
         "Unit": "ARB"
     },
     {
-        "BriefDescription": "Each cycle count number of all Core outgoing valid entries. Such entry is defined as valid from it's allocation till first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and non-coherent traffic.",
+        "BriefDescription": "Each cycle counts number of all Core outgoing valid entries. Such entry is defined as valid from its allocation till first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and non-coherent traffic.",
         "EventCode": "0x80",
         "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
         "PerPkg": "1",
-        "PublicDescription": "Each cycle count number of all Core outgoing valid entries. Such entry is defined as valid from it's allocation till first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and non-coherent traffic.",
+        "PublicDescription": "Each cycle counts number of all Core outgoing valid entries. Such entry is defined as valid from its allocation till first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and non-coherent traffic.",
         "UMask": "0x01",
         "Unit": "ARB"
     },
@@ -34,6 +34,7 @@
         "EventCode": "0x80",
         "EventName": "UNC_ARB_TRK_OCCUPANCY.CYCLES_WITH_ANY_REQUEST",
         "PerPkg": "1",
+        "PublicDescription": "Cycles with at least one request outstanding is waiting for data return from memory controller. Account for coherent and non-coherent requests initiated by IA Cores, Processor Graphics Unit, or LLC.\n",
         "UMask": "0x01",
         "Unit": "ARB"
     },
@@ -64,6 +65,6 @@
         "EventName": "UNC_CLOCK.SOCKET",
         "PerPkg": "1",
         "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
-        "Unit": "NCU"
+        "Unit": "CLOCK"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/haswell/virtual-memory.json b/tools/perf/pmu-events/arch/x86/haswell/virtual-memory.json
index ba3e77a9f9a0..57d2a6452fec 100644
--- a/tools/perf/pmu-events/arch/x86/haswell/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/haswell/virtual-memory.json
@@ -481,4 +481,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index a2991906b51c..66d54c879a3e 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -8,9 +8,7 @@ GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
 GenuineIntel-6-96,v1.03,elkhartlake,core
 GenuineIntel-6-5[CF],v13,goldmont,core
 GenuineIntel-6-7A,v1.01,goldmontplus,core
-GenuineIntel-6-3C,v24,haswell,core
-GenuineIntel-6-45,v24,haswell,core
-GenuineIntel-6-46,v24,haswell,core
+GenuineIntel-6-(3C|45|46),v31,haswell,core
 GenuineIntel-6-3F,v17,haswellx,core
 GenuineIntel-6-3A,v18,ivybridge,core
 GenuineIntel-6-3E,v19,ivytown,core
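The consolidated mapfile.csv row works because perf treats the first CSV field as a regular expression matched against the runtime CPUID string; the actual matcher is C code under tools/perf/pmu-events, but the idea can be sketched in Python (the CPUID strings below are illustrative):

```python
# Sketch of mapfile.csv model matching: one row with an alternation
# replaces three per-model rows. Not perf's real matcher, just the idea.
import re

row = "GenuineIntel-6-(3C|45|46),v31,haswell,core"
pattern, version, table, pmu = row.split(",")

for cpuid in ("GenuineIntel-6-3C", "GenuineIntel-6-45", "GenuineIntel-6-3F"):
    matched = re.fullmatch(pattern, cpuid) is not None
    print(cpuid, "->", table if matched else "no match")
```

GenuineIntel-6-3F (haswellx) intentionally falls through to its own row, which is why it stays a separate line in the file.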
-- 
2.37.1.359.gd136c6c3e2-goog



* [PATCH v1 13/31] perf vendor events: Update Intel icelake
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (4 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 11/31] perf vendor events: Update Intel haswell Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 14/31] perf vendor events: Update Intel icelakex Ian Rogers
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the icelake files into perf and update mapfile.csv.

Tested on a non-icelake with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 .../pmu-events/arch/x86/icelake/cache.json    |   8 +-
 .../arch/x86/icelake/floating-point.json      |   2 +-
 .../pmu-events/arch/x86/icelake/frontend.json |   2 +-
 .../arch/x86/icelake/icl-metrics.json         | 126 ++++++++++++------
 .../arch/x86/icelake/uncore-other.json        |  31 +++++
 .../arch/x86/icelake/virtual-memory.json      |   2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   4 +-
 7 files changed, 123 insertions(+), 52 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
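Most of the metrics in these JSON files are ratio expressions over event counts; as an illustration (not perf's actual expression evaluator), a metric such as IpMispredict reduces to a single division over collected counts. The counts below are made-up sample values:

```python
# Evaluate a ratio-style MetricExpr such as
#   "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES"
# from a dict of collected event counts (sample values are invented).

counts = {
    "INST_RETIRED.ANY": 4_000_000_000,
    "BR_MISP_RETIRED.ALL_BRANCHES": 8_000_000,
}

ip_mispredict = (counts["INST_RETIRED.ANY"]
                 / counts["BR_MISP_RETIRED.ALL_BRANCHES"])
print(ip_mispredict)  # 500.0 instructions per mispredict; lower = more frequent
```

This is also why the descriptions say "lower number means higher occurrence rate": the event of interest sits in the denominator.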

diff --git a/tools/perf/pmu-events/arch/x86/icelake/cache.json b/tools/perf/pmu-events/arch/x86/icelake/cache.json
index 9989f3338f0a..b4f28f24ee63 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/cache.json
@@ -303,7 +303,7 @@
         "UMask": "0x41"
     },
     {
-        "BriefDescription": "All retired load instructions.",
+        "BriefDescription": "Retired load instructions.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
@@ -311,12 +311,12 @@
         "EventName": "MEM_INST_RETIRED.ALL_LOADS",
         "PEBS": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions for loads.",
+        "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2 or PREFETCHW.",
         "SampleAfterValue": "1000003",
         "UMask": "0x81"
     },
     {
-        "BriefDescription": "All retired store instructions.",
+        "BriefDescription": "Retired store instructions.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
@@ -325,7 +325,7 @@
         "L1_Hit_Indication": "1",
         "PEBS": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Counts all retired store instructions. This event account for SW prefetch instructions and PREFETCHW instruction for stores.",
+        "PublicDescription": "Counts all retired store instructions.",
         "SampleAfterValue": "1000003",
         "UMask": "0x82"
     },
diff --git a/tools/perf/pmu-events/arch/x86/icelake/floating-point.json b/tools/perf/pmu-events/arch/x86/icelake/floating-point.json
index 4347e2d0d090..1925388969bb 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/floating-point.json
@@ -99,4 +99,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/icelake/frontend.json b/tools/perf/pmu-events/arch/x86/icelake/frontend.json
index b510dd5d80da..739361d3f52f 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/frontend.json
@@ -494,4 +494,4 @@
         "Speculative": "1",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
index 622c392f59be..f0356d66a927 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -38,7 +38,7 @@
     {
         "BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
         "MetricExpr": "TOPDOWN.SLOTS / ( TOPDOWN.SLOTS / 2 ) if #SMT_on else 1",
-        "MetricGroup": "SMT",
+        "MetricGroup": "SMT;TmaL1",
         "MetricName": "Slots_Utilization"
     },
     {
@@ -61,24 +61,18 @@
         "MetricName": "FLOPc"
     },
     {
-        "BriefDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width)",
+        "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)",
         "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) ) / ( 2 * CPU_CLK_UNHALTED.DISTRIBUTED )",
         "MetricGroup": "Cor;Flops;HPC",
         "MetricName": "FP_Arith_Utilization",
-        "PublicDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width). Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
+        "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)."
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
         "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "ILP"
     },
-    {
-        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BadSpec;BrMispredicts",
-        "MetricName": "IpMispredict"
-    },
     {
         "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
         "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED",
@@ -169,12 +163,24 @@
         "MetricName": "IpArith_AVX512",
         "PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
     },
+    {
+        "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / cpu@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
+        "MetricGroup": "Prefetches",
+        "MetricName": "IpSWPF"
+    },
     {
         "BriefDescription": "Total number of retired Instructions, Sample with: INST_RETIRED.PREC_DIST",
         "MetricExpr": "INST_RETIRED.ANY",
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
+        "MetricName": "Execute"
+    },
     {
         "BriefDescription": "Average number of Uops issued by front-end when it issued something",
         "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=1@",
@@ -194,11 +200,23 @@
         "MetricName": "DSB_Coverage"
     },
     {
-        "BriefDescription": "Number of Instructions per non-speculative DSB miss",
+        "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.",
+        "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=1\\,edge@",
+        "MetricGroup": "DSBmiss",
+        "MetricName": "DSB_Switch_Cost"
+    },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative DSB miss (lower number means higher occurrence rate)",
         "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS",
         "MetricGroup": "DSBmiss;Fed",
         "MetricName": "IpDSB_Miss_Ret"
     },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BadSpec;BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
     {
         "BriefDescription": "Fraction of branches that are non-taken conditionals",
         "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_BRANCHES",
@@ -230,11 +248,10 @@
         "MetricName": "Other_Branches"
     },
     {
-        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles)",
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
         "MetricGroup": "Mem;MemoryBound;MemoryLat",
-        "MetricName": "Load_Miss_Real_Latency",
-        "PublicDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles). Latency may be overestimated for multi-load instructions - e.g. repeat strings."
+        "MetricName": "Load_Miss_Real_Latency"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
@@ -242,30 +259,6 @@
         "MetricGroup": "Mem;MemoryBound;MemoryBW",
         "MetricName": "MLP"
     },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
-        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L1D_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
-        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L2_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L3_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW;Offcore",
-        "MetricName": "L3_Cache_Access_BW"
-    },
     {
         "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
         "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
@@ -285,13 +278,13 @@
         "MetricName": "L2MPKI"
     },
     {
-        "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)",
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)",
         "MetricExpr": "1000 * ( ( OFFCORE_REQUESTS.ALL_DATA_RD - OFFCORE_REQUESTS.DEMAND_DATA_RD ) + L2_RQSTS.ALL_DEMAND_MISS + L2_RQSTS.SWPF_MISS ) / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses;Offcore",
         "MetricName": "L2MPKI_All"
     },
     {
-        "BriefDescription": "L2 cache misses per kilo instruction for all demand loads  (including speculative)",
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all demand loads  (including speculative)",
         "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses",
         "MetricName": "L2MPKI_Load"
@@ -309,7 +302,7 @@
         "MetricName": "L3MPKI"
     },
     {
-        "BriefDescription": "Fill Buffer (FB) true hits per kilo instructions for retired demand loads",
+        "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
         "MetricExpr": "1000 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses",
         "MetricName": "FB_HPKI"
@@ -321,6 +314,54 @@
         "MetricGroup": "Mem;MemoryTLB",
         "MetricName": "Page_Walks_Utilization"
     },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW_1T"
+    },
     {
         "BriefDescription": "Average CPU Utilization",
         "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
@@ -337,7 +378,8 @@
         "BriefDescription": "Giga Floating Point Operations Per Second",
         "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / 1000000000 ) / duration_time",
         "MetricGroup": "Cor;Flops;HPC",
-        "MetricName": "GFLOPs"
+        "MetricName": "GFLOPs",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
     },
     {
         "BriefDescription": "Average Frequency Utilization relative nominal frequency",
diff --git a/tools/perf/pmu-events/arch/x86/icelake/uncore-other.json b/tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
new file mode 100644
index 000000000000..e007b976547d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
@@ -0,0 +1,31 @@
+[
+    {
+        "BriefDescription": "Number of entries allocated. Account for Any type: e.g. Snoop,  etc.",
+        "Counter": "1",
+        "EventCode": "0x84",
+        "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL",
+        "PerPkg": "1",
+        "PublicDescription": "Number of entries allocated. Account for Any type: e.g. Snoop,  etc.",
+        "UMask": "0x01",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Total number of all outgoing entries allocated. Accounts for Coherent and non-coherent traffic.",
+        "Counter": "1",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_TRK_REQUESTS.ALL",
+        "PerPkg": "1",
+        "PublicDescription": "Total number of all outgoing entries allocated. Accounts for Coherent and non-coherent traffic.",
+        "UMask": "0x01",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "UNC_CLOCK.SOCKET",
+        "Counter": "FIXED",
+        "EventCode": "0xff",
+        "EventName": "UNC_CLOCK.SOCKET",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_CLOCK.SOCKET",
+        "Unit": "CLOCK"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
index a006fd7f7b18..58809e16bf98 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
@@ -242,4 +242,4 @@
         "Speculative": "1",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 41c0c694211a..aae90ece2013 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -10,6 +10,7 @@ GenuineIntel-6-5[CF],v13,goldmont,core
 GenuineIntel-6-7A,v1.01,goldmontplus,core
 GenuineIntel-6-(3C|45|46),v31,haswell,core
 GenuineIntel-6-3F,v25,haswellx,core
+GenuineIntel-6-(7D|7E|A7),v1.14,icelake,core
 GenuineIntel-6-3A,v18,ivybridge,core
 GenuineIntel-6-3E,v19,ivytown,core
 GenuineIntel-6-2D,v20,jaketown,core
@@ -29,10 +30,7 @@ GenuineIntel-6-2C,v2,westmereep-dp,core
 GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
 GenuineIntel-6-55-[01234],v1,skylakex,core
-GenuineIntel-6-7D,v1,icelake,core
-GenuineIntel-6-7E,v1,icelake,core
 GenuineIntel-6-8[CD],v1,tigerlake,core
-GenuineIntel-6-A7,v1,icelake,core
 GenuineIntel-6-6A,v1,icelakex,core
 GenuineIntel-6-6C,v1,icelakex,core
 GenuineIntel-6-86,v1,tremontx,core
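
The mapfile.csv hunk above folds the three separate icelake rows into one regex row, GenuineIntel-6-(7D|7E|A7). A rough sketch of how such a first-column pattern selects a model follows (illustrative Python only; perf's real matcher is C code under tools/perf/pmu-events, and anchoring details may differ):

```python
import re

# First column of mapfile.csv is treated as a regular expression that is
# matched against the CPUID identifier string (vendor-family-model).
pattern = re.compile(r"GenuineIntel-6-(7D|7E|A7)")

assert pattern.match("GenuineIntel-6-7D")       # icelake client
assert pattern.match("GenuineIntel-6-7E")       # icelake client
assert pattern.match("GenuineIntel-6-A7")       # also mapped to icelake
assert not pattern.match("GenuineIntel-6-6A")   # icelakex keeps its own row
```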
-- 
2.37.1.359.gd136c6c3e2-goog



* [PATCH v1 14/31] perf vendor events: Update Intel icelakex
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (5 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 13/31] perf vendor events: Update Intel icelake Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 15/31] perf vendor events: Update Intel ivybridge Ian Rogers
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the icelakex files into perf and update mapfile.csv.
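
Before running 'perf test', the regenerated metrics files can be given a quick structural sanity pass. The sketch below is not part of this series; it only checks fields (MetricName, MetricExpr) that appear in the pmu-events JSON format shown in the diffs:

```python
import json

def check_metrics(entries):
    """Return entries from a pmu-events metrics list that have an empty
    MetricName/MetricExpr or a duplicated MetricName."""
    seen = set()
    problems = []
    for e in entries:
        name = e.get("MetricName", "")
        if not name or not e.get("MetricExpr") or name in seen:
            problems.append(e)
        seen.add(name)
    return problems

# Minimal inline sample in the same shape as icx-metrics.json.
sample = json.loads("""
[
    {"MetricName": "IPC",
     "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD"},
    {"MetricName": "CPI",
     "MetricExpr": "1 / (INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD)"}
]
""")
assert check_metrics(sample) == []
```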

Tested with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 90: perf all metricgroups test                                      : Ok
 91: perf all metrics test                                           : Skip
 93: perf all PMU test                                               : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 .../pmu-events/arch/x86/icelakex/cache.json   |  28 +-
 .../arch/x86/icelakex/floating-point.json     |   2 +-
 .../arch/x86/icelakex/frontend.json           |   2 +-
 .../arch/x86/icelakex/icx-metrics.json        | 691 ++++++++++++++++--
 .../pmu-events/arch/x86/icelakex/memory.json  |   6 +-
 .../pmu-events/arch/x86/icelakex/other.json   |  51 +-
 .../arch/x86/icelakex/pipeline.json           |  12 +
 .../arch/x86/icelakex/virtual-memory.json     |   2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   3 +-
 9 files changed, 689 insertions(+), 108 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/icelakex/cache.json b/tools/perf/pmu-events/arch/x86/icelakex/cache.json
index 95fcbec188f8..775190bdd063 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/cache.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/cache.json
@@ -291,7 +291,7 @@
         "UMask": "0x4f"
     },
     {
-        "BriefDescription": "All retired load instructions.",
+        "BriefDescription": "Retired load instructions.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
@@ -299,12 +299,12 @@
         "EventName": "MEM_INST_RETIRED.ALL_LOADS",
         "PEBS": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions for loads.",
+        "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2 or PREFETCHW.",
         "SampleAfterValue": "1000003",
         "UMask": "0x81"
     },
     {
-        "BriefDescription": "All retired store instructions.",
+        "BriefDescription": "Retired store instructions.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
@@ -313,7 +313,7 @@
         "L1_Hit_Indication": "1",
         "PEBS": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Counts all retired store instructions. This event account for SW prefetch instructions and PREFETCHW instruction for stores.",
+        "PublicDescription": "Counts all retired store instructions.",
         "SampleAfterValue": "1000003",
         "UMask": "0x82"
     },
@@ -409,7 +409,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts retired load instructions whose data sources were HitM responses from shared L3.",
         "SampleAfterValue": "20011",
-        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -473,7 +472,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts retired load instructions whose data sources were L3 and cross-core snoop hits in on-pkg core cache.",
         "SampleAfterValue": "20011",
-        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -867,7 +865,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that hit in the L3 or were snooped from another core's caches on the same socket.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that hit in the L3 or were snooped from another core's caches on the same socket.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_HIT",
@@ -878,7 +876,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that resulted in a snoop hit a modified line in another core's caches which forwarded the data.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that resulted in a snoop hit a modified line in another core's caches which forwarded the data.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_HIT.SNOOP_HITM",
@@ -889,7 +887,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that resulted in a snoop that hit in another core, which did not forward the data.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that resulted in a snoop that hit in another core, which did not forward the data.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_HIT.SNOOP_HIT_NO_FWD",
@@ -900,7 +898,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that resulted in a snoop hit in another core's caches which forwarded the unmodified data to the requesting core.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that resulted in a snoop hit in another core's caches which forwarded the unmodified data to the requesting core.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_HIT.SNOOP_HIT_WITH_FWD",
@@ -911,7 +909,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by a cache on a remote socket where a snoop was sent and data was returned (Modified or Not Modified).",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by a cache on a remote socket where a snoop was sent and data was returned (Modified or Not Modified).",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.REMOTE_CACHE.SNOOP_FWD",
@@ -922,7 +920,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by a cache on a remote socket where a snoop hit a modified line in another core's caches which forwarded the data.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by a cache on a remote socket where a snoop hit a modified line in another core's caches which forwarded the data.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.REMOTE_CACHE.SNOOP_HITM",
@@ -933,7 +931,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by a cache on a remote socket where a snoop hit in another core's caches which forwarded the unmodified data to the requesting core.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by a cache on a remote socket where a snoop hit in another core's caches which forwarded the unmodified data to the requesting core.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.REMOTE_CACHE.SNOOP_HIT_WITH_FWD",
@@ -944,7 +942,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that hit a modified line in a distant L3 Cache or were snooped from a distant core's L1/L2 caches on this socket when the system is in SNC (sub-NUMA cluster) mode.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that hit a modified line in a distant L3 Cache or were snooped from a distant core's L1/L2 caches on this socket when the system is in SNC (sub-NUMA cluster) mode.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.SNC_CACHE.HITM",
@@ -955,7 +953,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that either hit a non-modified line in a distant L3 Cache or were snooped from a distant core's L1/L2 caches on this socket when the system is in SNC (sub-NUMA cluster) mode.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that either hit a non-modified line in a distant L3 Cache or were snooped from a distant core's L1/L2 caches on this socket when the system is in SNC (sub-NUMA cluster) mode.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.SNC_CACHE.HIT_WITH_FWD",
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/floating-point.json b/tools/perf/pmu-events/arch/x86/icelakex/floating-point.json
index 4347e2d0d090..1925388969bb 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/floating-point.json
@@ -99,4 +99,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/frontend.json b/tools/perf/pmu-events/arch/x86/icelakex/frontend.json
index f217c3211ba2..eb27d9d9c8be 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/frontend.json
@@ -481,4 +481,4 @@
         "Speculative": "1",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
index be70672bfdb0..0abdfe433a2c 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -17,24 +17,6 @@
         "MetricGroup": "Ret;Summary",
         "MetricName": "IPC"
     },
-    {
-        "BriefDescription": "Uops Per Instruction",
-        "MetricExpr": "UOPS_RETIRED.SLOTS / INST_RETIRED.ANY",
-        "MetricGroup": "Pipeline;Ret;Retire",
-        "MetricName": "UPI"
-    },
-    {
-        "BriefDescription": "Instruction per taken branch",
-        "MetricExpr": "UOPS_RETIRED.SLOTS / BR_INST_RETIRED.NEAR_TAKEN",
-        "MetricGroup": "Branches;Fed;FetchBW",
-        "MetricName": "UpTB"
-    },
-    {
-        "BriefDescription": "Cycles Per Instruction (per Logical Processor)",
-        "MetricExpr": "1 / (INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD)",
-        "MetricGroup": "Pipeline;Mem",
-        "MetricName": "CPI"
-    },
     {
         "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
         "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
@@ -50,7 +32,7 @@
     {
         "BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
         "MetricExpr": "TOPDOWN.SLOTS / ( TOPDOWN.SLOTS / 2 ) if #SMT_on else 1",
-        "MetricGroup": "SMT",
+        "MetricGroup": "SMT;TmaL1",
         "MetricName": "Slots_Utilization"
     },
     {
@@ -73,24 +55,18 @@
         "MetricName": "FLOPc"
     },
     {
-        "BriefDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width)",
+        "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)",
         "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) ) / ( 2 * CPU_CLK_UNHALTED.DISTRIBUTED )",
         "MetricGroup": "Cor;Flops;HPC",
         "MetricName": "FP_Arith_Utilization",
-        "PublicDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width). Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
+        "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)."
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
         "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "ILP"
     },
-    {
-        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BadSpec;BrMispredicts",
-        "MetricName": "IpMispredict"
-    },
     {
         "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
         "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED",
@@ -181,36 +157,54 @@
         "MetricName": "IpArith_AVX512",
         "PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
     },
+    {
+        "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / cpu@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
+        "MetricGroup": "Prefetches",
+        "MetricName": "IpSWPF"
+    },
     {
         "BriefDescription": "Total number of retired Instructions, Sample with: INST_RETIRED.PREC_DIST",
         "MetricExpr": "INST_RETIRED.ANY",
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
+        "MetricName": "Execute"
+    },
     {
         "BriefDescription": "Average number of Uops issued by front-end when it issued something",
         "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=1@",
         "MetricGroup": "Fed;FetchBW",
         "MetricName": "Fetch_UpC"
     },
-    {
-        "BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)",
-        "MetricExpr": "LSD.UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
-        "MetricGroup": "Fed;LSD",
-        "MetricName": "LSD_Coverage"
-    },
     {
         "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
-        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
+        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
         "MetricGroup": "DSB;Fed;FetchBW",
         "MetricName": "DSB_Coverage"
     },
     {
-        "BriefDescription": "Number of Instructions per non-speculative DSB miss",
+        "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.",
+        "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=1\\,edge@",
+        "MetricGroup": "DSBmiss",
+        "MetricName": "DSB_Switch_Cost"
+    },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative DSB miss (lower number means higher occurrence rate)",
         "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS",
         "MetricGroup": "DSBmiss;Fed",
         "MetricName": "IpDSB_Miss_Ret"
     },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BadSpec;BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
     {
         "BriefDescription": "Fraction of branches that are non-taken conditionals",
         "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_BRANCHES",
@@ -242,11 +236,10 @@
         "MetricName": "Other_Branches"
     },
     {
-        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles)",
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
         "MetricGroup": "Mem;MemoryBound;MemoryLat",
-        "MetricName": "Load_Miss_Real_Latency",
-        "PublicDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles). Latency may be overestimated for multi-load instructions - e.g. repeat strings."
+        "MetricName": "Load_Miss_Real_Latency"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
@@ -254,30 +247,6 @@
         "MetricGroup": "Mem;MemoryBound;MemoryBW",
         "MetricName": "MLP"
     },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
-        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L1D_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
-        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L2_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L3_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW;Offcore",
-        "MetricName": "L3_Cache_Access_BW"
-    },
     {
         "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
         "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
@@ -297,13 +266,13 @@
         "MetricName": "L2MPKI"
     },
     {
-        "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)",
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)",
         "MetricExpr": "1000 * ( ( OFFCORE_REQUESTS.ALL_DATA_RD - OFFCORE_REQUESTS.DEMAND_DATA_RD ) + L2_RQSTS.ALL_DEMAND_MISS + L2_RQSTS.SWPF_MISS ) / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses;Offcore",
         "MetricName": "L2MPKI_All"
     },
     {
-        "BriefDescription": "L2 cache misses per kilo instruction for all demand loads  (including speculative)",
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all demand loads  (including speculative)",
         "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses",
         "MetricName": "L2MPKI_Load"
@@ -321,7 +290,7 @@
         "MetricName": "L3MPKI"
     },
     {
-        "BriefDescription": "Fill Buffer (FB) true hits per kilo instructions for retired demand loads",
+        "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
         "MetricExpr": "1000 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses",
         "MetricName": "FB_HPKI"
@@ -333,6 +302,30 @@
         "MetricGroup": "Mem;MemoryTLB",
         "MetricName": "Page_Walks_Utilization"
     },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW"
+    },
     {
         "BriefDescription": "Rate of silent evictions from the L2 cache per Kilo instruction where the evicted lines are dropped (no writeback to L3 or memory)",
         "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY",
@@ -345,6 +338,30 @@
         "MetricGroup": "L2Evicts;Mem;Server",
         "MetricName": "L2_Evictions_NonSilent_PKI"
     },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW_1T"
+    },
     {
         "BriefDescription": "Average CPU Utilization",
         "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
@@ -361,7 +378,8 @@
         "BriefDescription": "Giga Floating Point Operations Per Second",
         "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / 1000000000 ) / duration_time",
         "MetricGroup": "Cor;Flops;HPC",
-        "MetricName": "GFLOPs"
+        "MetricName": "GFLOPs",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
     },
     {
         "BriefDescription": "Average Frequency Utilization relative nominal frequency",
@@ -497,5 +515,544 @@
         "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
         "MetricGroup": "Power",
         "MetricName": "C6_Pkg_Residency"
+    },
+    {
+        "BriefDescription": "Percentage of time spent in the active CPU power state C0",
+        "MetricExpr": "100 * CPU_CLK_UNHALTED.REF_TSC / TSC",
+        "MetricGroup": "",
+        "MetricName": "cpu_utilization_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "CPU operating frequency (in GHz)",
+        "MetricExpr": "(( CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC * #SYSTEM_TSC_FREQ ) / 1000000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "cpu_operating_frequency",
+        "ScaleUnit": "1GHz"
+    },
+    {
+        "BriefDescription": "Cycles per instruction retired; indicating how much time each executed instruction took; in units of cycles.",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "cpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "The ratio of number of completed memory load instructions to the total number completed instructions",
+        "MetricExpr": "MEM_INST_RETIRED.ALL_LOADS / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "loads_per_instr",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "The ratio of number of completed memory store instructions to the total number completed instructions",
+        "MetricExpr": "MEM_INST_RETIRED.ALL_STORES / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "stores_per_instr",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of requests missing L1 data cache (includes data+rfo w/ prefetches) to the total number of completed instructions",
+        "MetricExpr": "L1D.REPLACEMENT / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l1d_mpi_includes_data_plus_rfo_with_prefetches",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of demand load requests hitting in L1 data cache to the total number of completed instructions ",
+        "MetricExpr": "MEM_LOAD_RETIRED.L1_HIT / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l1d_demand_data_read_hits_per_instr",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of code read requests missing in L1 instruction cache (includes prefetches) to the total number of completed instructions",
+        "MetricExpr": "L2_RQSTS.ALL_CODE_RD / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l1_i_code_read_misses_with_prefetches_per_instr",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed demand load requests hitting in L2 cache to the total number of completed instructions ",
+        "MetricExpr": "MEM_LOAD_RETIRED.L2_HIT / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l2_demand_data_read_hits_per_instr",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of requests missing L2 cache (includes code+data+rfo w/ prefetches) to the total number of completed instructions",
+        "MetricExpr": "L2_LINES_IN.ALL / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l2_mpi_includes_code_plus_data_plus_rfo_with_prefetches",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed data read request missing L2 cache to the total number of completed instructions",
+        "MetricExpr": "MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l2_demand_data_read_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of code read request missing L2 cache to the total number of completed instructions",
+        "MetricExpr": "L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "l2_demand_code_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of data read requests missing last level core cache (includes demand w/ prefetches) to the total number of completed instructions",
+        "MetricExpr": "( UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFDATA + UNC_CHA_TOR_INSERTS.IA_MISS_DRD + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF ) / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "llc_data_read_mpi_demand_plus_prefetch",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of code read requests missing last level core cache (includes demand w/ prefetches) to the total number of completed instructions",
+        "MetricExpr": "( UNC_CHA_TOR_INSERTS.IA_MISS_CRD ) / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "llc_code_read_mpi_demand_plus_prefetch",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memory access) in nano seconds",
+        "MetricExpr": "( ( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_TOR_INSERTS.IA_MISS_DRD ) / ( UNC_CHA_CLOCKTICKS / ( source_count(UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD) * #num_packages ) ) ) * duration_time )",
+        "MetricGroup": "",
+        "MetricName": "llc_demand_data_read_miss_latency",
+        "ScaleUnit": "1ns"
+    },
+    {
+        "BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memory access) addressed to local memory in nano seconds",
+        "MetricExpr": "( ( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_LOCAL / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL ) / ( UNC_CHA_CLOCKTICKS / ( source_count(UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_LOCAL) * #num_packages ) ) ) * duration_time )",
+        "MetricGroup": "",
+        "MetricName": "llc_demand_data_read_miss_latency_for_local_requests",
+        "ScaleUnit": "1ns"
+    },
+    {
+        "BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memory access) addressed to remote memory in nano seconds",
+        "MetricExpr": "( ( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE ) / ( UNC_CHA_CLOCKTICKS / ( source_count(UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE) * #num_packages ) ) ) * duration_time )",
+        "MetricGroup": "",
+        "MetricName": "llc_demand_data_read_miss_latency_for_remote_requests",
+        "ScaleUnit": "1ns"
+    },
+    {
+        "BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memory access) addressed to Intel(R) Optane(TM) Persistent Memory(PMEM) in nano seconds",
+        "MetricExpr": "( ( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PMM ) / ( UNC_CHA_CLOCKTICKS / ( source_count(UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM) * #num_packages ) ) ) * duration_time )",
+        "MetricGroup": "",
+        "MetricName": "llc_demand_data_read_miss_to_pmem_latency",
+        "ScaleUnit": "1ns"
+    },
+    {
+        "BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memory access) addressed to DRAM in nano seconds",
+        "MetricExpr": "( ( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_DDR / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_DDR ) / ( UNC_CHA_CLOCKTICKS / ( source_count(UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_DDR) * #num_packages ) ) ) * duration_time )",
+        "MetricGroup": "",
+        "MetricName": "llc_demand_data_read_miss_to_dram_latency",
+        "ScaleUnit": "1ns"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed page walks (for all page sizes) caused by a code fetch to the total number of completed instructions. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB.",
+        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "itlb_2nd_level_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed page walks (for 2 megabyte and 4 megabyte page sizes) caused by a code fetch to the total number of completed instructions. This implies it missed in the Instruction Translation Lookaside Buffer (ITLB) and further levels of TLB.",
+        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED_2M_4M / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "itlb_2nd_level_large_page_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed page walks (for all page sizes) caused by demand data loads to the total number of completed instructions. This implies it missed in the DTLB and further levels of TLB.",
+        "MetricExpr": "DTLB_LOAD_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "dtlb_2nd_level_load_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed page walks (for 2 megabyte page sizes) caused by demand data loads to the total number of completed instructions. This implies it missed in the Data Translation Lookaside Buffer (DTLB) and further levels of TLB.",
+        "MetricExpr": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "dtlb_2nd_level_2mb_large_page_load_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Ratio of number of completed page walks (for all page sizes) caused by demand data stores to the total number of completed instructions. This implies it missed in the DTLB and further levels of TLB.",
+        "MetricExpr": "DTLB_STORE_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+        "MetricGroup": "",
+        "MetricName": "dtlb_2nd_level_store_mpi",
+        "ScaleUnit": "1per_instr"
+    },
+    {
+        "BriefDescription": "Memory read that miss the last level cache (LLC) addressed to local DRAM as a percentage of total memory read accesses, does not include LLC prefetches.",
+        "MetricExpr": "100 * ( UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL ) / ( UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_REMOTE )",
+        "MetricGroup": "",
+        "MetricName": "numa_percent_reads_addressed_to_local_dram",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Memory reads that miss the last level cache (LLC) addressed to remote DRAM as a percentage of total memory read accesses, does not include LLC prefetches.",
+        "MetricExpr": "100 * ( UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_REMOTE ) / ( UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE + UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_REMOTE )",
+        "MetricGroup": "",
+        "MetricName": "numa_percent_reads_addressed_to_remote_dram",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Uncore operating frequency in GHz",
+        "MetricExpr": "( UNC_CHA_CLOCKTICKS / ( source_count(UNC_CHA_CLOCKTICKS) * #num_packages ) / 1000000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "uncore_frequency",
+        "ScaleUnit": "1GHz"
+    },
+    {
+        "BriefDescription": "Intel(R) Ultra Path Interconnect (UPI) data transmit bandwidth (MB/sec)",
+        "MetricExpr": "( UNC_UPI_TxL_FLITS.ALL_DATA * (64 / 9.0) / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "upi_data_transmit_bw_only_data",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "DDR memory read bandwidth (MB/sec)",
+        "MetricExpr": "( UNC_M_CAS_COUNT.RD * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "memory_bandwidth_read",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "DDR memory write bandwidth (MB/sec)",
+        "MetricExpr": "( UNC_M_CAS_COUNT.WR * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "memory_bandwidth_write",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "DDR memory bandwidth (MB/sec)",
+        "MetricExpr": "(( UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR ) * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "memory_bandwidth_total",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Intel(R) Optane(TM) Persistent Memory(PMEM) memory read bandwidth (MB/sec)",
+        "MetricExpr": "( UNC_M_PMM_RPQ_INSERTS * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "pmem_memory_bandwidth_read",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Intel(R) Optane(TM) Persistent Memory(PMEM) memory write bandwidth (MB/sec)",
+        "MetricExpr": "( UNC_M_PMM_WPQ_INSERTS * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "pmem_memory_bandwidth_write",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Intel(R) Optane(TM) Persistent Memory(PMEM) memory bandwidth (MB/sec)",
+        "MetricExpr": "(( UNC_M_PMM_RPQ_INSERTS + UNC_M_PMM_WPQ_INSERTS ) * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "pmem_memory_bandwidth_total",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Bandwidth of IO reads that are initiated by end device controllers that are requesting memory from the CPU.",
+        "MetricExpr": "(( UNC_CHA_TOR_INSERTS.IO_HIT_PCIRDCUR + UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR ) * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "io_bandwidth_read",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Bandwidth of IO writes that are initiated by end device controllers that are writing memory to the CPU.",
+        "MetricExpr": "(( UNC_CHA_TOR_INSERTS.IO_HIT_ITOM + UNC_CHA_TOR_INSERTS.IO_MISS_ITOM + UNC_CHA_TOR_INSERTS.IO_HIT_ITOMCACHENEAR + UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR ) * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "io_bandwidth_write",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Uops delivered from decoded instruction cache (decoded stream buffer or DSB) as a percent of total uops delivered to Instruction Decode Queue",
+        "MetricExpr": "100 * ( IDQ.DSB_UOPS / ( IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS + LSD.UOPS ) )",
+        "MetricGroup": "",
+        "MetricName": "percent_uops_delivered_from_decoded_icache_dsb",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Uops delivered from legacy decode pipeline (Micro-instruction Translation Engine or MITE) as a percent of total uops delivered to Instruction Decode Queue",
+        "MetricExpr": "100 * ( IDQ.MITE_UOPS / ( IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS + LSD.UOPS ) )",
+        "MetricGroup": "",
+        "MetricName": "percent_uops_delivered_from_legacy_decode_pipeline_mite",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Uops delivered from microcode sequencer (MS) as a percent of total uops delivered to Instruction Decode Queue",
+        "MetricExpr": "100 * ( IDQ.MS_UOPS / ( IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS + LSD.UOPS ) )",
+        "MetricGroup": "",
+        "MetricName": "percent_uops_delivered_from_microcode_sequencer_ms",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Bandwidth (MB/sec) of read requests that miss the last level cache (LLC) and go to local memory.",
+        "MetricExpr": "( UNC_CHA_REQUESTS.READS_LOCAL * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "llc_miss_local_memory_bandwidth_read",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Bandwidth (MB/sec) of write requests that miss the last level cache (LLC) and go to local memory.",
+        "MetricExpr": "( UNC_CHA_REQUESTS.WRITES_LOCAL * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "llc_miss_local_memory_bandwidth_write",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Bandwidth (MB/sec) of read requests that miss the last level cache (LLC) and go to remote memory.",
+        "MetricExpr": "( UNC_CHA_REQUESTS.READS_REMOTE * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "llc_miss_remote_memory_bandwidth_read",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "Bandwidth (MB/sec) of write requests that miss the last level cache (LLC) and go to remote memory.",
+        "MetricExpr": "( UNC_CHA_REQUESTS.WRITES_REMOTE * 64 / 1000000) / duration_time",
+        "MetricGroup": "",
+        "MetricName": "llc_miss_remote_memory_bandwidth_write",
+        "ScaleUnit": "1MB/s"
+    },
+    {
+        "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Machine_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound.",
+        "MetricExpr": "100 * ( topdown\\-fe\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) - INT_MISC.UOP_DROPPING / ( slots ) )",
+        "MetricGroup": "TmaL1;PGO",
+        "MetricName": "tma_frontend_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period.",
+        "MetricExpr": "100 * ( ( ( 5 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - INT_MISC.UOP_DROPPING ) / ( slots ) )",
+        "MetricGroup": "Frontend;TmaL2;m_tma_frontend_bound_percent",
+        "MetricName": "tma_fetch_latency_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses.",
+        "MetricExpr": "100 * ( ICACHE_16B.IFDATA_STALL / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "BigFoot;FetchLat;IcMiss;TmaL3;m_tma_fetch_latency_percent",
+        "MetricName": "tma_icache_misses_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses.",
+        "MetricExpr": "100 * ( ICACHE_64B.IFTAG_STALL / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TmaL3;m_tma_fetch_latency_percent",
+        "MetricName": "tma_itlb_misses_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers. Branch Resteers estimates the Frontend delay in fetching operations from corrected path; following all sorts of miss-predicted branches. For example; branchy code with lots of miss-predictions might get categorized under Branch Resteers. Note the value of this node may overlap with its siblings.",
+        "MetricExpr": "100 * ( INT_MISC.CLEAR_RESTEER_CYCLES / ( CPU_CLK_UNHALTED.THREAD ) + ( ( 10 ) * BACLEARS.ANY / ( CPU_CLK_UNHALTED.THREAD ) ) )",
+        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
+        "MetricName": "tma_branch_resteers_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty.",
+        "MetricExpr": "100 * ( DSB2MITE_SWITCHES.PENALTY_CYCLES / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "DSBmiss;FetchLat;TmaL3;m_tma_fetch_latency_percent",
+        "MetricName": "tma_dsb_switches_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs.",
+        "MetricExpr": "100 * ( ILD_STALL.LCP / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
+        "MetricName": "tma_lcp_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS). Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legacy instruction decode) pipelines. Certain operations cannot be handled natively by the execution pipeline; and must be performed by microcode (small programs injected into the execution stream). Switching to the MS too often can negatively impact performance. The MS is designated to deliver long uop flows required by CISC instructions like CPUID; or uncommon conditions like Floating Point Assists when dealing with Denormals.",
+        "MetricExpr": "100 * ( ( 3 ) * IDQ.MS_SWITCHES / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "FetchLat;MicroSeq;TmaL3;m_tma_fetch_latency_percent",
+        "MetricName": "tma_ms_switches_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend.",
+        "MetricExpr": "100 * ( max( 0 , ( topdown\\-fe\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) - INT_MISC.UOP_DROPPING / ( slots ) ) - ( ( ( 5 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - INT_MISC.UOP_DROPPING ) / ( slots ) ) ) )",
+        "MetricGroup": "FetchBW;Frontend;TmaL2;m_tma_frontend_bound_percent",
+        "MetricName": "tma_fetch_bandwidth_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline). This pipeline is used for code that was not pre-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long immediate or LCP can manifest as MITE fetch bandwidth bottleneck.",
+        "MetricExpr": "100 * ( ( IDQ.MITE_CYCLES_ANY - IDQ.MITE_CYCLES_OK ) / ( CPU_CLK_UNHALTED.DISTRIBUTED ) / 2 )",
+        "MetricGroup": "DSBmiss;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
+        "MetricName": "tma_mite_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline.  For example; inefficient utilization of the DSB cache structure or bank conflict when reading from it; are categorized here.",
+        "MetricExpr": "100 * ( ( IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK ) / ( CPU_CLK_UNHALTED.DISTRIBUTED ) / 2 )",
+        "MetricGroup": "DSB;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
+        "MetricName": "tma_dsb_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
+        "MetricExpr": "100 * ( max( 1 - ( ( topdown\\-fe\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) - INT_MISC.UOP_DROPPING / ( slots ) ) + ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) ) , 0 ) )",
+        "MetricGroup": "TmaL1",
+        "MetricName": "tma_bad_speculation_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.  These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path.",
+        "MetricExpr": "100 * ( ( BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( max( 1 - ( ( topdown\\-fe\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) - INT_MISC.UOP_DROPPING / ( slots ) ) + ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) ) , 0 ) ) )",
+        "MetricGroup": "BadSpec;BrMispredicts;TmaL2;m_tma_bad_speculation_percent",
+        "MetricName": "tma_branch_mispredicts_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears.  These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes.",
+        "MetricExpr": "100 * ( max( 0 , ( max( 1 - ( ( topdown\\-fe\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) - INT_MISC.UOP_DROPPING / ( slots ) ) + ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) ) , 0 ) ) - ( ( BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( max( 1 - ( ( topdown\\-fe\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) - INT_MISC.UOP_DROPPING / ( slots ) ) + ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) ) , 0 ) ) ) ) )",
+        "MetricGroup": "BadSpec;MachineClears;TmaL2;m_tma_bad_speculation_percent",
+        "MetricName": "tma_machine_clears_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
+        "MetricExpr": "100 * ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) )",
+        "MetricGroup": "TmaL1",
+        "MetricName": "tma_backend_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
+        "MetricExpr": "100 * ( ( ( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / ( CYCLE_ACTIVITY.STALLS_TOTAL + ( EXE_ACTIVITY.1_PORTS_UTIL + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * EXE_ACTIVITY.2_PORTS_UTIL ) + EXE_ACTIVITY.BOUND_ON_STORES ) ) * ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) )",
+        "MetricGroup": "Backend;TmaL2;m_tma_backend_bound_percent",
+        "MetricName": "tma_memory_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled without loads missing the L1 data cache.  The L1 data cache typically has the shortest latency.  However; in certain cases like loads blocked on older stores; a load might suffer due to high latency even though it is being satisfied by the L1. Another example is loads that miss in the TLB. These cases are characterized by execution unit stalls; while some non-completed demand load lives in the machine without having that demand load missing the L1 cache.",
+        "MetricExpr": "100 * ( max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) , 0 ) )",
+        "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
+        "MetricName": "tma_l1_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads.  Avoiding cache misses (i.e. L1 misses/L2 hits) can improve the latency and increase performance.",
+        "MetricExpr": "100 * ( ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) / ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + L1D_PEND_MISS.FB_FULL_PERIODS ) ) * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) )",
+        "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
+        "MetricName": "tma_l2_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled due to load accesses to the L3 cache, or contended with a sibling Core.  Avoiding cache misses (i.e. L2 misses/L3 hits) can improve the latency and increase performance.",
+        "MetricExpr": "100 * ( ( CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
+        "MetricName": "tma_l3_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads. Better caching can improve the latency and increase performance.",
+        "MetricExpr": "100 * ( min( ( ( ( CYCLE_ACTIVITY.STALLS_L3_MISS / ( CPU_CLK_UNHALTED.THREAD ) + ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) - ( ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) / ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + L1D_PEND_MISS.FB_FULL_PERIODS ) ) * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( min( ( ( ( ( 1 - ( ( ( 19 * ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + 10 * ( ( MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) ) / ( ( 19 * ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + 10 * ( ( MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) ) + ( 25 * ( ( MEM_LOAD_RETIRED.LOCAL_PMM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) + 33 * ( ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_PMM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) ) ) ) ) ) * ( CYCLE_ACTIVITY.STALLS_L3_MISS / ( CPU_CLK_UNHALTED.THREAD ) + ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) - ( ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) / ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + L1D_PEND_MISS.FB_FULL_PERIODS ) ) * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) ) ) ) if ( ( 1000000 ) * ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_PMM + MEM_LOAD_RETIRED.LOCAL_PMM ) > MEM_LOAD_RETIRED.L1_MISS ) else 0 ) ) , ( 1 ) ) ) ) ) , ( 1 ) ) )",
+        "MetricGroup": "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
+        "MetricName": "tma_dram_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric roughly estimates (based on idle latencies) how often the CPU was stalled on accesses to external 3D-Xpoint (Crystal Ridge, a.k.a. IXP) memory by loads; PMM stands for Persistent Memory Module.",
+        "MetricExpr": "100 * ( min( ( ( ( ( 1 - ( ( ( 19 * ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + 10 * ( ( MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) ) / ( ( 19 * ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + 10 * ( ( MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) ) + ( 25 * ( ( MEM_LOAD_RETIRED.LOCAL_PMM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) + 33 * ( ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_PMM * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) ) ) ) ) ) ) * ( CYCLE_ACTIVITY.STALLS_L3_MISS / ( CPU_CLK_UNHALTED.THREAD ) + ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) - ( ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) / ( ( MEM_LOAD_RETIRED.L2_HIT * ( 1 + ( MEM_LOAD_RETIRED.FB_HIT / ( MEM_LOAD_RETIRED.L1_MISS ) ) ) ) + L1D_PEND_MISS.FB_FULL_PERIODS ) ) * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) ) ) ) ) if ( ( 1000000 ) * ( MEM_LOAD_L3_MISS_RETIRED.REMOTE_PMM + MEM_LOAD_RETIRED.LOCAL_PMM ) > MEM_LOAD_RETIRED.L1_MISS ) else 0 ) ) , ( 1 ) ) )",
+        "MetricGroup": "MemoryBound;Server;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
+        "MetricName": "tma_pmm_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled due to RFO store memory accesses; RFO stores issue a read-for-ownership request before the write. Even though store accesses do not typically stall out-of-order CPUs; there are few cases where stores can lead to actual stalls. This metric will be flagged should RFO stores be a bottleneck.",
+        "MetricExpr": "100 * ( EXE_ACTIVITY.BOUND_ON_STORES / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
+        "MetricName": "tma_store_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where Core non-memory issues were the bottleneck.  Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).",
+        "MetricExpr": "100 * ( max( 0 , ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) - ( ( ( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / ( CYCLE_ACTIVITY.STALLS_TOTAL + ( EXE_ACTIVITY.1_PORTS_UTIL + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * EXE_ACTIVITY.2_PORTS_UTIL ) + EXE_ACTIVITY.BOUND_ON_STORES ) ) * ( topdown\\-be\\-bound / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) + ( ( 5 ) * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=0x1\\,edge\\=0x1@ ) / ( slots ) ) ) ) )",
+        "MetricGroup": "Backend;TmaL2;Compute;m_tma_backend_bound_percent",
+        "MetricName": "tma_core_bound_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles where the Divider unit was active. Divide and square root instructions are performed by the Divider unit and can take considerably longer latency than integer or Floating Point addition; subtraction; or multiplication.",
+        "MetricExpr": "100 * ( ARITH.DIVIDER_ACTIVE / ( CPU_CLK_UNHALTED.THREAD ) )",
+        "MetricGroup": "TmaL3;m_tma_core_bound_percent",
+        "MetricName": "tma_divider_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category.  Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved.  Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is no room for more performance.  For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided.",
+        "MetricExpr": "( 100 * ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) ) + ( 0 * slots )",
+        "MetricGroup": "TmaL1",
+        "MetricName": "tma_retiring_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved.",
+        "MetricExpr": "100 * ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) )",
+        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
+        "MetricName": "tma_light_operations_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may exceed its parent due to use of \"Uops\" CountDomain and FMA double-counting.",
+        "MetricExpr": "100 * ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * UOPS_EXECUTED.X87 / UOPS_EXECUTED.THREAD ) + ( ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) ) + ( min( ( ( FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) ) , ( 1 ) ) ) )",
+        "MetricGroup": "HPC;TmaL3;m_tma_light_operations_percent",
+        "MetricName": "tma_fp_arith_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+        "MetricExpr": "100 * ( ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY )",
+        "MetricGroup": "Pipeline;TmaL3;m_tma_light_operations_percent",
+        "MetricName": "tma_memory_operations_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring branch instructions.",
+        "MetricExpr": "100 * ( ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) * BR_INST_RETIRED.ALL_BRANCHES / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) )",
+        "MetricGroup": "Pipeline;TmaL3;m_tma_light_operations_percent",
+        "MetricName": "tma_branch_instructions_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions. Compilers often use NOPs for certain address alignments - e.g. start address of a function or loop body.",
+        "MetricExpr": "100 * ( ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) * INST_RETIRED.NOP / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) )",
+        "MetricGroup": "Pipeline;TmaL3;m_tma_light_operations_percent",
+        "MetricName": "tma_nop_instructions_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes. May undercount due to FMA double counting.",
+        "MetricExpr": "100 * ( max( 0 , ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) - ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * UOPS_EXECUTED.X87 / UOPS_EXECUTED.THREAD ) + ( ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) ) + ( min( ( ( FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) ) , ( 1 ) ) ) ) + ( ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY ) + ( ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) * BR_INST_RETIRED.ALL_BRANCHES / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) ) + ( ( max( 0 , ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) - ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) ) ) * INST_RETIRED.NOP / ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) ) ) ) )",
+        "MetricGroup": "Pipeline;TmaL3;m_tma_light_operations_percent",
+        "MetricName": "tma_other_light_ops_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or microcoded sequences. This highly-correlates with the uop length of these instructions/sequences.",
+        "MetricExpr": "100 * ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS )",
+        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
+        "MetricName": "tma_heavy_operations_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring instructions that are decoded into two or up to ([SNB+] four; [ADL+] five) uops. This highly-correlates with the number of uops in such instructions.",
+        "MetricExpr": "100 * ( ( ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) + ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=0x1@ ) / IDQ.MITE_UOPS ) - ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) ) )",
+        "MetricGroup": "TmaL3;m_tma_heavy_operations_percent",
+        "MetricName": "tma_few_uops_instructions_percent",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit.  The MS is used for CISC instructions not supported by the default decoders (like repeat move strings; or CPUID); or by microcode assists used to address some operation modes (like in Floating Point assists). These cases can often be avoided.",
+        "MetricExpr": "100 * ( ( ( ( topdown\\-retiring / ( topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound ) ) * ( slots ) ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( slots ) )",
+        "MetricGroup": "MicroSeq;TmaL3;m_tma_heavy_operations_percent",
+        "MetricName": "tma_microcode_sequencer_percent",
+        "ScaleUnit": "1%"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/memory.json b/tools/perf/pmu-events/arch/x86/icelakex/memory.json
index 58b03a8a1b95..48e8d1102b9d 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/memory.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/memory.json
@@ -306,7 +306,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were not supplied by the local socket's L1, L2, or L3 caches.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were not supplied by the local socket's L1, L2, or L3 caches.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_MISS",
@@ -317,7 +317,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were not supplied by the local socket's L1, L2, or L3 caches and were supplied by the local socket.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were not supplied by the local socket's L1, L2, or L3 caches and were supplied by the local socket.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_MISS_LOCAL",
@@ -328,7 +328,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that missed the L3 Cache and were supplied by the local socket (DRAM or PMM), whether or not in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts PMM or DRAM accesses that are controlled by the close or distant SNC Cluster.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that missed the L3 Cache and were supplied by the local socket (DRAM or PMM), whether or not in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts PMM or DRAM accesses that are controlled by the close or distant SNC Cluster.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.L3_MISS_LOCAL_SOCKET",
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/other.json b/tools/perf/pmu-events/arch/x86/icelakex/other.json
index c9bf6808ead7..919e620e7db8 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/other.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/other.json
@@ -44,7 +44,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating the line will now be (I)nvalidated: removed from this core's cache, after the data is forwarded back to the requestor and indicating the data was found unmodified in the (FE) Forward or Exclusive State in this cores caches cache.  A single snoop response from the core counts on all hyperthreads of the core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x20"
     },
     {
@@ -56,7 +55,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating the line will now be (I)nvalidated: removed from this core's caches, after the data is forwarded back to the requestor, and indicating the data was found modified(M) in this cores caches cache (aka HitM response).  A single snoop response from the core counts on all hyperthreads of the core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -68,7 +66,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating the line will now be (I)nvalidated in this core's caches without being forwarded back to the requestor. The line was in Forward, Shared or Exclusive (FSE) state in this cores caches.  A single snoop response from the core counts on all hyperthreads of the core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -80,7 +77,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating that the data was not found (IHitI) in this core's caches. A single snoop response from the core counts on all hyperthreads of the Core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -92,7 +88,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating the line may be kept on this core in the (S)hared state, after the data is forwarded back to the requestor, initially the data was found in the cache in the (FS) Forward or Shared state.  A single snoop response from the core counts on all hyperthreads of the core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x40"
     },
     {
@@ -104,7 +99,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating the line may be kept on this core in the (S)hared state, after the data is forwarded back to the requestor, initially the data was found in the cache in the (M)odified state.  A single snoop response from the core counts on all hyperthreads of the core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -116,7 +110,6 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts responses to snoops indicating the line was kept on this core in the (S)hared state, and that the data was found unmodified but not forwarded back to the requestor, initially the data was found in the cache in the (FSE) Forward, Shared state or Exclusive state.  A single snoop response from the core counts on all hyperthreads of the core.",
         "SampleAfterValue": "1000003",
-        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -428,7 +421,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that have any type of response.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that have any type of response.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.ANY_RESPONSE",
@@ -439,7 +432,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.DRAM",
@@ -450,7 +443,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM attached to this socket, unless in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts only those DRAM accesses that are controlled by the close SNC Cluster.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM attached to this socket, unless in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts only those DRAM accesses that are controlled by the close SNC Cluster.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.LOCAL_DRAM",
@@ -461,7 +454,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM attached to this socket, unless in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts only those PMM accesses that are controlled by the close SNC Cluster.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM attached to this socket, unless in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts only those PMM accesses that are controlled by the close SNC Cluster.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.LOCAL_PMM",
@@ -472,7 +465,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM attached to this socket, whether or not in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts DRAM accesses that are controlled by the close or distant SNC Cluster.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM attached to this socket, whether or not in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts DRAM accesses that are controlled by the close or distant SNC Cluster.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.LOCAL_SOCKET_DRAM",
@@ -483,7 +476,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM attached to this socket, whether or not in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts PMM accesses that are controlled by the close or distant SNC Cluster.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM attached to this socket, whether or not in Sub NUMA Cluster(SNC) Mode.  In SNC Mode counts PMM accesses that are controlled by the close or distant SNC Cluster.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.LOCAL_SOCKET_PMM",
@@ -494,7 +487,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were not supplied by the local socket's L1, L2, or L3 caches and were supplied by a remote socket.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were not supplied by the local socket's L1, L2, or L3 caches and were supplied by a remote socket.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.REMOTE",
@@ -505,7 +498,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM attached to another socket.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM attached to another socket.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.REMOTE_DRAM",
@@ -516,7 +509,18 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM attached to another socket.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM or PMM attached to another socket.",
+        "Counter": "0,1,2,3",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.READS_TO_CORE.REMOTE_MEMORY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x731800477",
+        "Offcore": "1",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM attached to another socket.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.REMOTE_PMM",
@@ -527,7 +531,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM on a distant memory controller of this socket when the system is in SNC (sub-NUMA cluster) mode.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by DRAM on a distant memory controller of this socket when the system is in SNC (sub-NUMA cluster) mode.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.SNC_DRAM",
@@ -538,7 +542,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Counts all data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM on a distant memory controller of this socket when the system is in SNC (sub-NUMA cluster) mode.",
+        "BriefDescription": "Counts all (cacheable) data read, code read and RFO requests including demands and prefetches to the core caches (L1 or L2) that were supplied by PMM on a distant memory controller of this socket when the system is in SNC (sub-NUMA cluster) mode.",
         "Counter": "0,1,2,3",
         "EventCode": "0xB7, 0xBB",
         "EventName": "OCR.READS_TO_CORE.SNC_PMM",
@@ -558,5 +562,16 @@
         "Offcore": "1",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts Demand RFOs, ItoM's, PREFECTHW's, Hardware RFO Prefetches to the L1/L2 and Streaming stores that likely resulted in a store to Memory (DRAM or PMM)",
+        "Counter": "0,1,2,3",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.WRITE_ESTIMATE.MEMORY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0xFBFF80822",
+        "Offcore": "1",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
index 95c1008ef057..396868f70004 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json
@@ -214,6 +214,18 @@
         "SampleAfterValue": "50021",
         "UMask": "0x20"
     },
+    {
+        "BriefDescription": "This event counts the number of mispredicted ret instructions retired. Non PEBS",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.RET",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "PublicDescription": "This is a non-precise version (that is, does not use PEBS) of the event that counts mispredicted return instructions retired.",
+        "SampleAfterValue": "50021",
+        "UMask": "0x8"
+    },
     {
         "BriefDescription": "Cycle counts are evenly distributed between active threads in the Core.",
         "CollectPEBSRecord": "2",
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/virtual-memory.json b/tools/perf/pmu-events/arch/x86/icelakex/virtual-memory.json
index bc43ea855840..d70864da5c67 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/virtual-memory.json
@@ -266,4 +266,4 @@
         "Speculative": "1",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index aae90ece2013..727631ce1a57 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -11,6 +11,7 @@ GenuineIntel-6-7A,v1.01,goldmontplus,core
 GenuineIntel-6-(3C|45|46),v31,haswell,core
 GenuineIntel-6-3F,v25,haswellx,core
 GenuineIntel-6-(7D|7E|A7),v1.14,icelake,core
+GenuineIntel-6-6[AC],v1.15,icelakex,core
 GenuineIntel-6-3A,v18,ivybridge,core
 GenuineIntel-6-3E,v19,ivytown,core
 GenuineIntel-6-2D,v20,jaketown,core
@@ -31,8 +32,6 @@ GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
 GenuineIntel-6-55-[01234],v1,skylakex,core
 GenuineIntel-6-8[CD],v1,tigerlake,core
-GenuineIntel-6-6A,v1,icelakex,core
-GenuineIntel-6-6C,v1,icelakex,core
 GenuineIntel-6-86,v1,tremontx,core
 GenuineIntel-6-8F,v1,sapphirerapids,core
 AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 15/31] perf vendor events: Update Intel ivybridge
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (6 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 14/31] perf vendor events: Update Intel icelakex Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 19/31] perf vendor events: Add Intel meteorlake Ian Rogers
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the ivybridge files into perf and update mapfile.csv.
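The mapfile.csv update this step performs can be sketched as a plain CSV edit (the ivybridge row below is taken from the diff at the end of this patch; treating the file as simple comma-separated text is an assumption for illustration only):

```python
import csv
import io

# Two rows as they appear in tools/perf/pmu-events/arch/x86/mapfile.csv
# before this patch: family-model regex, event-list version, table, type.
mapfile = (
    "GenuineIntel-6-3A,v18,ivybridge,core\n"
    "GenuineIntel-6-3E,v19,ivytown,core\n"
)

rows = list(csv.reader(io.StringIO(mapfile)))
for row in rows:
    if row[2] == "ivybridge":
        row[1] = "v22"  # bump to the regenerated event list version

out = "\n".join(",".join(r) for r in rows)
print(out)
```

Only the version field of the matching row changes; every other row is left untouched, which keeps the diff to a single-line change.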

Tested on a non-ivybridge machine with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 .../pmu-events/arch/x86/ivybridge/cache.json  |  2 +-
 .../arch/x86/ivybridge/floating-point.json    |  2 +-
 .../arch/x86/ivybridge/frontend.json          |  2 +-
 .../arch/x86/ivybridge/ivb-metrics.json       | 94 +++++++++++++------
 .../pmu-events/arch/x86/ivybridge/memory.json |  2 +-
 .../pmu-events/arch/x86/ivybridge/other.json  |  2 +-
 .../arch/x86/ivybridge/pipeline.json          |  4 +-
 .../arch/x86/ivybridge/uncore-other.json      |  2 +-
 .../arch/x86/ivybridge/virtual-memory.json    |  2 +-
 tools/perf/pmu-events/arch/x86/mapfile.csv    |  2 +-
 10 files changed, 75 insertions(+), 39 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/cache.json b/tools/perf/pmu-events/arch/x86/ivybridge/cache.json
index 62e9705daa19..8adb2e45e23d 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/cache.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/cache.json
@@ -1099,4 +1099,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x10"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/floating-point.json b/tools/perf/pmu-events/arch/x86/ivybridge/floating-point.json
index db8b1c4fceb0..4c2ac010cf55 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/floating-point.json
@@ -166,4 +166,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/frontend.json b/tools/perf/pmu-events/arch/x86/ivybridge/frontend.json
index c956a0a51312..2b1a82dd86ab 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/frontend.json
@@ -312,4 +312,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
index 87670226f52d..3f48e75f8a86 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
@@ -130,17 +130,11 @@
         "MetricName": "FLOPc_SMT"
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
         "MetricExpr": "UOPS_EXECUTED.THREAD / (( cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "ILP"
     },
-    {
-        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BadSpec;BrMispredicts",
-        "MetricName": "IpMispredict"
-    },
     {
         "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
         "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
@@ -196,6 +190,18 @@
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
+        "MetricGroup": "Pipeline;Ret",
+        "MetricName": "Retire"
+    },
+    {
+        "BriefDescription": "",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
+        "MetricName": "Execute"
+    },
     {
         "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
         "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
@@ -203,11 +209,16 @@
         "MetricName": "DSB_Coverage"
     },
     {
-        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles)",
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BadSpec;BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
+    {
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
         "MetricGroup": "Mem;MemoryBound;MemoryLat",
-        "MetricName": "Load_Miss_Real_Latency",
-        "PublicDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles). Latency may be overestimated for multi-load instructions - e.g. repeat strings."
+        "MetricName": "Load_Miss_Real_Latency"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
@@ -215,24 +226,6 @@
         "MetricGroup": "Mem;MemoryBound;MemoryBW",
         "MetricName": "MLP"
     },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
-        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L1D_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
-        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L2_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L3_Cache_Fill_BW"
-    },
     {
         "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
         "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / INST_RETIRED.ANY",
@@ -264,6 +257,48 @@
         "MetricGroup": "Mem;MemoryTLB_SMT",
         "MetricName": "Page_Walks_Utilization_SMT"
     },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "0",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW_1T"
+    },
     {
         "BriefDescription": "Average CPU Utilization",
         "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
@@ -280,7 +315,8 @@
         "BriefDescription": "Giga Floating Point Operations Per Second",
         "MetricExpr": "( ( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE ) / 1000000000 ) / duration_time",
         "MetricGroup": "Cor;Flops;HPC",
-        "MetricName": "GFLOPs"
+        "MetricName": "GFLOPs",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
     },
     {
         "BriefDescription": "Average Frequency Utilization relative nominal frequency",
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/memory.json b/tools/perf/pmu-events/arch/x86/ivybridge/memory.json
index 5f98f7746cf7..30fc0af61eb3 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/memory.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/memory.json
@@ -233,4 +233,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/other.json b/tools/perf/pmu-events/arch/x86/ivybridge/other.json
index 83fe8f79adc6..2d62521791d8 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/other.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/other.json
@@ -41,4 +41,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
index 2de31c56c2a5..d89d3f8db190 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json
@@ -676,7 +676,7 @@
         "UMask": "0x3"
     },
     {
-        "BriefDescription": "Number of occurences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc.)",
+        "BriefDescription": "Number of occurrences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc.)",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "CounterMask": "1",
@@ -1269,4 +1269,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/uncore-other.json b/tools/perf/pmu-events/arch/x86/ivybridge/uncore-other.json
index 6278068908cf..88f1e326205f 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/uncore-other.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/uncore-other.json
@@ -82,10 +82,10 @@
     {
         "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles.",
         "Counter": "Fixed",
+        "EventCode": "0xff",
         "EventName": "UNC_CLOCK.SOCKET",
         "PerPkg": "1",
         "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
-        "UMask": "0x01",
         "Unit": "ARB"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/virtual-memory.json b/tools/perf/pmu-events/arch/x86/ivybridge/virtual-memory.json
index 8cf1549797b0..a5e387bbb134 100644
--- a/tools/perf/pmu-events/arch/x86/ivybridge/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/ivybridge/virtual-memory.json
@@ -177,4 +177,4 @@
         "SampleAfterValue": "100007",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 727631ce1a57..cc34f6378d89 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -12,7 +12,7 @@ GenuineIntel-6-(3C|45|46),v31,haswell,core
 GenuineIntel-6-3F,v25,haswellx,core
 GenuineIntel-6-(7D|7E|A7),v1.14,icelake,core
 GenuineIntel-6-6[AC],v1.15,icelakex,core
-GenuineIntel-6-3A,v18,ivybridge,core
+GenuineIntel-6-3A,v22,ivybridge,core
 GenuineIntel-6-3E,v19,ivytown,core
 GenuineIntel-6-2D,v20,jaketown,core
 GenuineIntel-6-57,v9,knightslanding,core
-- 
2.37.1.359.gd136c6c3e2-goog


* [PATCH v1 19/31] perf vendor events: Add Intel meteorlake
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (7 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 15/31] perf vendor events: Update Intel ivybridge Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 20/31] perf vendor events: Update Intel nehalemep Ian Rogers
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the events and metrics. Manually copy
the meteorlake files into perf and update mapfile.csv.
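The first mapfile.csv field is matched against the running CPU's identifier as a regular expression, which is why patch 14 could collapse the separate 6-6A and 6-6C icelakex rows into one `GenuineIntel-6-6[AC]` row while this patch adds a plain `GenuineIntel-6-AA` row for meteorlake. A rough sketch of that lookup (the exact anchoring semantics of perf's matcher are an assumption here; `re.fullmatch` is used for the demo):

```python
import re

# Rows touched by this series: (regex over "Vendor-Family-Model", table name).
rows = [
    ("GenuineIntel-6-6[AC]", "icelakex"),  # replaces the two 6-6A / 6-6C rows
    ("GenuineIntel-6-AA", "meteorlake"),   # new in this patch
    ("GenuineIntel-6-3A", "ivybridge"),
]

def lookup(cpuid):
    """Return the event table for the first row whose regex matches cpuid."""
    for pattern, table in rows:
        if re.fullmatch(pattern, cpuid):
            return table
    return None

print(lookup("GenuineIntel-6-6C"))  # icelakex
print(lookup("GenuineIntel-6-AA"))  # meteorlake
```

Both 6A and 6C CPUIDs resolve to the same icelakex table, and an unknown model falls through to no match.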

Tested on a non-meteorlake machine with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   1 +
 .../pmu-events/arch/x86/meteorlake/cache.json | 262 ++++++++++++++++++
 .../arch/x86/meteorlake/frontend.json         |  24 ++
 .../arch/x86/meteorlake/memory.json           | 185 +++++++++++++
 .../pmu-events/arch/x86/meteorlake/other.json |  46 +++
 .../arch/x86/meteorlake/pipeline.json         | 254 +++++++++++++++++
 .../arch/x86/meteorlake/virtual-memory.json   |  46 +++
 7 files changed, 818 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index fbb2c9dcdc7b..7d6731e81102 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -16,6 +16,7 @@ GenuineIntel-6-3A,v22,ivybridge,core
 GenuineIntel-6-3E,v21,ivytown,core
 GenuineIntel-6-2D,v21,jaketown,core
 GenuineIntel-6-(57|85),v9,knightslanding,core
+GenuineIntel-6-AA,v1.00,meteorlake,core
 GenuineIntel-6-1E,v2,nehalemep,core
 GenuineIntel-6-1F,v2,nehalemep,core
 GenuineIntel-6-1A,v2,nehalemep,core
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
new file mode 100644
index 000000000000..32b2aa9b1475
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
@@ -0,0 +1,262 @@
+[
+    {
+        "BriefDescription": "Counts the number of cacheable memory requests that miss in the LLC. Counts on a per core basis.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2e",
+        "EventName": "LONGEST_LAT_CACHE.MISS",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0x41",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of cacheable memory requests that access the LLC. Counts on a per core basis.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2e",
+        "EventName": "LONGEST_LAT_CACHE.REFERENCE",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0x4f",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of load ops retired.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.ALL_LOADS",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0x81",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of store ops retired.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.ALL_STORES",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0x82",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD - Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_128",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x80",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_16",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x10",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_256",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x100",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_32",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x20",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_4",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x4",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_512",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x200",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_64",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x40",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of tagged load uops retired that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD. Only counts with PEBS enabled.",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_8",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x8",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "TakenAlone": "1",
+        "UMask": "0x5",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of store uops retired, same as MEM_UOPS_RETIRED.ALL_STORES",
+        "CollectPEBSRecord": "3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_UOPS_RETIRED.STORE_LATENCY",
+        "PEBS": "2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x6",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "L2 code requests",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x24",
+        "EventName": "L2_RQSTS.ALL_CODE_RD",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "200003",
+        "UMask": "0xe4",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Demand Data Read access L2 cache",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x24",
+        "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "200003",
+        "UMask": "0xe1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Core-originated cacheable requests that missed L3 (except hardware prefetches to the L3)",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2e",
+        "EventName": "LONGEST_LAT_CACHE.MISS",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "100003",
+        "UMask": "0x41",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Core-originated cacheable requests that refer to L3 (except hardware prefetches to the L3)",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2e",
+        "EventName": "LONGEST_LAT_CACHE.REFERENCE",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4f",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired load instructions.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_INST_RETIRED.ALL_LOADS",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x81",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired store instructions.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "Data_LA": "1",
+        "EventCode": "0xd0",
+        "EventName": "MEM_INST_RETIRED.ALL_STORES",
+        "L1_Hit_Indication": "1",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x82",
+        "Unit": "cpu_core"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
new file mode 100644
index 000000000000..9657768fc95a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
@@ -0,0 +1,24 @@
+[
+    {
+        "BriefDescription": "Counts every time the code stream enters into a new cache line by walking sequentially from the previous line or being redirected by a jump.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x80",
+        "EventName": "ICACHE.ACCESSES",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0x3",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts every time the code stream enters into a new cache line by walking sequentially from the previous line or being redirected by a jump and the instruction cache registers bytes are not present.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x80",
+        "EventName": "ICACHE.MISSES",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0x2",
+        "Unit": "cpu_atom"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
new file mode 100644
index 000000000000..15b2294a8ae7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
@@ -0,0 +1,185 @@
+[
+    {
+        "BriefDescription": "Counts cacheable demand data reads that were not supplied by the L3 cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xB7",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FBFC00001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership, including SWPREFETCHW which is an RFO, that were not supplied by the L3 cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xB7",
+        "EventName": "OCR.DEMAND_RFO.L3_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FBFC00002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x80",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "1009",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x10",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "20011",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x100",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "503",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x20",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "100007",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x4",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "100003",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x200",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "101",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x40",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "2003",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles.",
+        "CollectPEBSRecord": "2",
+        "Counter": "1,2,3,4,5,6,7",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8",
+        "MSRIndex": "0x3F6",
+        "MSRValue": "0x8",
+        "PEBS": "2",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "50021",
+        "TakenAlone": "1",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired memory store access operations. A PDist event for PEBS Store Latency Facility.",
+        "CollectPEBSRecord": "2",
+        "Data_LA": "1",
+        "EventCode": "0xcd",
+        "EventName": "MEM_TRANS_RETIRED.STORE_SAMPLE",
+        "PEBS": "2",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that were not supplied by the L3 cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2A,0x2B",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FBFC00001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that were not supplied by the L3 cache.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2A,0x2B",
+        "EventName": "OCR.DEMAND_RFO.L3_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FBFC00002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/other.json b/tools/perf/pmu-events/arch/x86/meteorlake/other.json
new file mode 100644
index 000000000000..14273ac54d2c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/other.json
@@ -0,0 +1,46 @@
+[
+    {
+        "BriefDescription": "Counts cacheable demand data reads. Catch-all value for any response type; this includes response types not defined in the OCR. If this is set, all other response types will be ignored",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xB7",
+        "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership, including SWPREFETCHW which is an RFO. Catch-all value for any response type; this includes response types not defined in the OCR. If this is set, all other response types will be ignored",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xB7",
+        "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that have any type of response.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2A,0x2B",
+        "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that have any type of response.",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x2A,0x2B",
+        "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
new file mode 100644
index 000000000000..0a7981675b6c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
@@ -0,0 +1,254 @@
+[
+    {
+        "BriefDescription": "Counts the total number of branch instructions retired for all branch types.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc4",
+        "EventName": "BR_INST_RETIRED.ALL_BRANCHES",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the total number of mispredicted branch instructions retired for all branch types.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.ALL_BRANCHES",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "CollectPEBSRecord": "2",
+        "Counter": "33",
+        "EventName": "CPU_CLK_UNHALTED.CORE",
+        "PEBScounters": "33",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x2",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of unhalted core clock cycles. [This event is an alias of CPU_CLK_UNHALTED.THREAD_P]",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x3c",
+        "EventName": "CPU_CLK_UNHALTED.CORE_P",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "2000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "CollectPEBSRecord": "2",
+        "Counter": "34",
+        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
+        "PEBScounters": "34",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x3",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "CollectPEBSRecord": "2",
+        "Counter": "33",
+        "EventName": "CPU_CLK_UNHALTED.THREAD",
+        "PEBScounters": "33",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x2",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of unhalted core clock cycles. [This event is an alias of CPU_CLK_UNHALTED.CORE_P]",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x3c",
+        "EventName": "CPU_CLK_UNHALTED.THREAD_P",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "2000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "CollectPEBSRecord": "2",
+        "Counter": "32",
+        "EventName": "INST_RETIRED.ANY",
+        "PEBS": "1",
+        "PEBScounters": "32",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of instructions retired",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc0",
+        "EventName": "INST_RETIRED.ANY_P",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "2000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x73",
+        "EventName": "TOPDOWN_BAD_SPECULATION.ALL",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of retirement slots not consumed due to backend stalls",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x74",
+        "EventName": "TOPDOWN_BE_BOUND.ALL",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of retirement slots not consumed due to front end stalls",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x71",
+        "EventName": "TOPDOWN_FE_BOUND.ALL",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Counts the number of consumed retirement slots. Similar to UOPS_RETIRED.ALL",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x72",
+        "EventName": "TOPDOWN_RETIRING.ALL",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "1000003",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "All branch instructions retired.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc4",
+        "EventName": "BR_INST_RETIRED.ALL_BRANCHES",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "400009",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "All mispredicted branch instructions retired.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.ALL_BRANCHES",
+        "PEBS": "1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "400009",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "CollectPEBSRecord": "2",
+        "Counter": "34",
+        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
+        "PEBScounters": "34",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x3c",
+        "EventName": "CPU_CLK_UNHALTED.REF_TSC_P",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Core cycles when the thread is not in halt state",
+        "CollectPEBSRecord": "2",
+        "Counter": "33",
+        "EventName": "CPU_CLK_UNHALTED.THREAD",
+        "PEBScounters": "33",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Thread cycles when thread is not in halt state",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x3c",
+        "EventName": "CPU_CLK_UNHALTED.THREAD_P",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "2000003",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "CollectPEBSRecord": "2",
+        "Counter": "32",
+        "EventName": "INST_RETIRED.ANY",
+        "PEBS": "1",
+        "PEBScounters": "32",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Number of instructions retired. General Counter - architectural event",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xc0",
+        "EventName": "INST_RETIRED.ANY_P",
+        "PEBS": "1",
+        "PEBScounters": "1,2,3,4,5,6,7",
+        "SampleAfterValue": "2000003",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Loads blocked due to overlapping with a preceding store that cannot be forwarded.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x03",
+        "EventName": "LD_BLOCKS.STORE_FORWARD",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "100003",
+        "UMask": "0x82",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "CollectPEBSRecord": "2",
+        "Counter": "35",
+        "EventName": "TOPDOWN.SLOTS",
+        "PEBScounters": "35",
+        "SampleAfterValue": "10000003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "TMA slots available for an unhalted logical processor. General counter - architectural event",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0xa4",
+        "EventName": "TOPDOWN.SLOTS_P",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "10000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json
new file mode 100644
index 000000000000..3087730cca7b
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json
@@ -0,0 +1,46 @@
+[
+    {
+        "BriefDescription": "Counts the number of page walks completed due to instruction fetch misses to any page size.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "SampleAfterValue": "200003",
+        "UMask": "0xe",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x12",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "100003",
+        "UMask": "0xe",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Store misses in all TLB levels causes a page walk that completes. (All page sizes)",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x13",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "100003",
+        "UMask": "0xe",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (All page sizes)",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x11",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED",
+        "PEBScounters": "0,1,2,3",
+        "SampleAfterValue": "100003",
+        "UMask": "0xe",
+        "Unit": "cpu_core"
+    }
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 20/31] perf vendor events: Update Intel nehalemep
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (8 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 19/31] perf vendor events: Add Intel meteorlake Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 22/31] perf vendor events: Update Intel sandybridge Ian Rogers
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the nehalemep files into perf and update mapfile.csv.
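
The download and generation can be reproduced with the hermetic two-step
flow from the cover letter; this sketch only assembles and prints the
command lines (the "data" output directory name is illustrative, and
step 1 requires network access, so the commands are not executed here):

```shell
# Sketch of the two-step generation flow. Step 1 hermetically downloads
# the event json and TMA metrics; step 2 regenerates the perf json
# files from those local copies instead of the live URLs.
GEN=download_and_gen.py
CMD_DOWNLOAD="$GEN --hermetic-download --outdir data"
CMD_GENERATE="$GEN --url=file://$PWD/data/01 --metrics-url=file://$PWD/data/github"
echo "$CMD_DOWNLOAD"
echo "$CMD_GENERATE"
```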

Tested on a non-nehalemep with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv         |  4 +---
 .../perf/pmu-events/arch/x86/nehalemep/cache.json  | 14 +++++++-------
 .../arch/x86/nehalemep/floating-point.json         |  2 +-
 .../pmu-events/arch/x86/nehalemep/frontend.json    |  2 +-
 .../perf/pmu-events/arch/x86/nehalemep/memory.json |  6 +++---
 .../arch/x86/nehalemep/virtual-memory.json         |  2 +-
 6 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 7d6731e81102..8336f2c8f96f 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -17,9 +17,7 @@ GenuineIntel-6-3E,v21,ivytown,core
 GenuineIntel-6-2D,v21,jaketown,core
 GenuineIntel-6-(57|85),v9,knightslanding,core
 GenuineIntel-6-AA,v1.00,meteorlake,core
-GenuineIntel-6-1E,v2,nehalemep,core
-GenuineIntel-6-1F,v2,nehalemep,core
-GenuineIntel-6-1A,v2,nehalemep,core
+GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v2,nehalemex,core
 GenuineIntel-6-[4589]E,v24,skylake,core
 GenuineIntel-6-A[56],v24,skylake,core
diff --git a/tools/perf/pmu-events/arch/x86/nehalemep/cache.json b/tools/perf/pmu-events/arch/x86/nehalemep/cache.json
index bcf74d793ae2..1ee91300baf9 100644
--- a/tools/perf/pmu-events/arch/x86/nehalemep/cache.json
+++ b/tools/perf/pmu-events/arch/x86/nehalemep/cache.json
@@ -1773,7 +1773,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches satisfied by the IO, CSR, MMIO unit",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches satisfied by the IO, CSR, MMIO unit",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.IO_CSR_MMIO",
@@ -1784,7 +1784,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches statisfied by the LLC and not found in a sibling core",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches satisfied by the LLC and not found in a sibling core",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.LLC_HIT_NO_OTHER_CORE",
@@ -1795,7 +1795,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches satisfied by the LLC and HIT in a sibling core",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches satisfied by the LLC and HIT in a sibling core",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.LLC_HIT_OTHER_CORE_HIT",
@@ -1806,7 +1806,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches satisfied by the LLC  and HITM in a sibling core",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches satisfied by the LLC  and HITM in a sibling core",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.LLC_HIT_OTHER_CORE_HITM",
@@ -1861,7 +1861,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches that HIT in a remote cache",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches that HIT in a remote cache",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.REMOTE_CACHE_HIT",
@@ -1872,7 +1872,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches that HITM in a remote cache",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches that HITM in a remote cache",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.REMOTE_CACHE_HITM",
@@ -3226,4 +3226,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/nehalemep/floating-point.json b/tools/perf/pmu-events/arch/x86/nehalemep/floating-point.json
index 39af1329224a..666e466d351c 100644
--- a/tools/perf/pmu-events/arch/x86/nehalemep/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/nehalemep/floating-point.json
@@ -226,4 +226,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/nehalemep/frontend.json b/tools/perf/pmu-events/arch/x86/nehalemep/frontend.json
index 8ac5c24888c5..c561ac24d91d 100644
--- a/tools/perf/pmu-events/arch/x86/nehalemep/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/nehalemep/frontend.json
@@ -23,4 +23,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/nehalemep/memory.json b/tools/perf/pmu-events/arch/x86/nehalemep/memory.json
index 26138ae639f4..6e95de3f3409 100644
--- a/tools/perf/pmu-events/arch/x86/nehalemep/memory.json
+++ b/tools/perf/pmu-events/arch/x86/nehalemep/memory.json
@@ -286,7 +286,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches statisfied by the local DRAM.",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches satisfied by the local DRAM.",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.LOCAL_DRAM",
@@ -297,7 +297,7 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Offcore data reads, RFO's and prefetches statisfied by the remote DRAM",
+        "BriefDescription": "Offcore data reads, RFOs, and prefetches satisfied by the remote DRAM",
         "Counter": "2",
         "EventCode": "0xB7",
         "EventName": "OFFCORE_RESPONSE.DATA_IN.REMOTE_DRAM",
@@ -736,4 +736,4 @@
         "SampleAfterValue": "100000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/nehalemep/virtual-memory.json b/tools/perf/pmu-events/arch/x86/nehalemep/virtual-memory.json
index 6d3247c55bcd..e88c0802e679 100644
--- a/tools/perf/pmu-events/arch/x86/nehalemep/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/nehalemep/virtual-memory.json
@@ -106,4 +106,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 22/31] perf vendor events: Update Intel sandybridge
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (9 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 20/31] perf vendor events: Update Intel nehalemep Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 24/31] perf vendor events: Update Intel silvermont Ian Rogers
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the sandybridge files into perf and update mapfile.csv.

Tested on a non-sandybridge with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
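The updated GFLOPs metric in snb-metrics.json weights each floating-point
event by the number of FLOPs a single operation of that kind performs. A
minimal sketch of the arithmetic (counter names shortened, values
hypothetical):

```python
def gflops(scalar_single, scalar_double, packed_double_128,
           packed_single_128, packed_double_256, packed_single_256,
           duration_s):
    """Mirror of the snb-metrics GFLOPs expression: scalar ops count 1
    FLOP, 128-bit packed double 2, 128-bit packed single and 256-bit
    packed double 4, 256-bit packed single 8; divide by 1e9 and the
    wall-clock duration."""
    flops = (1 * (scalar_single + scalar_double)
             + 2 * packed_double_128
             + 4 * (packed_single_128 + packed_double_256)
             + 8 * packed_single_256)
    return flops / 1e9 / duration_s
```

For example, a workload retiring only 256-bit packed-single operations
gets eight FLOPs credited per SIMD_FP_256.PACKED_SINGLE count.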

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv            |  2 +-
 tools/perf/pmu-events/arch/x86/sandybridge/cache.json |  2 +-
 .../arch/x86/sandybridge/floating-point.json          |  2 +-
 .../pmu-events/arch/x86/sandybridge/frontend.json     |  4 ++--
 .../perf/pmu-events/arch/x86/sandybridge/memory.json  |  2 +-
 tools/perf/pmu-events/arch/x86/sandybridge/other.json |  2 +-
 .../pmu-events/arch/x86/sandybridge/pipeline.json     | 10 +++++-----
 .../pmu-events/arch/x86/sandybridge/snb-metrics.json  | 11 +++++++++--
 .../pmu-events/arch/x86/sandybridge/uncore-other.json |  2 +-
 .../arch/x86/sandybridge/virtual-memory.json          |  2 +-
 10 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index ab070ba3ad48..eae103022077 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -19,12 +19,12 @@ GenuineIntel-6-(57|85),v9,knightslanding,core
 GenuineIntel-6-AA,v1.00,meteorlake,core
 GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v3,nehalemex,core
+GenuineIntel-6-2A,v17,sandybridge,core
 GenuineIntel-6-[4589]E,v24,skylake,core
 GenuineIntel-6-A[56],v24,skylake,core
 GenuineIntel-6-37,v13,silvermont,core
 GenuineIntel-6-4D,v13,silvermont,core
 GenuineIntel-6-4C,v13,silvermont,core
-GenuineIntel-6-2A,v15,sandybridge,core
 GenuineIntel-6-2C,v2,westmereep-dp,core
 GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/cache.json b/tools/perf/pmu-events/arch/x86/sandybridge/cache.json
index 92a7269eb444..a1d622352131 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/cache.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/cache.json
@@ -1876,4 +1876,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x10"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json b/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json
index 713878fd062b..eb2ff2cfdf6b 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/floating-point.json
@@ -135,4 +135,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json b/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json
index fa22f9463b66..e2c82e43a2de 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/frontend.json
@@ -176,7 +176,7 @@
         "CounterMask": "1",
         "EventCode": "0x79",
         "EventName": "IDQ.MS_CYCLES",
-        "PublicDescription": "This event counts cycles during which the microcode sequencer assisted the front-end in delivering uops.  Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder.  Using other instructions, if possible, will usually improve performance.  See the Intel 64 and IA-32 Architectures Optimization Reference Manual for more information.",
+        "PublicDescription": "This event counts cycles during which the microcode sequencer assisted the front-end in delivering uops.  Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder.  Using other instructions, if possible, will usually improve performance.  See the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual for more information.",
         "SampleAfterValue": "2000003",
         "UMask": "0x30"
     },
@@ -311,4 +311,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/memory.json b/tools/perf/pmu-events/arch/x86/sandybridge/memory.json
index 931892d34076..3c283ca309f3 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/memory.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/memory.json
@@ -442,4 +442,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/other.json b/tools/perf/pmu-events/arch/x86/sandybridge/other.json
index e251f535ec09..2f873ab14156 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/other.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/other.json
@@ -55,4 +55,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
index b9a3f194a00a..2c3b6c92aa6b 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json
@@ -609,7 +609,7 @@
         "UMask": "0x3"
     },
     {
-        "BriefDescription": "Number of occurences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...).",
+        "BriefDescription": "Number of occurrences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...).",
         "Counter": "0,1,2,3",
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "CounterMask": "1",
@@ -652,7 +652,7 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0x03",
         "EventName": "LD_BLOCKS.STORE_FORWARD",
-        "PublicDescription": "This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load.  The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceeding smaller uncompleted store.  See the table of not supported store forwards in the Intel 64 and IA-32 Architectures Optimization Reference Manual.  The penalty for blocked store forwarding is that the load must wait for the store to complete before it can be issued.",
+        "PublicDescription": "This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load.  The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceeding smaller uncompleted store.  See the table of not supported store forwards in the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual.  The penalty for blocked store forwarding is that the load must wait for the store to complete before it can be issued.",
         "SampleAfterValue": "100003",
         "UMask": "0x2"
     },
@@ -778,7 +778,7 @@
         "CounterMask": "1",
         "EventCode": "0x59",
         "EventName": "PARTIAL_RAT_STALLS.FLAGS_MERGE_UOP_CYCLES",
-        "PublicDescription": "This event counts the number of cycles spent executing performance-sensitive flags-merging uops. For example, shift CL (merge_arith_flags). For more details, See the Intel 64 and IA-32 Architectures Optimization Reference Manual.",
+        "PublicDescription": "This event counts the number of cycles spent executing performance-sensitive flags-merging uops. For example, shift CL (merge_arith_flags). For more details, See the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual.",
         "SampleAfterValue": "2000003",
         "UMask": "0x20"
     },
@@ -797,7 +797,7 @@
         "CounterHTOff": "0,1,2,3,4,5,6,7",
         "EventCode": "0x59",
         "EventName": "PARTIAL_RAT_STALLS.SLOW_LEA_WINDOW",
-        "PublicDescription": "This event counts the number of cycles with at least one slow LEA uop being allocated. A uop is generally considered as slow LEA if it has three sources (for example, two sources and immediate) regardless of whether it is a result of LEA instruction or not. Examples of the slow LEA uop are or uops with base, index, and offset source operands using base and index reqisters, where base is EBR/RBP/R13, using RIP relative or 16-bit addressing modes. See the Intel 64 and IA-32 Architectures Optimization Reference Manual for more details about slow LEA instructions.",
+        "PublicDescription": "This event counts the number of cycles with at least one slow LEA uop being allocated. A uop is generally considered as slow LEA if it has three sources (for example, two sources and immediate) regardless of whether it is a result of LEA instruction or not. Examples of the slow LEA uop are or uops with base, index, and offset source operands using base and index reqisters, where base is EBR/RBP/R13, using RIP relative or 16-bit addressing modes. See the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual for more details about slow LEA instructions.",
         "SampleAfterValue": "2000003",
         "UMask": "0x40"
     },
@@ -1209,4 +1209,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
index c8e7050d9c26..ae7ed267b2a2 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
@@ -124,7 +124,7 @@
         "MetricName": "FLOPc_SMT"
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
         "MetricExpr": "UOPS_DISPATCHED.THREAD / (( cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else cpu@UOPS_DISPATCHED.CORE\\,cmask\\=1@)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "ILP"
@@ -141,6 +141,12 @@
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
+        "MetricGroup": "Pipeline;Ret",
+        "MetricName": "Retire"
+    },
     {
         "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
         "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
@@ -163,7 +169,8 @@
         "BriefDescription": "Giga Floating Point Operations Per Second",
         "MetricExpr": "( ( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * SIMD_FP_256.PACKED_SINGLE ) / 1000000000 ) / duration_time",
         "MetricGroup": "Cor;Flops;HPC",
-        "MetricName": "GFLOPs"
+        "MetricName": "GFLOPs",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
     },
     {
         "BriefDescription": "Average Frequency Utilization relative nominal frequency",
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/uncore-other.json b/tools/perf/pmu-events/arch/x86/sandybridge/uncore-other.json
index 6278068908cf..88f1e326205f 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/uncore-other.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/uncore-other.json
@@ -82,10 +82,10 @@
     {
         "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles.",
         "Counter": "Fixed",
+        "EventCode": "0xff",
         "EventName": "UNC_CLOCK.SOCKET",
         "PerPkg": "1",
         "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
-        "UMask": "0x01",
         "Unit": "ARB"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/virtual-memory.json b/tools/perf/pmu-events/arch/x86/sandybridge/virtual-memory.json
index 4dd136d00a10..98362abba1a7 100644
--- a/tools/perf/pmu-events/arch/x86/sandybridge/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/sandybridge/virtual-memory.json
@@ -146,4 +146,4 @@
         "SampleAfterValue": "100007",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 24/31] perf vendor events: Update Intel silvermont
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (10 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 22/31] perf vendor events: Update Intel sandybridge Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 25/31] perf vendor events: Update Intel skylake Ian Rogers
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually
copy the silvermont files into perf and update mapfile.csv. Other
than aligning whitespace, this change just folds the mapfile.csv
entries for silvermont onto one line.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv                    | 4 +---
 tools/perf/pmu-events/arch/x86/silvermont/cache.json          | 2 +-
 tools/perf/pmu-events/arch/x86/silvermont/floating-point.json | 2 +-
 tools/perf/pmu-events/arch/x86/silvermont/frontend.json       | 2 +-
 tools/perf/pmu-events/arch/x86/silvermont/memory.json         | 2 +-
 tools/perf/pmu-events/arch/x86/silvermont/other.json          | 2 +-
 tools/perf/pmu-events/arch/x86/silvermont/pipeline.json       | 2 +-
 tools/perf/pmu-events/arch/x86/silvermont/virtual-memory.json | 2 +-
 8 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 0ed0c1ad122b..d224d59e3c72 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -21,11 +21,9 @@ GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v3,nehalemex,core
 GenuineIntel-6-2A,v17,sandybridge,core
 GenuineIntel-6-8F,v1.04,sapphirerapids,core
+GenuineIntel-6-(37|4C|4D),v14,silvermont,core
 GenuineIntel-6-[4589]E,v24,skylake,core
 GenuineIntel-6-A[56],v24,skylake,core
-GenuineIntel-6-37,v13,silvermont,core
-GenuineIntel-6-4D,v13,silvermont,core
-GenuineIntel-6-4C,v13,silvermont,core
 GenuineIntel-6-2C,v2,westmereep-dp,core
 GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/cache.json b/tools/perf/pmu-events/arch/x86/silvermont/cache.json
index e16e1d910e4a..7959504dff29 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/cache.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/cache.json
@@ -807,4 +807,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x4"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/floating-point.json b/tools/perf/pmu-events/arch/x86/silvermont/floating-point.json
index 1d75b35694ac..aa4faf110512 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/floating-point.json
@@ -8,4 +8,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x4"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/frontend.json b/tools/perf/pmu-events/arch/x86/silvermont/frontend.json
index a4c98e43f677..43e5e48f7212 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/frontend.json
@@ -71,4 +71,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/memory.json b/tools/perf/pmu-events/arch/x86/silvermont/memory.json
index 5e21fc3fd078..0f5fba43da4c 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/memory.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/memory.json
@@ -8,4 +8,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/other.json b/tools/perf/pmu-events/arch/x86/silvermont/other.json
index 16d16a1ce6de..4db59d84c144 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/other.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/other.json
@@ -17,4 +17,4 @@
         "SampleAfterValue": "200003",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
index 03a4c7f26698..e42a37eabc17 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
@@ -313,4 +313,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/virtual-memory.json b/tools/perf/pmu-events/arch/x86/silvermont/virtual-memory.json
index f4b8a1ef48f6..b50cee3a5e4c 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/virtual-memory.json
@@ -66,4 +66,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x3"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 25/31] perf vendor events: Update Intel skylake
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (11 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 24/31] perf vendor events: Update Intel silvermont Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 28/31] perf vendor events: Update Intel tigerlake Ian Rogers
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the skylake files into perf and update mapfile.csv.

Tested on a non-skylake with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
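The event tables being updated here are plain JSON arrays of objects, and
test 10.1 above checks them for basic well-formedness. A minimal sketch in
that spirit (the required-key choice is illustrative, not perf's actual
validation):

```python
import json

def count_wellformed(text: str) -> int:
    """Parse an event-table JSON array and count entries that carry a
    name plus either an event encoding or a metric expression."""
    events = json.loads(text)
    assert isinstance(events, list)
    ok = 0
    for ev in events:
        has_name = "EventName" in ev or "MetricName" in ev
        has_body = "EventCode" in ev or "MetricExpr" in ev
        if has_name and has_body:
            ok += 1
    return ok

SAMPLE = ('[{"EventCode": "0xB7", '
          '"EventName": "OFFCORE_RESPONSE.DATA_IN.LOCAL_DRAM", '
          '"UMask": "0x1"}]')
```

This also makes the "\ No newline at end of file" fixes in this series
harmless: json.loads accepts the arrays with or without a trailing
newline.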

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   3 +-
 .../arch/x86/skylake/floating-point.json      |   2 +-
 .../pmu-events/arch/x86/skylake/frontend.json |   2 +-
 .../pmu-events/arch/x86/skylake/other.json    |   2 +-
 .../arch/x86/skylake/skl-metrics.json         | 178 ++++++++----
 .../arch/x86/skylake/uncore-cache.json        | 142 ++++++++++
 .../arch/x86/skylake/uncore-other.json        |  79 ++++++
 .../pmu-events/arch/x86/skylake/uncore.json   | 254 ------------------
 .../arch/x86/skylake/virtual-memory.json      |   2 +-
 9 files changed, 345 insertions(+), 319 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
 delete mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore.json

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index d224d59e3c72..4a38edc7c270 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -22,8 +22,7 @@ GenuineIntel-6-2E,v3,nehalemex,core
 GenuineIntel-6-2A,v17,sandybridge,core
 GenuineIntel-6-8F,v1.04,sapphirerapids,core
 GenuineIntel-6-(37|4C|4D),v14,silvermont,core
-GenuineIntel-6-[4589]E,v24,skylake,core
-GenuineIntel-6-A[56],v24,skylake,core
+GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
 GenuineIntel-6-2C,v2,westmereep-dp,core
 GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
diff --git a/tools/perf/pmu-events/arch/x86/skylake/floating-point.json b/tools/perf/pmu-events/arch/x86/skylake/floating-point.json
index 73cfb2a39722..d6cee5ae4402 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/floating-point.json
@@ -70,4 +70,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1e"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/skylake/frontend.json b/tools/perf/pmu-events/arch/x86/skylake/frontend.json
index ecce4273ae52..8633ee406813 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/frontend.json
@@ -527,4 +527,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/skylake/other.json b/tools/perf/pmu-events/arch/x86/skylake/other.json
index 4f4839024915..8f4bc8892c47 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/other.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/other.json
@@ -17,4 +17,4 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
index defbca9a6038..73fa72d3dcb1 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
@@ -95,13 +95,13 @@
     {
         "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
         "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) * ( ( (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) / ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) ) * ( (min( 9 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE , max( CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS , 0 ) ) / CPU_CLK_UNHALTED.THREAD) / (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) ) + ( (EXE_ACTIVITY.BOUND_ON_STORES / CPU_CLK_UNHALTED.THREAD) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) ) * ( (( 9 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_STORE_MISSES.WALK_ACTIVE ) / CPU_CLK_UNHALTED.THREAD) / #(EXE_ACTIVITY.BOUND_ON_STORES / CPU_CLK_UNHALTED.THREAD) ) ) ",
-        "MetricGroup": "Mem;MemoryTLB",
+        "MetricGroup": "Mem;MemoryTLB;Offcore",
         "MetricName": "Memory_Data_TLBs"
     },
     {
         "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
         "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * ( ( (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) / ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * ( (min( 9 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE , max( CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS , 0 ) ) / CPU_CLK_UNHALTED.THREAD) / (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) ) + ( (EXE_ACTIVITY.BOUND_ON_STORES / CPU_CLK_UNHALTED.THREAD) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + 
(UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * ( (( 9 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_STORE_MISSES.WALK_ACTIVE ) / ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) / #(EXE_ACTIVITY.BOUND_ON_STORES / CPU_CLK_UNHALTED.THREAD) ) ) ",
-        "MetricGroup": "Mem;MemoryTLB;_SMT",
+        "MetricGroup": "Mem;MemoryTLB;Offcore_SMT",
         "MetricName": "Memory_Data_TLBs_SMT"
     },
     {
@@ -214,42 +214,36 @@
         "MetricName": "FLOPc_SMT"
     },
     {
-        "BriefDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width)",
+        "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)",
         "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 * CPU_CLK_UNHALTED.THREAD )",
         "MetricGroup": "Cor;Flops;HPC",
         "MetricName": "FP_Arith_Utilization",
-        "PublicDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width). Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
+        "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)."
     },
     {
-        "BriefDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width). SMT version; use when SMT is enabled and measuring per logical CPU.",
+        "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). SMT version; use when SMT is enabled and measuring per logical CPU.",
         "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) )",
         "MetricGroup": "Cor;Flops;HPC_SMT",
         "MetricName": "FP_Arith_Utilization_SMT",
-        "PublicDescription": "Actual per-core usage of the Floating Point execution units (regardless of the vector width). Values > 1 are possible due to Fused-Multiply Add (FMA) counting. SMT version; use when SMT is enabled and measuring per logical CPU."
+        "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common). SMT version; use when SMT is enabled and measuring per logical CPU."
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
         "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "ILP"
     },
     {
-        "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
-        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BrMispredicts",
-        "MetricName": "Branch_Misprediction_Cost"
-    },
-    {
-        "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
-        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BrMispredicts_SMT",
-        "MetricName": "Branch_Misprediction_Cost_SMT"
+        "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+        "MetricExpr": "( 1 - ((1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD)) - ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD)))) / ((EXE_ACTIVITY.EXE_BOUND_0_PORTS + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL)) / CPU_CLK_UNHALTED.THREAD if ( ARITH.DIVIDER_ACTIVE < ( CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY ) ) else (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) / CPU_CLK_UNHALTED.THREAD) if ((1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD)) - ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD)))) < ((EXE_ACTIVITY.EXE_BOUND_0_PORTS + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL)) / CPU_CLK_UNHALTED.THREAD if ( ARITH.DIVIDER_ACTIVE < ( CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY ) ) else (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) / 
CPU_CLK_UNHALTED.THREAD) else 1 ) if 0 > 0.5 else 0",
+        "MetricGroup": "Cor;SMT",
+        "MetricName": "Core_Bound_Likely"
     },
     {
-        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Bad;BadSpec;BrMispredicts",
-        "MetricName": "IpMispredict"
+        "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
+        "MetricExpr": "( 1 - ((1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) / ((EXE_ACTIVITY.EXE_BOUND_0_PORTS + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL)) / CPU_CLK_UNHALTED.THREAD if ( ARITH.DIVIDER_ACTIVE < ( CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY ) ) else (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) / CPU_CLK_UNHALTED.THREAD) if ((1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ((( 
CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) < ((EXE_ACTIVITY.EXE_BOUND_0_PORTS + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL)) / CPU_CLK_UNHALTED.THREAD if ( ARITH.DIVIDER_ACTIVE < ( CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY ) ) else (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) / CPU_CLK_UNHALTED.THREAD) else 1 ) if (1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_UNHALTED.REF_XCLK_ANY / 2 )) > 0.5 else 0",
+        "MetricGroup": "Cor;SMT_SMT",
+        "MetricName": "Core_Bound_Likely_SMT"
     },
     {
         "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
@@ -334,12 +328,30 @@
         "MetricName": "IpArith_AVX256",
         "PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
     },
+    {
+        "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / cpu@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
+        "MetricGroup": "Prefetches",
+        "MetricName": "IpSWPF"
+    },
     {
         "BriefDescription": "Total number of retired Instructions, Sample with: INST_RETIRED.PREC_DIST",
         "MetricExpr": "INST_RETIRED.ANY",
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
+        "MetricGroup": "Pipeline;Ret",
+        "MetricName": "Retire"
+    },
+    {
+        "BriefDescription": "",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
+        "MetricName": "Execute"
+    },
     {
         "BriefDescription": "Average number of Uops issued by front-end when it issued something",
         "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=1@",
@@ -353,23 +365,47 @@
         "MetricName": "DSB_Coverage"
     },
     {
-        "BriefDescription": "Total penalty related to DSB (uop cache) misses - subset/see of/the Instruction_Fetch_BW Bottleneck.",
-        "MetricExpr": "(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) * (DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) + ((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD))) * (( IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS ) / CPU_CLK_UNHALTED.THREAD / 2) / #((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)))",
+        "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.",
+        "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / DSB2MITE_SWITCHES.COUNT",
+        "MetricGroup": "DSBmiss",
+        "MetricName": "DSB_Switch_Cost"
+    },
+    {
+        "BriefDescription": "Total penalty related to DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck.",
+        "MetricExpr": "100 * ( (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) * (DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) + ((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD))) * (( IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS ) / CPU_CLK_UNHALTED.THREAD / 2) / #((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD))) )",
         "MetricGroup": "DSBmiss;Fed",
-        "MetricName": "DSB_Misses_Cost"
+        "MetricName": "DSB_Misses"
     },
     {
-        "BriefDescription": "Total penalty related to DSB (uop cache) misses - subset/see of/the Instruction_Fetch_BW Bottleneck.",
-        "MetricExpr": "(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * (DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) + ((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (( IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) / 2) / #((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))",
+        "BriefDescription": "Total penalty related to DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck.",
+        "MetricExpr": "100 * ( (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * (DSB2MITE_SWITCHES.PENALTY_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) + ((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) * (( IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) / 2) / #((IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )",
         "MetricGroup": "DSBmiss;Fed_SMT",
-        "MetricName": "DSB_Misses_Cost_SMT"
+        "MetricName": "DSB_Misses_SMT"
     },
     {
-        "BriefDescription": "Number of Instructions per non-speculative DSB miss",
+        "BriefDescription": "Number of Instructions per non-speculative DSB miss (lower number means higher occurrence rate)",
         "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS",
         "MetricGroup": "DSBmiss;Fed",
         "MetricName": "IpDSB_Miss_Ret"
     },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BadSpec;BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
+    {
+        "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BrMispredicts",
+        "MetricName": "Branch_Misprediction_Cost"
+    },
+    {
+        "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
+        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BrMispredicts_SMT",
+        "MetricName": "Branch_Misprediction_Cost_SMT"
+    },
     {
         "BriefDescription": "Fraction of branches that are non-taken conditionals",
         "MetricExpr": "BR_INST_RETIRED.NOT_TAKEN / BR_INST_RETIRED.ALL_BRANCHES",
@@ -395,11 +431,10 @@
         "MetricName": "Jump"
     },
     {
-        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles)",
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
         "MetricGroup": "Mem;MemoryBound;MemoryLat",
-        "MetricName": "Load_Miss_Real_Latency",
-        "PublicDescription": "Actual Average Latency for L1 data-cache miss demand load instructions (in core cycles). Latency may be overestimated for multi-load instructions - e.g. repeat strings."
+        "MetricName": "Load_Miss_Real_Latency"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
@@ -407,30 +442,6 @@
         "MetricGroup": "Mem;MemoryBound;MemoryBW",
         "MetricName": "MLP"
     },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
-        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L1D_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
-        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L2_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW",
-        "MetricName": "L3_Cache_Fill_BW"
-    },
-    {
-        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
-        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
-        "MetricGroup": "Mem;MemoryBW;Offcore",
-        "MetricName": "L3_Cache_Access_BW"
-    },
     {
         "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
         "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
@@ -450,13 +461,13 @@
         "MetricName": "L2MPKI"
     },
     {
-        "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)",
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)",
         "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses;Offcore",
         "MetricName": "L2MPKI_All"
     },
     {
-        "BriefDescription": "L2 cache misses per kilo instruction for all demand loads  (including speculative)",
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all demand loads  (including speculative)",
         "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses",
         "MetricName": "L2MPKI_Load"
@@ -480,7 +491,7 @@
         "MetricName": "L3MPKI"
     },
     {
-        "BriefDescription": "Fill Buffer (FB) true hits per kilo instructions for retired demand loads",
+        "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
         "MetricExpr": "1000 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY",
         "MetricGroup": "Mem;CacheMisses",
         "MetricName": "FB_HPKI"
@@ -498,6 +509,54 @@
         "MetricGroup": "Mem;MemoryTLB_SMT",
         "MetricName": "Page_Walks_Utilization_SMT"
     },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW_1T"
+    },
     {
         "BriefDescription": "Average CPU Utilization",
         "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
@@ -514,7 +573,8 @@
         "BriefDescription": "Giga Floating Point Operations Per Second",
         "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / 1000000000 ) / duration_time",
         "MetricGroup": "Cor;Flops;HPC",
-        "MetricName": "GFLOPs"
+        "MetricName": "GFLOPs",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
     },
     {
         "BriefDescription": "Average Frequency Utilization relative nominal frequency",
diff --git a/tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json b/tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
new file mode 100644
index 000000000000..edb1014bee0f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
@@ -0,0 +1,142 @@
+[
+    {
+        "BriefDescription": "L3 Lookup any request that access cache and found line in E or S-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_ES",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup any request that access cache and found line in E or S-state.",
+        "UMask": "0x86",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup any request that access cache and found line in I-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_I",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup any request that access cache and found line in I-state.",
+        "UMask": "0x88",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup any request that access cache and found line in M-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_M",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup any request that access cache and found line in M-state.",
+        "UMask": "0x81",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup any request that access cache and found line in MESI-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_MESI",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup any request that access cache and found line in MESI-state.",
+        "UMask": "0x8f",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup read request that access cache and found line in E or S-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.READ_ES",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup read request that access cache and found line in E or S-state.",
+        "UMask": "0x16",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup read request that access cache and found line in I-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.READ_I",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup read request that access cache and found line in I-state.",
+        "UMask": "0x18",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup read request that access cache and found line in any MESI-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.READ_MESI",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup read request that access cache and found line in any MESI-state.",
+        "UMask": "0x1f",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup write request that access cache and found line in E or S-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.WRITE_ES",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup write request that access cache and found line in E or S-state.",
+        "UMask": "0x26",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup write request that access cache and found line in M-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.WRITE_M",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup write request that access cache and found line in M-state.",
+        "UMask": "0x21",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "L3 Lookup write request that access cache and found line in MESI-state",
+        "Counter": "0,1",
+        "EventCode": "0x34",
+        "EventName": "UNC_CBO_CACHE_LOOKUP.WRITE_MESI",
+        "PerPkg": "1",
+        "PublicDescription": "L3 Lookup write request that access cache and found line in MESI-state.",
+        "UMask": "0x2f",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a modified line in some processor core.",
+        "Counter": "0,1",
+        "EventCode": "0x22",
+        "EventName": "UNC_CBO_XSNP_RESPONSE.HITM_XCORE",
+        "PerPkg": "1",
+        "PublicDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a modified line in some processor core.",
+        "UMask": "0x48",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a non-modified line in some processor core.",
+        "Counter": "0,1",
+        "EventCode": "0x22",
+        "EventName": "UNC_CBO_XSNP_RESPONSE.HIT_XCORE",
+        "PerPkg": "1",
+        "PublicDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a non-modified line in some processor core.",
+        "UMask": "0x44",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "A cross-core snoop resulted from L3 Eviction which misses in some processor core.",
+        "Counter": "0,1",
+        "EventCode": "0x22",
+        "EventName": "UNC_CBO_XSNP_RESPONSE.MISS_EVICTION",
+        "PerPkg": "1",
+        "PublicDescription": "A cross-core snoop resulted from L3 Eviction which misses in some processor core.",
+        "UMask": "0x81",
+        "Unit": "CBO"
+    },
+    {
+        "BriefDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which misses in some processor core.",
+        "Counter": "0,1",
+        "EventCode": "0x22",
+        "EventName": "UNC_CBO_XSNP_RESPONSE.MISS_XCORE",
+        "PerPkg": "1",
+        "PublicDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which misses in some processor core.",
+        "UMask": "0x41",
+        "Unit": "CBO"
+    }
+]
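The UNC_CBO_CACHE_LOOKUP umasks in the file above follow a regular encoding: the low nibble selects the MESI state(s) being looked up and the high bits select the request type. A small illustrative sketch (the helper and constant names are invented, not part of the patch) that reproduces the umask values listed above:

```python
# Illustrative reconstruction of the UNC_CBO_CACHE_LOOKUP umask encoding
# seen in uncore-cache.json. Bit assignments are inferred from the event
# table above; the names here are hypothetical.
STATE_BITS = {"M": 0x01, "ES": 0x06, "I": 0x08, "MESI": 0x0F}
FILTER_BITS = {"READ": 0x10, "WRITE": 0x20, "ANY": 0x80}

def cache_lookup_umask(req_filter, state):
    """Compose the umask for UNC_CBO_CACHE_LOOKUP.<req_filter>_<state>."""
    return FILTER_BITS[req_filter] | STATE_BITS[state]

# Agrees with the table: ANY_MESI is 0x8f, READ_ES is 0x16, WRITE_M is 0x21.
```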
diff --git a/tools/perf/pmu-events/arch/x86/skylake/uncore-other.json b/tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
new file mode 100644
index 000000000000..bf5d4acdd6b8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
@@ -0,0 +1,79 @@
+[
+    {
+        "BriefDescription": "Number of entries allocated. Account for Any type: e.g. Snoop, Core aperture, etc.",
+        "Counter": "0,1",
+        "EventCode": "0x84",
+        "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL",
+        "PerPkg": "1",
+        "PublicDescription": "Number of entries allocated. Account for Any type: e.g. Snoop, Core aperture, etc.",
+        "UMask": "0x01",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of all Core entries outstanding for the memory controller. The outstanding interval starts after LLC miss till return of first data chunk. Accounts for Coherent and non-coherent traffic.",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
+        "PerPkg": "1",
+        "PublicDescription": "Number of all Core entries outstanding for the memory controller. The outstanding interval starts after LLC miss till return of first data chunk. Accounts for Coherent and non-coherent traffic.",
+        "UMask": "0x01",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Cycles with at least one request outstanding is waiting for data return from memory controller. Account for coherent and non-coherent requests initiated by IA Cores, Processor Graphics Unit, or LLC.",
+        "CounterMask": "1",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_TRK_OCCUPANCY.CYCLES_WITH_ANY_REQUEST",
+        "PerPkg": "1",
+        "PublicDescription": "Cycles with at least one request outstanding is waiting for data return from memory controller. Account for coherent and non-coherent requests initiated by IA Cores, Processor Graphics Unit, or LLC.",
+        "UMask": "0x01",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of Core Data Read entries outstanding for the memory controller. The outstanding interval starts after LLC miss till return of first data chunk.",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_TRK_OCCUPANCY.DATA_READ",
+        "PerPkg": "1",
+        "PublicDescription": "Number of Core Data Read entries outstanding for the memory controller. The outstanding interval starts after LLC miss till return of first data chunk.",
+        "UMask": "0x02",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of Core coherent Data Read requests sent to memory controller whose data is returned directly to requesting agent.",
+        "Counter": "0,1",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_TRK_REQUESTS.DATA_READ",
+        "PerPkg": "1",
+        "PublicDescription": "Number of Core coherent Data Read requests sent to memory controller whose data is returned directly to requesting agent.",
+        "UMask": "0x02",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of Core coherent Data Read requests sent to memory controller whose data is returned directly to requesting agent.",
+        "Counter": "0,1",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_TRK_REQUESTS.DRD_DIRECT",
+        "PerPkg": "1",
+        "PublicDescription": "Number of Core coherent Data Read requests sent to memory controller whose data is returned directly to requesting agent.",
+        "UMask": "0x02",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of Writes allocated - any write transactions: full/partials writes and evictions.",
+        "Counter": "0,1",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_TRK_REQUESTS.WRITES",
+        "PerPkg": "1",
+        "PublicDescription": "Number of Writes allocated - any write transactions: full/partials writes and evictions.",
+        "UMask": "0x20",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
+        "Counter": "FIXED",
+        "EventCode": "0xff",
+        "EventName": "UNC_CLOCK.SOCKET",
+        "PerPkg": "1",
+        "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
+        "Unit": "CLOCK"
+    }
+]
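Occupancy events such as UNC_ARB_TRK_OCCUPANCY.DATA_READ above accumulate the number of outstanding entries each cycle, so dividing by the matching allocation count (UNC_ARB_TRK_REQUESTS.DATA_READ) yields an average outstanding time in uncore cycles, per Little's law. A minimal sketch with invented sample values:

```python
# Average memory read latency from ARB occupancy/request counts
# (Little's law). The counter values below are invented for illustration.
occupancy_cycles = 4_000_000   # UNC_ARB_TRK_OCCUPANCY.DATA_READ
read_requests = 20_000         # UNC_ARB_TRK_REQUESTS.DATA_READ

# Average cycles each read spent outstanding between LLC miss and
# first data chunk return.
avg_latency_cycles = occupancy_cycles / read_requests
```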
diff --git a/tools/perf/pmu-events/arch/x86/skylake/uncore.json b/tools/perf/pmu-events/arch/x86/skylake/uncore.json
deleted file mode 100644
index dbc193252fb3..000000000000
--- a/tools/perf/pmu-events/arch/x86/skylake/uncore.json
+++ /dev/null
@@ -1,254 +0,0 @@
-[
-  {
-    "Unit": "CBO",
-    "EventCode": "0x22",
-    "UMask": "0x41",
-    "EventName": "UNC_CBO_XSNP_RESPONSE.MISS_XCORE",
-    "BriefDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which misses in some processor core.",
-    "PublicDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which misses in some processor core.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x22",
-    "UMask": "0x81",
-    "EventName": "UNC_CBO_XSNP_RESPONSE.MISS_EVICTION",
-    "BriefDescription": "A cross-core snoop resulted from L3 Eviction which misses in some processor core.",
-    "PublicDescription": "A cross-core snoop resulted from L3 Eviction which misses in some processor core.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x22",
-    "UMask": "0x44",
-    "EventName": "UNC_CBO_XSNP_RESPONSE.HIT_XCORE",
-    "BriefDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a non-modified line in some processor core.",
-    "PublicDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a non-modified line in some processor core.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x22",
-    "UMask": "0x48",
-    "EventName": "UNC_CBO_XSNP_RESPONSE.HITM_XCORE",
-    "BriefDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a modified line in some processor core.",
-    "PublicDescription": "A cross-core snoop initiated by this Cbox due to processor core memory request which hits a modified line in some processor core.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x21",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.WRITE_M",
-    "BriefDescription": "L3 Lookup write request that access cache and found line in M-state",
-    "PublicDescription": "L3 Lookup write request that access cache and found line in M-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x81",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_M",
-    "BriefDescription": "L3 Lookup any request that access cache and found line in M-state",
-    "PublicDescription": "L3 Lookup any request that access cache and found line in M-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x18",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.READ_I",
-    "BriefDescription": "L3 Lookup read request that access cache and found line in I-state",
-    "PublicDescription": "L3 Lookup read request that access cache and found line in I-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x88",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_I",
-    "BriefDescription": "L3 Lookup any request that access cache and found line in I-state",
-    "PublicDescription": "L3 Lookup any request that access cache and found line in I-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x1f",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.READ_MESI",
-    "BriefDescription": "L3 Lookup read request that access cache and found line in any MESI-state",
-    "PublicDescription": "L3 Lookup read request that access cache and found line in any MESI-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x2f",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.WRITE_MESI",
-    "BriefDescription": "L3 Lookup write request that access cache and found line in MESI-state",
-    "PublicDescription": "L3 Lookup write request that access cache and found line in MESI-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x8f",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_MESI",
-    "BriefDescription": "L3 Lookup any request that access cache and found line in MESI-state",
-    "PublicDescription": "L3 Lookup any request that access cache and found line in MESI-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x86",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.ANY_ES",
-    "BriefDescription": "L3 Lookup any request that access cache and found line in E or S-state",
-    "PublicDescription": "L3 Lookup any request that access cache and found line in E or S-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x16",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.READ_ES",
-    "BriefDescription": "L3 Lookup read request that access cache and found line in E or S-state",
-    "PublicDescription": "L3 Lookup read request that access cache and found line in E or S-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "CBO",
-    "EventCode": "0x34",
-    "UMask": "0x26",
-    "EventName": "UNC_CBO_CACHE_LOOKUP.WRITE_ES",
-    "BriefDescription": "L3 Lookup write request that access cache and found line in E or S-state",
-    "PublicDescription": "L3 Lookup write request that access cache and found line in E or S-state.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "iMPH-U",
-    "EventCode": "0x80",
-    "UMask": "0x01",
-    "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
-    "BriefDescription": "Each cycle count number of all Core outgoing valid entries. Such entry is defined as valid from its allocation till first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and non-coherent traffic.",
-    "PublicDescription": "Each cycle count number of all Core outgoing valid entries. Such entry is defined as valid from its allocation till first of IDI0 or DRS0 messages is sent out. Accounts for Coherent and non-coherent traffic.",
-    "Counter": "0",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "iMPH-U",
-    "EventCode": "0x81",
-    "UMask": "0x01",
-    "EventName": "UNC_ARB_TRK_REQUESTS.ALL",
-    "BriefDescription": "Total number of Core outgoing entries allocated. Accounts for Coherent and non-coherent traffic.",
-    "PublicDescription": "Total number of Core outgoing entries allocated. Accounts for Coherent and non-coherent traffic.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "iMPH-U",
-    "EventCode": "0x81",
-    "UMask": "0x02",
-    "EventName": "UNC_ARB_TRK_REQUESTS.DRD_DIRECT",
-    "BriefDescription": "Number of Core coherent Data Read entries allocated in DirectData mode",
-    "PublicDescription": "Number of Core coherent Data Read entries allocated in DirectData mode.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "iMPH-U",
-    "EventCode": "0x81",
-    "UMask": "0x20",
-    "EventName": "UNC_ARB_TRK_REQUESTS.WRITES",
-    "BriefDescription": "Number of Writes allocated - any write transactions: full/partials writes and evictions.",
-    "PublicDescription": "Number of Writes allocated - any write transactions: full/partials writes and evictions.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "iMPH-U",
-    "EventCode": "0x84",
-    "UMask": "0x01",
-    "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL",
-    "BriefDescription": "Number of entries allocated. Account for Any type: e.g. Snoop, Core aperture, etc.",
-    "PublicDescription": "Number of entries allocated. Account for Any type: e.g. Snoop, Core aperture, etc.",
-    "Counter": "0,1",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "iMPH-U",
-    "EventCode": "0x80",
-    "UMask": "0x01",
-    "EventName": "UNC_ARB_TRK_OCCUPANCY.CYCLES_WITH_ANY_REQUEST",
-    "BriefDescription": "Cycles with at least one request outstanding is waiting for data return from memory controller. Account for coherent and non-coherent requests initiated by IA Cores, Processor Graphics Unit, or LLC.;",
-    "PublicDescription": "Cycles with at least one request outstanding is waiting for data return from memory controller. Account for coherent and non-coherent requests initiated by IA Cores, Processor Graphics Unit, or LLC.",
-    "Counter": "0",
-    "CounterMask": "1",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  },
-  {
-    "Unit": "NCU",
-    "EventCode": "0x0",
-    "UMask": "0x01",
-    "EventName": "UNC_CLOCK.SOCKET",
-    "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
-    "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
-    "Counter": "FIXED",
-    "CounterMask": "0",
-    "Invert": "0",
-    "EdgeDetect": "0"
-  }
-]
\ No newline at end of file
diff --git a/tools/perf/pmu-events/arch/x86/skylake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/skylake/virtual-memory.json
index 792ca39f013a..dd334b416c57 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/virtual-memory.json
@@ -281,4 +281,4 @@
         "SampleAfterValue": "100007",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog



* [PATCH v1 28/31] perf vendor events: Update Intel tigerlake
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the tigerlake files into perf and update mapfile.csv.

Tested on a non-tigerlake system with 'perf test':
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   2 +-
 .../pmu-events/arch/x86/tigerlake/cache.json  |  48 ++-
 .../arch/x86/tigerlake/floating-point.json    |   2 +-
 .../arch/x86/tigerlake/frontend.json          |   2 +-
 .../pmu-events/arch/x86/tigerlake/memory.json |   2 +-
 .../pmu-events/arch/x86/tigerlake/other.json  |   1 -
 .../arch/x86/tigerlake/pipeline.json          |   4 +-
 .../arch/x86/tigerlake/tgl-metrics.json       | 378 +++++++++++++++---
 .../arch/x86/tigerlake/uncore-other.json      |  65 +++
 .../arch/x86/tigerlake/virtual-memory.json    |   2 +-
 10 files changed, 439 insertions(+), 67 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index e422ff228aa9..f63e97c57dd8 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -25,10 +25,10 @@ GenuineIntel-6-(37|4C|4D),v14,silvermont,core
 GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
 GenuineIntel-6-55-[01234],v1.28,skylakex,core
 GenuineIntel-6-86,v1.20,snowridgex,core
+GenuineIntel-6-8[CD],v1.07,tigerlake,core
 GenuineIntel-6-2C,v2,westmereep-dp,core
 GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
-GenuineIntel-6-8[CD],v1,tigerlake,core
 AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
 AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
 AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
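The first column of mapfile.csv is a regular expression that perf matches against a vendor-family-model string built from CPUID, so `GenuineIntel-6-8[CD]` covers both the 0x8C and 0x8D tigerlake models. A minimal sketch of this lookup (row data copied from the hunk above; anchored matching here is a simplifying assumption, not perf's exact code):

```python
import re

# Simplified sketch of how a mapfile.csv row selects an event table.
# Rows are copied from the hunk above.
MAPFILE_ROWS = [
    ("GenuineIntel-6-(4E|5E|8E|9E|A5|A6)", "skylake"),
    ("GenuineIntel-6-8[CD]", "tigerlake"),
    ("GenuineIntel-6-2C", "westmereep-dp"),
]

def table_for(cpuid):
    """Return the event-table name for a vendor-family-model string."""
    for pattern, table in MAPFILE_ROWS:
        if re.fullmatch(pattern, cpuid):
            return table
    return None
```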
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/cache.json b/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
index 0569b2c704ca..5ccf0edc29ac 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
@@ -112,6 +112,17 @@
         "SampleAfterValue": "200003",
         "UMask": "0xe4"
     },
+    {
+        "BriefDescription": "Demand Data Read access L2 cache",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x24",
+        "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD",
+        "PEBScounters": "0,1,2,3",
+        "PublicDescription": "Counts Demand Data Read requests accessing the L2 cache. These requests may hit or miss L2 cache. True-miss exclude misses that were merged with ongoing L2 misses. An access is counted once.",
+        "SampleAfterValue": "200003",
+        "UMask": "0xe1"
+    },
     {
         "BriefDescription": "RFO requests to L2 cache",
         "CollectPEBSRecord": "2",
@@ -157,16 +168,38 @@
         "UMask": "0xc1"
     },
     {
-        "BriefDescription": "All requests that miss L2 cache",
+        "BriefDescription": "Demand Data Read miss L2 cache",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x24",
+        "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS",
+        "PEBScounters": "0,1,2,3",
+        "PublicDescription": "Counts demand Data Read requests with true-miss in the L2 cache. True-miss excludes misses that were merged with ongoing L2 misses. An access is counted once.",
+        "SampleAfterValue": "200003",
+        "UMask": "0x21"
+    },
+    {
+        "BriefDescription": "Read requests with true-miss in L2 cache",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "EventCode": "0x24",
         "EventName": "L2_RQSTS.MISS",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Counts all requests that miss L2 cache.",
+        "PublicDescription": "Counts read requests of any type with true-miss in the L2 cache. True-miss excludes L2 misses that were merged with ongoing L2 misses.",
         "SampleAfterValue": "200003",
         "UMask": "0x3f"
     },
+    {
+        "BriefDescription": "All accesses to L2 cache",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3",
+        "EventCode": "0x24",
+        "EventName": "L2_RQSTS.REFERENCES",
+        "PEBScounters": "0,1,2,3",
+        "PublicDescription": "Counts all requests that were hit or true misses in L2 cache. True-miss excludes misses that were merged with ongoing L2 misses.",
+        "SampleAfterValue": "200003",
+        "UMask": "0xff"
+    },
     {
         "BriefDescription": "RFO requests that hit L2 cache",
         "CollectPEBSRecord": "2",
@@ -353,7 +386,7 @@
         "UMask": "0x12"
     },
     {
-        "BriefDescription": "TBD",
+        "BriefDescription": "Snoop hit a modified(HITM) or clean line(HIT_W_FWD) in another on-pkg core which forwarded the data back due to a retired load instruction.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
@@ -361,6 +394,7 @@
         "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD",
         "PEBS": "1",
         "PEBScounters": "0,1,2,3",
+        "PublicDescription": "Counts retired load instructions where a cross-core snoop hit in another cores caches on this socket, the data was forwarded back to the requesting core as the data was modified (SNOOP_HITM) or the L3 did not have the data(SNOOP_HIT_WITH_FWD).",
         "SampleAfterValue": "20011",
         "UMask": "0x4"
     },
@@ -391,7 +425,7 @@
         "UMask": "0x8"
     },
     {
-        "BriefDescription": "TBD",
+        "BriefDescription": "Snoop hit without forwarding in another on-pkg core due to a retired load instruction, data was supplied by the L3.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
@@ -399,6 +433,7 @@
         "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD",
         "PEBS": "1",
         "PEBScounters": "0,1,2,3",
+        "PublicDescription": "Counts retired load instructions in which the L3 supplied the data and a cross-core snoop hit in another cores caches on this socket but that other core did not forward the data back (SNOOP_HIT_NO_FWD).",
         "SampleAfterValue": "20011",
         "UMask": "0x2"
     },
@@ -503,7 +538,6 @@
         "MSRValue": "0x10003C0001",
         "Offcore": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Offcore response can be programmed only with a specific pair of event select and counter MSR, and with specific event codes and predefine mask bit value in a dedicated MSR to specify attributes of the offcore transaction.",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -517,7 +551,6 @@
         "MSRValue": "0x8003C0001",
         "Offcore": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Offcore response can be programmed only with a specific pair of event select and counter MSR, and with specific event codes and predefine mask bit value in a dedicated MSR to specify attributes of the offcore transaction.",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -531,7 +564,6 @@
         "MSRValue": "0x10003C0002",
         "Offcore": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Offcore response can be programmed only with a specific pair of event select and counter MSR, and with specific event codes and predefine mask bit value in a dedicated MSR to specify attributes of the offcore transaction.",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
@@ -714,4 +746,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x4"
     }
-]
\ No newline at end of file
+]
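The L2_RQSTS.REFERENCES event added in this hunk counts all L2 accesses (umask 0xff), while L2_RQSTS.MISS counts read true-misses (umask 0x3f), which makes ratio-style derivations straightforward. A hypothetical sketch with invented counter values:

```python
# Hypothetical derivation using the L2_RQSTS events from this hunk.
# The counter values below are invented for illustration only.
counts = {
    "L2_RQSTS.REFERENCES": 1_000_000,  # all L2 accesses (umask 0xff)
    "L2_RQSTS.MISS": 150_000,          # read true-misses (umask 0x3f)
}

# True-miss ratio: misses merged with ongoing L2 misses are excluded
# by the MISS event's definition.
miss_ratio = counts["L2_RQSTS.MISS"] / counts["L2_RQSTS.REFERENCES"]
```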
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/floating-point.json b/tools/perf/pmu-events/arch/x86/tigerlake/floating-point.json
index de8eb2b34a3a..978b494c7458 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/floating-point.json
@@ -98,4 +98,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x2"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json b/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json
index 2eaa33cc574e..ccdd8fd99556 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json
@@ -475,4 +475,4 @@
         "SampleAfterValue": "1000003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/memory.json b/tools/perf/pmu-events/arch/x86/tigerlake/memory.json
index 0948de0b160c..6071794cbd32 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/memory.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/memory.json
@@ -292,4 +292,4 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/other.json b/tools/perf/pmu-events/arch/x86/tigerlake/other.json
index 65539490e18f..3ed22dbd0982 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/other.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/other.json
@@ -42,7 +42,6 @@
         "MSRValue": "0x10800",
         "Offcore": "1",
         "PEBScounters": "0,1,2,3",
-        "PublicDescription": "Offcore response can be programmed only with a specific pair of event select and counter MSR, and with specific event codes and predefine mask bit value in a dedicated MSR to specify attributes of the offcore transaction.",
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     }
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
index a8aa1b455c77..1f273144f8e8 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
@@ -432,13 +432,13 @@
         "UMask": "0x40"
     },
     {
-        "BriefDescription": "Cycles where no uops were executed, the Reservation Station was not empty, the Store Buffer was full and there was no outstanding load.",
+        "BriefDescription": "Cycles no uop executed while RS was not empty, the SB was not full and there was no outstanding load.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3,4,5,6,7",
         "EventCode": "0xa6",
         "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS",
         "PEBScounters": "0,1,2,3,4,5,6,7",
-        "PublicDescription": "Counts cycles during which no uops were executed on all ports and Reservation Station (RS) was not empty.",
+        "PublicDescription": "Number of cycles total of 0 uops executed on all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) was not full and there was no outstanding load.",
         "SampleAfterValue": "1000003",
         "UMask": "0x80"
     },
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
index 00a16f1a0f44..03c97bd74ad9 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
@@ -1,20 +1,26 @@
 [
+    {
+        "BriefDescription": "Total pipeline cost of branch related instructions (used for program control-flow including function calls)",
+        "MetricExpr": "100 * (( BR_INST_RETIRED.COND + 3 * BR_INST_RETIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL) ) / TOPDOWN.SLOTS)",
+        "MetricGroup": "Ret",
+        "MetricName": "Branching_Overhead"
+    },
+    {
+        "BriefDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)",
+        "MetricExpr": "100 * (( 5 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - INT_MISC.UOP_DROPPING ) / TOPDOWN.SLOTS) * ( (ICACHE_64B.IFTAG_STALL / CPU_CLK_UNHALTED.THREAD) + (ICACHE_16B.IFDATA_STALL / CPU_CLK_UNHALTED.THREAD) + (10 * BACLEARS.ANY / CPU_CLK_UNHALTED.THREAD) ) / #(( 5 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - INT_MISC.UOP_DROPPING ) / TOPDOWN.SLOTS)",
+        "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB",
+        "MetricName": "Big_Code"
+    },
     {
         "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
         "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
-        "MetricGroup": "Summary",
+        "MetricGroup": "Ret;Summary",
         "MetricName": "IPC"
     },
-    {
-        "BriefDescription": "Instruction per taken branch",
-        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
-        "MetricGroup": "Branches;FetchBW;PGO",
-        "MetricName": "IpTB"
-    },
     {
         "BriefDescription": "Cycles Per Instruction (per Logical Processor)",
-        "MetricExpr": "1 / IPC",
-        "MetricGroup": "Pipeline",
+        "MetricExpr": "1 / (INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD)",
+        "MetricGroup": "Pipeline;Mem",
         "MetricName": "CPI"
     },
     {
@@ -24,28 +30,48 @@
         "MetricName": "CLKS"
     },
     {
-        "BriefDescription": "Instructions Per Cycle (per physical core)",
-        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)",
+        "MetricExpr": "TOPDOWN.SLOTS",
+        "MetricGroup": "TmaL1",
+        "MetricName": "SLOTS"
+    },
+    {
+        "BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
+        "MetricExpr": "TOPDOWN.SLOTS / ( TOPDOWN.SLOTS / 2 ) if #SMT_on else 1",
         "MetricGroup": "SMT;TmaL1",
+        "MetricName": "Slots_Utilization"
+    },
+    {
+        "BriefDescription": "The ratio of Executed- by Issued-Uops",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / UOPS_ISSUED.ANY",
+        "MetricGroup": "Cor;Pipeline",
+        "MetricName": "Execute_per_Issue",
+        "PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggests high rate of \"execute\" at rename stage."
+    },
+    {
+        "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
+        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "Ret;SMT;TmaL1",
         "MetricName": "CoreIPC"
     },
     {
         "BriefDescription": "Floating Point Operations Per Cycle",
         "MetricExpr": "( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / CPU_CLK_UNHALTED.DISTRIBUTED",
-        "MetricGroup": "Flops",
+        "MetricGroup": "Ret;Flops",
         "MetricName": "FLOPc"
     },
     {
-        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
-        "MetricExpr": "UOPS_EXECUTED.THREAD / ( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 )",
-        "MetricGroup": "Pipeline;PortsUtil",
-        "MetricName": "ILP"
+        "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)",
+        "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) ) / ( 2 * CPU_CLK_UNHALTED.DISTRIBUTED )",
+        "MetricGroup": "Cor;Flops;HPC",
+        "MetricName": "FP_Arith_Utilization",
+        "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)."
     },
     {
-        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "BrMispredicts",
-        "MetricName": "IpMispredict"
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
+        "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
+        "MetricName": "ILP"
     },
     {
         "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
@@ -68,99 +94,279 @@
     {
         "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
         "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
-        "MetricGroup": "Branches;InsType",
+        "MetricGroup": "Branches;Fed;InsType",
         "MetricName": "IpBranch"
     },
     {
         "BriefDescription": "Instructions per (near) call (lower number means higher occurrence rate)",
         "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
-        "MetricGroup": "Branches",
+        "MetricGroup": "Branches;Fed;PGO",
         "MetricName": "IpCall"
     },
+    {
+        "BriefDescription": "Instruction per taken branch",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO",
+        "MetricName": "IpTB"
+    },
     {
         "BriefDescription": "Branch instructions per taken branch. ",
         "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN",
-        "MetricGroup": "Branches;PGO",
+        "MetricGroup": "Branches;Fed;PGO",
         "MetricName": "BpTkBranch"
     },
     {
         "BriefDescription": "Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)",
         "MetricExpr": "INST_RETIRED.ANY / ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )",
-        "MetricGroup": "Flops;FpArith;InsType",
+        "MetricGroup": "Flops;InsType",
         "MetricName": "IpFLOP"
     },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / ( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) )",
+        "MetricGroup": "Flops;InsType",
+        "MetricName": "IpArith",
+        "PublicDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate). May undercount due to FMA double counting. Approximated prior to BDW."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+        "MetricGroup": "Flops;FpScalar;InsType",
+        "MetricName": "IpArith_Scalar_SP",
+        "PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "MetricGroup": "Flops;FpScalar;InsType",
+        "MetricName": "IpArith_Scalar_DP",
+        "PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / ( FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE )",
+        "MetricGroup": "Flops;FpVector;InsType",
+        "MetricName": "IpArith_AVX128",
+        "PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / ( FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )",
+        "MetricGroup": "Flops;FpVector;InsType",
+        "MetricName": "IpArith_AVX256",
+        "PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / ( FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )",
+        "MetricGroup": "Flops;FpVector;InsType",
+        "MetricName": "IpArith_AVX512",
+        "PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / cpu@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
+        "MetricGroup": "Prefetches",
+        "MetricName": "IpSWPF"
+    },
     {
         "BriefDescription": "Total number of retired Instructions, Sample with: INST_RETIRED.PREC_DIST",
         "MetricExpr": "INST_RETIRED.ANY",
         "MetricGroup": "Summary;TmaL1",
         "MetricName": "Instructions"
     },
+    {
+        "BriefDescription": "",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
+        "MetricName": "Execute"
+    },
+    {
+        "BriefDescription": "Average number of Uops issued by front-end when it issued something",
+        "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=1@",
+        "MetricGroup": "Fed;FetchBW",
+        "MetricName": "Fetch_UpC"
+    },
     {
         "BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)",
         "MetricExpr": "LSD.UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
-        "MetricGroup": "LSD",
+        "MetricGroup": "Fed;LSD",
         "MetricName": "LSD_Coverage"
     },
     {
         "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
         "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
-        "MetricGroup": "DSB;FetchBW",
+        "MetricGroup": "DSB;Fed;FetchBW",
         "MetricName": "DSB_Coverage"
     },
     {
-        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)",
+        "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.",
+        "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=1\\,edge@",
+        "MetricGroup": "DSBmiss",
+        "MetricName": "DSB_Switch_Cost"
+    },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative DSB miss (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS",
+        "MetricGroup": "DSBmiss;Fed",
+        "MetricName": "IpDSB_Miss_Ret"
+    },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;BadSpec;BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
+    {
+        "BriefDescription": "Fraction of branches that are non-taken conditionals",
+        "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;Branches;CodeGen;PGO",
+        "MetricName": "Cond_NT"
+    },
+    {
+        "BriefDescription": "Fraction of branches that are taken conditionals",
+        "MetricExpr": "BR_INST_RETIRED.COND_TAKEN / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;Branches;CodeGen;PGO",
+        "MetricName": "Cond_TK"
+    },
+    {
+        "BriefDescription": "Fraction of branches that are CALL or RET",
+        "MetricExpr": "( BR_INST_RETIRED.NEAR_CALL + BR_INST_RETIRED.NEAR_RETURN ) / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;Branches",
+        "MetricName": "CallRet"
+    },
+    {
+        "BriefDescription": "Fraction of branches that are unconditional (direct or indirect) jumps",
+        "MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL) / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Bad;Branches",
+        "MetricName": "Jump"
+    },
+    {
+        "BriefDescription": "Fraction of branches of other types (not individually covered by other metrics in Info.Branches group)",
+        "MetricExpr": "1 - ( (BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_BRANCHES) + (BR_INST_RETIRED.COND_TAKEN / BR_INST_RETIRED.ALL_BRANCHES) + (( BR_INST_RETIRED.NEAR_CALL + BR_INST_RETIRED.NEAR_RETURN ) / BR_INST_RETIRED.ALL_BRANCHES) + ((BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL) / BR_INST_RETIRED.ALL_BRANCHES) )",
+        "MetricGroup": "Bad;Branches",
+        "MetricName": "Other_Branches"
+    },
+    {
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
-        "MetricGroup": "MemoryBound;MemoryLat",
+        "MetricGroup": "Mem;MemoryBound;MemoryLat",
         "MetricName": "Load_Miss_Real_Latency"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
         "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES",
-        "MetricGroup": "MemoryBound;MemoryBW",
+        "MetricGroup": "Mem;MemoryBound;MemoryBW",
         "MetricName": "MLP"
     },
+    {
+        "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "L1MPKI"
+    },
+    {
+        "BriefDescription": "L1 cache true misses per kilo instruction for all demand loads (including speculative)",
+        "MetricExpr": "1000 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "L1MPKI_Load"
+    },
+    {
+        "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;Backend;CacheMisses",
+        "MetricName": "L2MPKI"
+    },
+    {
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)",
+        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses;Offcore",
+        "MetricName": "L2MPKI_All"
+    },
+    {
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all demand loads  (including speculative)",
+        "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "L2MPKI_Load"
+    },
+    {
+        "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)",
+        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "L2HPKI_All"
+    },
+    {
+        "BriefDescription": "L2 cache hits per kilo instruction for all demand loads  (including speculative)",
+        "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "L2HPKI_Load"
+    },
+    {
+        "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "L3MPKI"
+    },
+    {
+        "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;CacheMisses",
+        "MetricName": "FB_HPKI"
+    },
     {
         "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
         "MetricConstraint": "NO_NMI_WATCHDOG",
-        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING ) / ( 2 * CORE_CLKS )",
-        "MetricGroup": "MemoryTLB",
+        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING ) / ( 2 * CPU_CLK_UNHALTED.DISTRIBUTED )",
+        "MetricGroup": "Mem;MemoryTLB",
         "MetricName": "Page_Walks_Utilization"
     },
     {
-        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
         "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
-        "MetricGroup": "MemoryBW",
+        "MetricGroup": "Mem;MemoryBW",
         "MetricName": "L1D_Cache_Fill_BW"
     },
     {
-        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
         "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
-        "MetricGroup": "MemoryBW",
+        "MetricGroup": "Mem;MemoryBW",
         "MetricName": "L2_Cache_Fill_BW"
     },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
     {
         "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
         "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
-        "MetricGroup": "MemoryBW;Offcore",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
         "MetricName": "L3_Cache_Access_BW"
     },
     {
-        "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
-        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
-        "MetricGroup": "CacheMisses",
-        "MetricName": "L1MPKI"
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW_1T"
     },
     {
-        "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads",
-        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
-        "MetricGroup": "CacheMisses",
-        "MetricName": "L2MPKI"
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW_1T"
     },
     {
-        "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads",
-        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY",
-        "MetricGroup": "CacheMisses",
-        "MetricName": "L3MPKI"
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW_1T"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "(64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time)",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW_1T"
     },
     {
         "BriefDescription": "Average CPU Utilization",
@@ -177,8 +383,9 @@
     {
         "BriefDescription": "Giga Floating Point Operations Per Second",
         "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / 1000000000 ) / duration_time",
-        "MetricGroup": "Flops;HPC",
-        "MetricName": "GFLOPs"
+        "MetricGroup": "Cor;Flops;HPC",
+        "MetricName": "GFLOPs",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
     },
     {
         "BriefDescription": "Average Frequency Utilization relative nominal frequency",
@@ -186,9 +393,30 @@
         "MetricGroup": "Power",
         "MetricName": "Turbo_Utilization"
     },
+    {
+        "BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for baseline license level 0",
+        "MetricExpr": "CORE_POWER.LVL0_TURBO_LICENSE / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "Power",
+        "MetricName": "Power_License0_Utilization",
+        "PublicDescription": "Fraction of Core cycles where the core was running with power-delivery for baseline license level 0.  This includes non-AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes."
+    },
+    {
+        "BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 1",
+        "MetricExpr": "CORE_POWER.LVL1_TURBO_LICENSE / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "Power",
+        "MetricName": "Power_License1_Utilization",
+        "PublicDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 1.  This includes high current AVX 256-bit instructions as well as low current AVX 512-bit instructions."
+    },
+    {
+        "BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 2 (introduced in SKX)",
+        "MetricExpr": "CORE_POWER.LVL2_TURBO_LICENSE / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "Power",
+        "MetricName": "Power_License2_Utilization",
+        "PublicDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 2 (introduced in SKX).  This includes high current AVX 512-bit instructions."
+    },
     {
         "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active",
-        "MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED",
+        "MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_on else 0",
         "MetricGroup": "SMT",
         "MetricName": "SMT_2T_Utilization"
     },
@@ -198,6 +426,24 @@
         "MetricGroup": "OS",
         "MetricName": "Kernel_Utilization"
     },
+    {
+        "BriefDescription": "Cycles Per Instruction for the Operating System (OS) Kernel mode",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k",
+        "MetricGroup": "OS",
+        "MetricName": "Kernel_CPI"
+    },
+    {
+        "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
+        "MetricExpr": "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
+        "MetricGroup": "HPC;Mem;MemoryBW;SoC",
+        "MetricName": "DRAM_BW_Use"
+    },
+    {
+        "BriefDescription": "Average number of parallel requests to external memory. Accounts for all requests",
+        "MetricExpr": "UNC_ARB_TRK_OCCUPANCY.ALL / arb@event\\=0x81\\,umask\\=0x1@",
+        "MetricGroup": "Mem;SoC",
+        "MetricName": "MEM_Parallel_Requests"
+    },
     {
         "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
         "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
@@ -216,6 +462,18 @@
         "MetricGroup": "Power",
         "MetricName": "C7_Core_Residency"
     },
+    {
+        "BriefDescription": "C2 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C2_Pkg_Residency"
+    },
+    {
+        "BriefDescription": "C3 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C3_Pkg_Residency"
+    },
     {
         "BriefDescription": "C6 residency percent per package",
         "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
@@ -227,5 +485,23 @@
         "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
         "MetricGroup": "Power",
         "MetricName": "C7_Pkg_Residency"
+    },
+    {
+        "BriefDescription": "C8 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c8\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C8_Pkg_Residency"
+    },
+    {
+        "BriefDescription": "C9 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c9\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C9_Pkg_Residency"
+    },
+    {
+        "BriefDescription": "C10 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c10\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C10_Pkg_Residency"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json b/tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
new file mode 100644
index 000000000000..734b1845c8e2
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
@@ -0,0 +1,65 @@
+[
+    {
+        "BriefDescription": "Each cycle count number of all outgoing valid entries in ReqTrk. Such entry is defined as valid from its allocation in ReqTrk till deallocation. Accounts for Coherent and non-coherent traffic.",
+        "CounterType": "PGMABLE",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_ARB_TRK_OCCUPANCY.ALL",
+        "UMask": "0x01",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+        "Counter": "1",
+        "CounterType": "FREERUN",
+        "EventName": "UNC_MC0_RDCAS_COUNT_FREERUN",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_MC0_RDCAS_COUNT_FREERUN",
+        "Unit": "h_imc"
+    },
+    {
+        "BriefDescription": "Counts every 64B read and write request entering the Memory Controller to DRAM (sum of all channels). Each write request counts as a new request incrementing this counter. However, same cache line write requests (both full and partial) are combined to a single 64 byte data transfer to DRAM.",
+        "CounterType": "FREERUN",
+        "EventName": "UNC_MC0_TOTAL_REQCOUNT_FREERUN",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_MC0_TOTAL_REQCOUNT_FREERUN",
+        "Unit": "h_imc"
+    },
+    {
+        "BriefDescription": "Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+        "Counter": "2",
+        "CounterType": "FREERUN",
+        "EventName": "UNC_MC0_WRCAS_COUNT_FREERUN",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_MC0_WRCAS_COUNT_FREERUN",
+        "Unit": "h_imc"
+    },
+    {
+        "BriefDescription": "Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+        "Counter": "4",
+        "CounterType": "FREERUN",
+        "EventName": "UNC_MC1_RDCAS_COUNT_FREERUN",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_MC1_RDCAS_COUNT_FREERUN",
+        "Unit": "h_imc"
+    },
+    {
+        "BriefDescription": "Counts every 64B read and write request entering the Memory Controller to DRAM (sum of all channels). Each write request counts as a new request incrementing this counter. However, same cache line write requests (both full and partial) are combined to a single 64 byte data transfer to DRAM.",
+        "Counter": "3",
+        "CounterType": "FREERUN",
+        "EventName": "UNC_MC1_TOTAL_REQCOUNT_FREERUN",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_MC1_TOTAL_REQCOUNT_FREERUN",
+        "Unit": "h_imc"
+    },
+    {
+        "BriefDescription": "Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+        "Counter": "5",
+        "CounterType": "FREERUN",
+        "EventName": "UNC_MC1_WRCAS_COUNT_FREERUN",
+        "PerPkg": "1",
+        "PublicDescription": "UNC_MC1_WRCAS_COUNT_FREERUN",
+        "Unit": "h_imc"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/tigerlake/virtual-memory.json
index 3ebec78969b0..fd364abf8002 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/virtual-memory.json
@@ -222,4 +222,4 @@
         "SampleAfterValue": "100007",
         "UMask": "0x20"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 29/31] perf vendor events: Update Intel westmereep-dp
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (13 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 28/31] perf vendor events: Update Intel tigerlake Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 30/31] perf vendor events: Update Intel westmereep-sp Ian Rogers
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually
copy the westmereep-dp files into perf and update
mapfile.csv. This change just aligns whitespace.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json         | 2 +-
 .../perf/pmu-events/arch/x86/westmereep-dp/floating-point.json  | 2 +-
 tools/perf/pmu-events/arch/x86/westmereep-dp/frontend.json      | 2 +-
 tools/perf/pmu-events/arch/x86/westmereep-dp/memory.json        | 2 +-
 .../perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json
index 0f01cf223777..37ed2742fec6 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/cache.json
@@ -2814,4 +2814,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/floating-point.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/floating-point.json
index 39af1329224a..666e466d351c 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/floating-point.json
@@ -226,4 +226,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/frontend.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/frontend.json
index 8ac5c24888c5..c561ac24d91d 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/frontend.json
@@ -23,4 +23,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/memory.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/memory.json
index 36fbea313c6f..7e529b367c21 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/memory.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/memory.json
@@ -755,4 +755,4 @@
         "SampleAfterValue": "100000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json b/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json
index d63e469a43e1..8099e6700e31 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-dp/virtual-memory.json
@@ -170,4 +170,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 30/31] perf vendor events: Update Intel westmereep-sp
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (14 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 29/31] perf vendor events: Update Intel westmereep-dp Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-22 22:32 ` [PATCH v1 31/31] perf vendor events: Update Intel westmereex Ian Rogers
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually
copy the westmereep-sp files into perf and update
mapfile.csv. This change just aligns whitespace and updates the
version number.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv                      | 2 +-
 .../perf/pmu-events/arch/x86/westmereep-sp/floating-point.json  | 2 +-
 tools/perf/pmu-events/arch/x86/westmereep-sp/frontend.json      | 2 +-
 .../perf/pmu-events/arch/x86/westmereep-sp/virtual-memory.json  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
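The mapfile.csv rows changed in this patch map a CPUID pattern (the first column, a regular expression) to an event directory and a version. A minimal sketch of the lookup, under the assumption that the first column is matched as a full regex against the CPUID string; row data is taken from the hunk in this patch, and the function is illustrative:

```python
import re

# Rows from tools/perf/pmu-events/arch/x86/mapfile.csv:
# (CPUID pattern, version, model directory, event type)
rows = [
    ("GenuineIntel-6-2C", "v2", "westmereep-dp", "core"),
    ("GenuineIntel-6-25", "v3", "westmereep-sp", "core"),
    ("GenuineIntel-6-2F", "v2", "westmereex", "core"),
    ("AuthenticAMD-23-([12][0-9A-F]|[0-9A-F])", "v2", "amdzen1", "core"),
]

def lookup(cpuid: str):
    """Return (directory, version) for the first pattern fully matching cpuid."""
    for pattern, version, directory, _event_type in rows:
        if re.fullmatch(pattern, cpuid):
            return directory, version
    return None

# Westmere-EP single-package: family 6, model 0x25.
assert lookup("GenuineIntel-6-25") == ("westmereep-sp", "v3")
# Zen1: family 0x17, model 0x18 falls in the [12][0-9A-F] range.
assert lookup("AuthenticAMD-23-18") == ("amdzen1", "v2")
```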

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index f63e97c57dd8..f0155639c22d 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -27,7 +27,7 @@ GenuineIntel-6-55-[01234],v1.28,skylakex,core
 GenuineIntel-6-86,v1.20,snowridgex,core
 GenuineIntel-6-8[CD],v1.07,tigerlake,core
 GenuineIntel-6-2C,v2,westmereep-dp,core
-GenuineIntel-6-25,v2,westmereep-sp,core
+GenuineIntel-6-25,v3,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
 AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
 AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-sp/floating-point.json b/tools/perf/pmu-events/arch/x86/westmereep-sp/floating-point.json
index 39af1329224a..666e466d351c 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-sp/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-sp/floating-point.json
@@ -226,4 +226,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-sp/frontend.json b/tools/perf/pmu-events/arch/x86/westmereep-sp/frontend.json
index 8ac5c24888c5..c561ac24d91d 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-sp/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-sp/frontend.json
@@ -23,4 +23,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereep-sp/virtual-memory.json b/tools/perf/pmu-events/arch/x86/westmereep-sp/virtual-memory.json
index 0252f77a844b..e7affdf7f41b 100644
--- a/tools/perf/pmu-events/arch/x86/westmereep-sp/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/westmereep-sp/virtual-memory.json
@@ -146,4 +146,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 31/31] perf vendor events: Update Intel westmereex
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (15 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 30/31] perf vendor events: Update Intel westmereep-sp Ian Rogers
@ 2022-07-22 22:32 ` Ian Rogers
  2022-07-24  5:51 ` [PATCH v1 00/31] Add generated latest Intel events and metrics Sedat Dilek
       [not found] ` <20220722223240.1618013-3-irogers@google.com>
  18 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-22 22:32 UTC (permalink / raw)
  To: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Sedat Dilek
  Cc: Stephane Eranian, Ian Rogers

Use the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually
copy the westmereex files into perf and update mapfile.csv. This
change just aligns whitespace and updates the version number.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/arch/x86/mapfile.csv                    | 2 +-
 tools/perf/pmu-events/arch/x86/westmereex/floating-point.json | 2 +-
 tools/perf/pmu-events/arch/x86/westmereex/frontend.json       | 2 +-
 tools/perf/pmu-events/arch/x86/westmereex/virtual-memory.json | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index f0155639c22d..7f2d777fd97f 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -28,7 +28,7 @@ GenuineIntel-6-86,v1.20,snowridgex,core
 GenuineIntel-6-8[CD],v1.07,tigerlake,core
 GenuineIntel-6-2C,v2,westmereep-dp,core
 GenuineIntel-6-25,v3,westmereep-sp,core
-GenuineIntel-6-2F,v2,westmereex,core
+GenuineIntel-6-2F,v3,westmereex,core
 AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
 AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
 AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
diff --git a/tools/perf/pmu-events/arch/x86/westmereex/floating-point.json b/tools/perf/pmu-events/arch/x86/westmereex/floating-point.json
index 39af1329224a..666e466d351c 100644
--- a/tools/perf/pmu-events/arch/x86/westmereex/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/westmereex/floating-point.json
@@ -226,4 +226,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x8"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereex/frontend.json b/tools/perf/pmu-events/arch/x86/westmereex/frontend.json
index 8ac5c24888c5..c561ac24d91d 100644
--- a/tools/perf/pmu-events/arch/x86/westmereex/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/westmereex/frontend.json
@@ -23,4 +23,4 @@
         "SampleAfterValue": "2000000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
diff --git a/tools/perf/pmu-events/arch/x86/westmereex/virtual-memory.json b/tools/perf/pmu-events/arch/x86/westmereex/virtual-memory.json
index 5d1e017d1261..0c3501e6e5a3 100644
--- a/tools/perf/pmu-events/arch/x86/westmereex/virtual-memory.json
+++ b/tools/perf/pmu-events/arch/x86/westmereex/virtual-memory.json
@@ -170,4 +170,4 @@
         "SampleAfterValue": "200000",
         "UMask": "0x1"
     }
-]
\ No newline at end of file
+]
-- 
2.37.1.359.gd136c6c3e2-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 00/31] Add generated latest Intel events and metrics
  2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
                   ` (16 preceding siblings ...)
  2022-07-22 22:32 ` [PATCH v1 31/31] perf vendor events: Update Intel westmereex Ian Rogers
@ 2022-07-24  5:51 ` Sedat Dilek
  2022-07-24 19:08   ` Ian Rogers
       [not found] ` <20220722223240.1618013-3-irogers@google.com>
  18 siblings, 1 reply; 25+ messages in thread
From: Sedat Dilek @ 2022-07-24  5:51 UTC (permalink / raw)
  To: Ian Rogers
  Cc: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Stephane Eranian

On Sat, Jul 23, 2022 at 12:32 AM Ian Rogers <irogers@google.com> wrote:
>
> The goal of this patch series is to align the json events for Intel
> platforms with those generated by:
> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> This script takes the latest event json and TMA metrics from:
> https://download.01.org/perfmon/ and adds to these metrics, in
> particular uncore ones, from: https://github.com/intel/perfmon-metrics
> The cpu_operating_frequency metric assumes the presence of the
> system_tsc_freq literal posted/reviewed in:
> https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
>
> Some fixes were needed to the script for generating the json and are
> contained in this pull request:
> https://github.com/intel/event-converter-for-linux-perf/pull/15
>
> The json files were first downloaded before being used to generate the
> perf json files. This fixes non-ascii characters for (R) and (TM) in
> the source json files. This can be reproduced with:
> $ download_and_gen.py --hermetic-download --outdir data
> $ download_and_gen.py --url=file://`pwd`/data/01 --metrics-url=file://`pwd`/data/github
>
> A minor correction is made in the generated json of:
> tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
> changing "\\Inbound\\" to just "Inbound" to avoid compilation errors
> caused by \I.
>
> The elkhartlake metrics file is basic and not generated by scripts. It
> is retained here although it causes a difference from the generated
> files.
>
> The mapfile.csv is the third and final difference from the generated
> version due to a bug in 01.org's models for icelake. The existing
> models are preferred and retained.
>
> As well as the #system_tsc_freq being necessary, a test change is
> added here fixing an issue with fake PMU testing exposed in the
> new/updated metrics.
>
> Compared to the previous json, additional changes are the inclusion of
> basic meteorlake events and the renaming of tremontx to
> snowridgex. The new metrics contribute to the size, but a large
> contribution is the inclusion of previously ungenerated and
> experimental uncore events.
>

Hi Ian,

Thanks for the patchset.

I would like to test this.

What is the base for your work?
Mainline Git? perf-tools Git [0]?
Do you have your own Git repository (it looks like this is [1]) with all the
required prerequisites and your v1 patchset, for easier fetching?

Regards,
-Sedat-

[0] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/
[1] https://github.com/captain5050

> Ian Rogers (31):
>   perf test: Avoid sysfs state affecting fake events
>   perf vendor events: Update Intel broadwellx
>   perf vendor events: Update Intel broadwell
>   perf vendor events: Update Intel broadwellde
>   perf vendor events: Update Intel alderlake
>   perf vendor events: Update bonnell mapfile.csv
>   perf vendor events: Update Intel cascadelakex
>   perf vendor events: Update Intel elkhartlake
>   perf vendor events: Update goldmont mapfile.csv
>   perf vendor events: Update goldmontplus mapfile.csv
>   perf vendor events: Update Intel haswell
>   perf vendor events: Update Intel haswellx
>   perf vendor events: Update Intel icelake
>   perf vendor events: Update Intel icelakex
>   perf vendor events: Update Intel ivybridge
>   perf vendor events: Update Intel ivytown
>   perf vendor events: Update Intel jaketown
>   perf vendor events: Update Intel knightslanding
>   perf vendor events: Add Intel meteorlake
>   perf vendor events: Update Intel nehalemep
>   perf vendor events: Update Intel nehalemex
>   perf vendor events: Update Intel sandybridge
>   perf vendor events: Update Intel sapphirerapids
>   perf vendor events: Update Intel silvermont
>   perf vendor events: Update Intel skylake
>   perf vendor events: Update Intel skylakex
>   perf vendor events: Update Intel snowridgex
>   perf vendor events: Update Intel tigerlake
>   perf vendor events: Update Intel westmereep-dp
>   perf vendor events: Update Intel westmereep-sp
>   perf vendor events: Update Intel westmereex
>
>  .../arch/x86/alderlake/adl-metrics.json       |     4 +-
>  .../pmu-events/arch/x86/alderlake/cache.json  |   178 +-
>  .../arch/x86/alderlake/floating-point.json    |    19 +-
>  .../arch/x86/alderlake/frontend.json          |    38 +-
>  .../pmu-events/arch/x86/alderlake/memory.json |    40 +-
>  .../pmu-events/arch/x86/alderlake/other.json  |    97 +-
>  .../arch/x86/alderlake/pipeline.json          |   507 +-
>  .../arch/x86/alderlake/uncore-other.json      |     2 +-
>  .../arch/x86/alderlake/virtual-memory.json    |    63 +-
>  .../pmu-events/arch/x86/bonnell/cache.json    |     2 +-
>  .../arch/x86/bonnell/floating-point.json      |     2 +-
>  .../pmu-events/arch/x86/bonnell/frontend.json |     2 +-
>  .../pmu-events/arch/x86/bonnell/memory.json   |     2 +-
>  .../pmu-events/arch/x86/bonnell/other.json    |     2 +-
>  .../pmu-events/arch/x86/bonnell/pipeline.json |     2 +-
>  .../arch/x86/bonnell/virtual-memory.json      |     2 +-
>  .../arch/x86/broadwell/bdw-metrics.json       |   130 +-
>  .../pmu-events/arch/x86/broadwell/cache.json  |     2 +-
>  .../arch/x86/broadwell/floating-point.json    |     2 +-
>  .../arch/x86/broadwell/frontend.json          |     2 +-
>  .../pmu-events/arch/x86/broadwell/memory.json |     2 +-
>  .../pmu-events/arch/x86/broadwell/other.json  |     2 +-
>  .../arch/x86/broadwell/pipeline.json          |     2 +-
>  .../arch/x86/broadwell/uncore-cache.json      |   152 +
>  .../arch/x86/broadwell/uncore-other.json      |    82 +
>  .../pmu-events/arch/x86/broadwell/uncore.json |   278 -
>  .../arch/x86/broadwell/virtual-memory.json    |     2 +-
>  .../arch/x86/broadwellde/bdwde-metrics.json   |   136 +-
>  .../arch/x86/broadwellde/cache.json           |     2 +-
>  .../arch/x86/broadwellde/floating-point.json  |     2 +-
>  .../arch/x86/broadwellde/frontend.json        |     2 +-
>  .../arch/x86/broadwellde/memory.json          |     2 +-
>  .../arch/x86/broadwellde/other.json           |     2 +-
>  .../arch/x86/broadwellde/pipeline.json        |     2 +-
>  .../arch/x86/broadwellde/uncore-cache.json    |  3818 ++-
>  .../arch/x86/broadwellde/uncore-memory.json   |  2867 +-
>  .../arch/x86/broadwellde/uncore-other.json    |  1246 +
>  .../arch/x86/broadwellde/uncore-power.json    |   492 +-
>  .../arch/x86/broadwellde/virtual-memory.json  |     2 +-
>  .../arch/x86/broadwellx/bdx-metrics.json      |   570 +-
>  .../pmu-events/arch/x86/broadwellx/cache.json |    22 +-
>  .../arch/x86/broadwellx/floating-point.json   |     9 +-
>  .../arch/x86/broadwellx/frontend.json         |     2 +-
>  .../arch/x86/broadwellx/memory.json           |    39 +-
>  .../pmu-events/arch/x86/broadwellx/other.json |     2 +-
>  .../arch/x86/broadwellx/pipeline.json         |     4 +-
>  .../arch/x86/broadwellx/uncore-cache.json     |  3788 ++-
>  .../x86/broadwellx/uncore-interconnect.json   |  1438 +-
>  .../arch/x86/broadwellx/uncore-memory.json    |  2849 +-
>  .../arch/x86/broadwellx/uncore-other.json     |  3252 ++
>  .../arch/x86/broadwellx/uncore-power.json     |   437 +-
>  .../arch/x86/broadwellx/virtual-memory.json   |     2 +-
>  .../arch/x86/cascadelakex/cache.json          |     8 +-
>  .../arch/x86/cascadelakex/clx-metrics.json    |   724 +-
>  .../arch/x86/cascadelakex/floating-point.json |     2 +-
>  .../arch/x86/cascadelakex/frontend.json       |     2 +-
>  .../arch/x86/cascadelakex/other.json          |    63 +
>  .../arch/x86/cascadelakex/pipeline.json       |    11 +
>  .../arch/x86/cascadelakex/uncore-memory.json  |     9 +
>  .../arch/x86/cascadelakex/uncore-other.json   |   697 +-
>  .../arch/x86/cascadelakex/virtual-memory.json |     2 +-
>  .../arch/x86/elkhartlake/cache.json           |   956 +-
>  .../arch/x86/elkhartlake/floating-point.json  |    19 +-
>  .../arch/x86/elkhartlake/frontend.json        |    34 +-
>  .../arch/x86/elkhartlake/memory.json          |   388 +-
>  .../arch/x86/elkhartlake/other.json           |   527 +-
>  .../arch/x86/elkhartlake/pipeline.json        |   203 +-
>  .../arch/x86/elkhartlake/virtual-memory.json  |   151 +-
>  .../pmu-events/arch/x86/goldmont/cache.json   |     2 +-
>  .../arch/x86/goldmont/floating-point.json     |     2 +-
>  .../arch/x86/goldmont/frontend.json           |     2 +-
>  .../pmu-events/arch/x86/goldmont/memory.json  |     2 +-
>  .../arch/x86/goldmont/pipeline.json           |     2 +-
>  .../arch/x86/goldmont/virtual-memory.json     |     2 +-
>  .../arch/x86/goldmontplus/cache.json          |     2 +-
>  .../arch/x86/goldmontplus/floating-point.json |     2 +-
>  .../arch/x86/goldmontplus/frontend.json       |     2 +-
>  .../arch/x86/goldmontplus/memory.json         |     2 +-
>  .../arch/x86/goldmontplus/pipeline.json       |     2 +-
>  .../arch/x86/goldmontplus/virtual-memory.json |     2 +-
>  .../pmu-events/arch/x86/haswell/cache.json    |    78 +-
>  .../arch/x86/haswell/floating-point.json      |     2 +-
>  .../pmu-events/arch/x86/haswell/frontend.json |     2 +-
>  .../arch/x86/haswell/hsw-metrics.json         |    85 +-
>  .../pmu-events/arch/x86/haswell/memory.json   |    75 +-
>  .../pmu-events/arch/x86/haswell/other.json    |     2 +-
>  .../pmu-events/arch/x86/haswell/pipeline.json |     9 +-
>  .../arch/x86/haswell/uncore-other.json        |     7 +-
>  .../arch/x86/haswell/virtual-memory.json      |     2 +-
>  .../pmu-events/arch/x86/haswellx/cache.json   |    44 +-
>  .../arch/x86/haswellx/floating-point.json     |     2 +-
>  .../arch/x86/haswellx/frontend.json           |     2 +-
>  .../arch/x86/haswellx/hsx-metrics.json        |    85 +-
>  .../pmu-events/arch/x86/haswellx/memory.json  |    52 +-
>  .../pmu-events/arch/x86/haswellx/other.json   |     2 +-
>  .../arch/x86/haswellx/pipeline.json           |     9 +-
>  .../arch/x86/haswellx/uncore-cache.json       |  3779 ++-
>  .../x86/haswellx/uncore-interconnect.json     |  1430 +-
>  .../arch/x86/haswellx/uncore-memory.json      |  2839 +-
>  .../arch/x86/haswellx/uncore-other.json       |  3170 ++
>  .../arch/x86/haswellx/uncore-power.json       |   477 +-
>  .../arch/x86/haswellx/virtual-memory.json     |     2 +-
>  .../pmu-events/arch/x86/icelake/cache.json    |     8 +-
>  .../arch/x86/icelake/floating-point.json      |     2 +-
>  .../pmu-events/arch/x86/icelake/frontend.json |     2 +-
>  .../arch/x86/icelake/icl-metrics.json         |   126 +-
>  .../arch/x86/icelake/uncore-other.json        |    31 +
>  .../arch/x86/icelake/virtual-memory.json      |     2 +-
>  .../pmu-events/arch/x86/icelakex/cache.json   |    28 +-
>  .../arch/x86/icelakex/floating-point.json     |     2 +-
>  .../arch/x86/icelakex/frontend.json           |     2 +-
>  .../arch/x86/icelakex/icx-metrics.json        |   691 +-
>  .../pmu-events/arch/x86/icelakex/memory.json  |     6 +-
>  .../pmu-events/arch/x86/icelakex/other.json   |    51 +-
>  .../arch/x86/icelakex/pipeline.json           |    12 +
>  .../arch/x86/icelakex/virtual-memory.json     |     2 +-
>  .../pmu-events/arch/x86/ivybridge/cache.json  |     2 +-
>  .../arch/x86/ivybridge/floating-point.json    |     2 +-
>  .../arch/x86/ivybridge/frontend.json          |     2 +-
>  .../arch/x86/ivybridge/ivb-metrics.json       |    94 +-
>  .../pmu-events/arch/x86/ivybridge/memory.json |     2 +-
>  .../pmu-events/arch/x86/ivybridge/other.json  |     2 +-
>  .../arch/x86/ivybridge/pipeline.json          |     4 +-
>  .../arch/x86/ivybridge/uncore-other.json      |     2 +-
>  .../arch/x86/ivybridge/virtual-memory.json    |     2 +-
>  .../pmu-events/arch/x86/ivytown/cache.json    |     2 +-
>  .../arch/x86/ivytown/floating-point.json      |     2 +-
>  .../pmu-events/arch/x86/ivytown/frontend.json |     2 +-
>  .../arch/x86/ivytown/ivt-metrics.json         |    94 +-
>  .../pmu-events/arch/x86/ivytown/memory.json   |     2 +-
>  .../pmu-events/arch/x86/ivytown/other.json    |     2 +-
>  .../arch/x86/ivytown/uncore-cache.json        |  3495 ++-
>  .../arch/x86/ivytown/uncore-interconnect.json |  1750 +-
>  .../arch/x86/ivytown/uncore-memory.json       |  1775 +-
>  .../arch/x86/ivytown/uncore-other.json        |  2411 ++
>  .../arch/x86/ivytown/uncore-power.json        |   696 +-
>  .../arch/x86/ivytown/virtual-memory.json      |     2 +-
>  .../pmu-events/arch/x86/jaketown/cache.json   |     2 +-
>  .../arch/x86/jaketown/floating-point.json     |     2 +-
>  .../arch/x86/jaketown/frontend.json           |     2 +-
>  .../arch/x86/jaketown/jkt-metrics.json        |    11 +-
>  .../pmu-events/arch/x86/jaketown/memory.json  |     2 +-
>  .../pmu-events/arch/x86/jaketown/other.json   |     2 +-
>  .../arch/x86/jaketown/pipeline.json           |    16 +-
>  .../arch/x86/jaketown/uncore-cache.json       |  1960 +-
>  .../x86/jaketown/uncore-interconnect.json     |   824 +-
>  .../arch/x86/jaketown/uncore-memory.json      |   445 +-
>  .../arch/x86/jaketown/uncore-other.json       |  1551 +
>  .../arch/x86/jaketown/uncore-power.json       |   362 +-
>  .../arch/x86/jaketown/virtual-memory.json     |     2 +-
>  .../arch/x86/knightslanding/cache.json        |     2 +-
>  .../x86/knightslanding/floating-point.json    |     2 +-
>  .../arch/x86/knightslanding/frontend.json     |     2 +-
>  .../arch/x86/knightslanding/memory.json       |     2 +-
>  .../arch/x86/knightslanding/pipeline.json     |     2 +-
>  .../x86/knightslanding/uncore-memory.json     |    42 -
>  .../arch/x86/knightslanding/uncore-other.json |  3890 +++
>  .../x86/knightslanding/virtual-memory.json    |     2 +-
>  tools/perf/pmu-events/arch/x86/mapfile.csv    |    74 +-
>  .../pmu-events/arch/x86/meteorlake/cache.json |   262 +
>  .../arch/x86/meteorlake/frontend.json         |    24 +
>  .../arch/x86/meteorlake/memory.json           |   185 +
>  .../pmu-events/arch/x86/meteorlake/other.json |    46 +
>  .../arch/x86/meteorlake/pipeline.json         |   254 +
>  .../arch/x86/meteorlake/virtual-memory.json   |    46 +
>  .../pmu-events/arch/x86/nehalemep/cache.json  |    14 +-
>  .../arch/x86/nehalemep/floating-point.json    |     2 +-
>  .../arch/x86/nehalemep/frontend.json          |     2 +-
>  .../pmu-events/arch/x86/nehalemep/memory.json |     6 +-
>  .../arch/x86/nehalemep/virtual-memory.json    |     2 +-
>  .../pmu-events/arch/x86/nehalemex/cache.json  |  2974 +-
>  .../arch/x86/nehalemex/floating-point.json    |   182 +-
>  .../arch/x86/nehalemex/frontend.json          |    20 +-
>  .../pmu-events/arch/x86/nehalemex/memory.json |   672 +-
>  .../pmu-events/arch/x86/nehalemex/other.json  |   170 +-
>  .../arch/x86/nehalemex/pipeline.json          |   830 +-
>  .../arch/x86/nehalemex/virtual-memory.json    |    92 +-
>  .../arch/x86/sandybridge/cache.json           |     2 +-
>  .../arch/x86/sandybridge/floating-point.json  |     2 +-
>  .../arch/x86/sandybridge/frontend.json        |     4 +-
>  .../arch/x86/sandybridge/memory.json          |     2 +-
>  .../arch/x86/sandybridge/other.json           |     2 +-
>  .../arch/x86/sandybridge/pipeline.json        |    10 +-
>  .../arch/x86/sandybridge/snb-metrics.json     |    11 +-
>  .../arch/x86/sandybridge/uncore-other.json    |     2 +-
>  .../arch/x86/sandybridge/virtual-memory.json  |     2 +-
>  .../arch/x86/sapphirerapids/cache.json        |   135 +-
>  .../x86/sapphirerapids/floating-point.json    |     6 +
>  .../arch/x86/sapphirerapids/frontend.json     |    16 +
>  .../arch/x86/sapphirerapids/memory.json       |    23 +-
>  .../arch/x86/sapphirerapids/other.json        |    68 +-
>  .../arch/x86/sapphirerapids/pipeline.json     |    99 +-
>  .../arch/x86/sapphirerapids/spr-metrics.json  |   566 +-
>  .../arch/x86/sapphirerapids/uncore-other.json |     9 -
>  .../x86/sapphirerapids/virtual-memory.json    |    20 +
>  .../pmu-events/arch/x86/silvermont/cache.json |     2 +-
>  .../arch/x86/silvermont/floating-point.json   |     2 +-
>  .../arch/x86/silvermont/frontend.json         |     2 +-
>  .../arch/x86/silvermont/memory.json           |     2 +-
>  .../pmu-events/arch/x86/silvermont/other.json |     2 +-
>  .../arch/x86/silvermont/pipeline.json         |     2 +-
>  .../arch/x86/silvermont/virtual-memory.json   |     2 +-
>  .../arch/x86/skylake/floating-point.json      |     2 +-
>  .../pmu-events/arch/x86/skylake/frontend.json |     2 +-
>  .../pmu-events/arch/x86/skylake/other.json    |     2 +-
>  .../arch/x86/skylake/skl-metrics.json         |   178 +-
>  .../arch/x86/skylake/uncore-cache.json        |   142 +
>  .../arch/x86/skylake/uncore-other.json        |    79 +
>  .../pmu-events/arch/x86/skylake/uncore.json   |   254 -
>  .../arch/x86/skylake/virtual-memory.json      |     2 +-
>  .../arch/x86/skylakex/floating-point.json     |     2 +-
>  .../arch/x86/skylakex/frontend.json           |     2 +-
>  .../pmu-events/arch/x86/skylakex/other.json   |    66 +-
>  .../arch/x86/skylakex/pipeline.json           |    11 +
>  .../arch/x86/skylakex/skx-metrics.json        |   667 +-
>  .../arch/x86/skylakex/uncore-memory.json      |     9 +
>  .../arch/x86/skylakex/uncore-other.json       |   730 +-
>  .../arch/x86/skylakex/virtual-memory.json     |     2 +-
>  .../x86/{tremontx => snowridgex}/cache.json   |    60 +-
>  .../floating-point.json                       |     9 +-
>  .../{tremontx => snowridgex}/frontend.json    |    20 +-
>  .../x86/{tremontx => snowridgex}/memory.json  |     4 +-
>  .../x86/{tremontx => snowridgex}/other.json   |    18 +-
>  .../{tremontx => snowridgex}/pipeline.json    |    98 +-
>  .../arch/x86/snowridgex/uncore-memory.json    |   619 +
>  .../arch/x86/snowridgex/uncore-other.json     | 25249 ++++++++++++++++
>  .../arch/x86/snowridgex/uncore-power.json     |   235 +
>  .../virtual-memory.json                       |    69 +-
>  .../pmu-events/arch/x86/tigerlake/cache.json  |    48 +-
>  .../arch/x86/tigerlake/floating-point.json    |     2 +-
>  .../arch/x86/tigerlake/frontend.json          |     2 +-
>  .../pmu-events/arch/x86/tigerlake/memory.json |     2 +-
>  .../pmu-events/arch/x86/tigerlake/other.json  |     1 -
>  .../arch/x86/tigerlake/pipeline.json          |     4 +-
>  .../arch/x86/tigerlake/tgl-metrics.json       |   378 +-
>  .../arch/x86/tigerlake/uncore-other.json      |    65 +
>  .../arch/x86/tigerlake/virtual-memory.json    |     2 +-
>  .../arch/x86/tremontx/uncore-memory.json      |   245 -
>  .../arch/x86/tremontx/uncore-other.json       |  2395 --
>  .../arch/x86/tremontx/uncore-power.json       |    11 -
>  .../arch/x86/westmereep-dp/cache.json         |     2 +-
>  .../x86/westmereep-dp/floating-point.json     |     2 +-
>  .../arch/x86/westmereep-dp/frontend.json      |     2 +-
>  .../arch/x86/westmereep-dp/memory.json        |     2 +-
>  .../x86/westmereep-dp/virtual-memory.json     |     2 +-
>  .../x86/westmereep-sp/floating-point.json     |     2 +-
>  .../arch/x86/westmereep-sp/frontend.json      |     2 +-
>  .../x86/westmereep-sp/virtual-memory.json     |     2 +-
>  .../arch/x86/westmereex/floating-point.json   |     2 +-
>  .../arch/x86/westmereex/frontend.json         |     2 +-
>  .../arch/x86/westmereex/virtual-memory.json   |     2 +-
>  tools/perf/tests/pmu-events.c                 |     9 +
>  252 files changed, 89144 insertions(+), 8438 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-cache.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-other.json
>  delete mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-other.json
>  delete mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/cache.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/memory.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
>  delete mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore.json
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/cache.json (95%)
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/floating-point.json (84%)
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/frontend.json (94%)
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/memory.json (99%)
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/other.json (98%)
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/pipeline.json (89%)
>  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
>  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/virtual-memory.json (91%)
>  create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
>  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json
>  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json
>  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json
>
> --
> 2.37.1.359.gd136c6c3e2-goog
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 00/31] Add generated latest Intel events and metrics
  2022-07-24  5:51 ` [PATCH v1 00/31] Add generated latest Intel events and metrics Sedat Dilek
@ 2022-07-24 19:08   ` Ian Rogers
  2022-07-27  6:48     ` Sedat Dilek
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Rogers @ 2022-07-24 19:08 UTC (permalink / raw)
  To: sedat.dilek
  Cc: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Stephane Eranian

On Sat, Jul 23, 2022 at 10:52 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sat, Jul 23, 2022 at 12:32 AM Ian Rogers <irogers@google.com> wrote:
> >
> > The goal of this patch series is to align the json events for Intel
> > platforms with those generated by:
> > https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> > This script takes the latest event json and TMA metrics from:
> > https://download.01.org/perfmon/ and adds to these metrics, in
> > particular uncore ones, from: https://github.com/intel/perfmon-metrics
> > The cpu_operating_frequency metric assumes the presence of the
> > system_tsc_freq literal posted/reviewed in:
> > https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
> >
> > Some fixes were needed to the script for generating the json and are
> > contained in this pull request:
> > https://github.com/intel/event-converter-for-linux-perf/pull/15
> >
> > The json files were first downloaded before being used to generate the
> > perf json files. This fixes non-ascii characters for (R) and (TM) in
> > the source json files. This can be reproduced with:
> > $ download_and_gen.py --hermetic-download --outdir data
> > $ download_and_gen.py --url=file://`pwd`/data/01 --metrics-url=file://`pwd`/data/github
> >
> > A minor correction is made in the generated json of:
> > tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
> > changing "\\Inbound\\" to just "Inbound" to avoid compilation errors
> > caused by \I.
> >
> > The elkhartlake metrics file is basic and not generated by scripts. It
> > is retained here although it causes a difference from the generated
> > files.
> >
> > The mapfile.csv is the third and final difference from the generated
> > version due to a bug in 01.org's models for icelake. The existing
> > models are preferred and retained.
> >
> > As well as the #system_tsc_freq being necessary, a test change is
> > added here fixing an issue with fake PMU testing exposed in the
> > new/updated metrics.
> >
> > Compared to the previous json, additional changes are the inclusion of
> > basic meteorlake events and the renaming of tremontx to
> > snowridgex. The new metrics contribute to the size, but a large
> > contribution is the inclusion of previously ungenerated and
> > experimental uncore events.
> >
>
> Hi Ian,
>
> Thanks for the patchset.
>
> I would like to test this.
>
> What is the base for your work?
> Mainline Git? perf-tools Git [0]?
> Do you have an own Git repository (look like this is [1]) with all the
> required/prerequisites and your patchset-v1 for easier fetching?

Hi Sedat,

I have bits of trees all over the place, but nowhere I push my kernel
work at the moment. To test the patches, try the following:

1) Get a copy of Arnaldo's perf/core branch where active perf tool work happens:

$ git clone -b perf/core \
    git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
$ cd linux

2) Grab the #system_tsc_freq patches using b4:

$ b4 am https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
$ git am ./v4_20220718_irogers_add_arch_tsc_frequency_information.mbx

3) Grab the vendor update patches using b4:

$ b4 am https://lore.kernel.org/lkml/20220722223240.1618013-1-irogers@google.com/
$ git am ./20220722_irogers_add_generated_latest_intel_events_and_metrics.mbx

I'm not sure why, but this fails on a bunch of patches due to
conflicts on mapfile.csv. That doesn't matter too much as long as
mapfile.csv ends up looking like the following:

Family-model,Version,Filename,EventType
GenuineIntel-6-9[7A],v1.13,alderlake,core
GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core
GenuineIntel-6-(3D|47),v26,broadwell,core
GenuineIntel-6-56,v23,broadwellde,core
GenuineIntel-6-4F,v19,broadwellx,core
GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
GenuineIntel-6-96,v1.03,elkhartlake,core
GenuineIntel-6-5[CF],v13,goldmont,core
GenuineIntel-6-7A,v1.01,goldmontplus,core
GenuineIntel-6-(3C|45|46),v31,haswell,core
GenuineIntel-6-3F,v25,haswellx,core
GenuineIntel-6-(7D|7E|A7),v1.14,icelake,core
GenuineIntel-6-6[AC],v1.15,icelakex,core
GenuineIntel-6-3A,v22,ivybridge,core
GenuineIntel-6-3E,v21,ivytown,core
GenuineIntel-6-2D,v21,jaketown,core
GenuineIntel-6-(57|85),v9,knightslanding,core
GenuineIntel-6-AA,v1.00,meteorlake,core
GenuineIntel-6-1[AEF],v3,nehalemep,core
GenuineIntel-6-2E,v3,nehalemex,core
GenuineIntel-6-2A,v17,sandybridge,core
GenuineIntel-6-8F,v1.04,sapphirerapids,core
GenuineIntel-6-(37|4C|4D),v14,silvermont,core
GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
GenuineIntel-6-55-[01234],v1.28,skylakex,core
GenuineIntel-6-86,v1.20,snowridgex,core
GenuineIntel-6-8[CD],v1.07,tigerlake,core
GenuineIntel-6-2C,v2,westmereep-dp,core
GenuineIntel-6-25,v3,westmereep-sp,core
GenuineIntel-6-2F,v3,westmereex,core
AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
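
For reference, the first column of mapfile.csv is a regular expression that
is matched against the CPU identifier string ("vendor-family-model" plus,
for some rows, a stepping), which is how the two model-55 rows above select
different event directories. The sketch below is my own illustration of that
matching, not perf's actual lookup code; the `lookup` helper and the demo
file path are invented for the example:

```shell
#!/bin/sh
# Illustration only -- not perf's real matcher. mapfile.csv's first column is
# a regular expression matched against the CPU identifier string; the two
# "55" rows disambiguate skylakex vs cascadelakex by stepping.
cat > /tmp/mapfile-demo.csv <<'EOF'
GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
GenuineIntel-6-55-[01234],v1.28,skylakex,core
GenuineIntel-6-86,v1.20,snowridgex,core
EOF

lookup() {
  # $1 = CPU identifier, e.g. "GenuineIntel-6-55-7"
  while IFS=, read -r regex version fname type; do
    if printf '%s\n' "$1" | grep -Eq "^${regex}"; then
      printf '%s\n' "$fname"
      return 0
    fi
  done < /tmp/mapfile-demo.csv
  return 1
}

lookup "GenuineIntel-6-55-7"    # stepping 7 -> cascadelakex
lookup "GenuineIntel-6-55-2"    # stepping 2 -> skylakex
lookup "GenuineIntel-6-86"      #           -> snowridgex
```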

When git reports something like:
error: patch failed: tools/perf/pmu-events/arch/x86/mapfile.csv:27
error: tools/perf/pmu-events/arch/x86/mapfile.csv: patch does not apply
edit tools/perf/pmu-events/arch/x86/mapfile.csv by hand and apply the
one-line change from the failing patch, which you can see with:
$ git am --show-current-patch=diff
then stage the file and continue:
$ git add tools/perf/pmu-events/arch/x86/mapfile.csv
$ git am --continue

I also found that the rename of
tools/perf/pmu-events/arch/x86/tremontx to
tools/perf/pmu-events/arch/x86/snowridgex didn't happen (you can mv
the directory manually). The meteorlake files also didn't get added,
so you can just remove the meteorlake line from mapfile.csv.
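
Concretely, those two manual fix-ups can be applied as in the sketch below.
It is demonstrated on a scratch copy of the tree layout so it is safe to run
anywhere; in a real checkout, run the same mv/sed from the top of the kernel
tree. The scratch directory and the two sample mapfile rows are invented for
the example:

```shell
#!/bin/sh
# Demonstrate the two fix-ups on a throwaway copy of the tree layout.
scratch=$(mktemp -d)
x86="$scratch/tools/perf/pmu-events/arch/x86"
mkdir -p "$x86/tremontx"
printf '%s\n' \
  'GenuineIntel-6-AA,v1.00,meteorlake,core' \
  'GenuineIntel-6-86,v1.20,snowridgex,core' > "$x86/mapfile.csv"

# 1) Perform the directory rename that "git am" failed to carry out.
mv "$x86/tremontx" "$x86/snowridgex"
# 2) Drop the meteorlake row, since its event files were not added.
sed -i '/,meteorlake,/d' "$x86/mapfile.csv"
```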

4) build and test the perf command

$ mkdir /tmp/perf
$ make -C tools/perf O=/tmp/perf
$ /tmp/perf/perf test

I'm not sure why b4 isn't behaving well in step 3, but this should
give you something to test with.

Thanks,
Ian



> Regards,
> -Sedat-
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/
> [1] https://github.com/captain5050
>
> > Ian Rogers (31):
> >   perf test: Avoid sysfs state affecting fake events
> >   perf vendor events: Update Intel broadwellx
> >   perf vendor events: Update Intel broadwell
> >   perf vendor events: Update Intel broadwellde
> >   perf vendor events: Update Intel alderlake
> >   perf vendor events: Update bonnell mapfile.csv
> >   perf vendor events: Update Intel cascadelakex
> >   perf vendor events: Update Intel elkhartlake
> >   perf vendor events: Update goldmont mapfile.csv
> >   perf vendor events: Update goldmontplus mapfile.csv
> >   perf vendor events: Update Intel haswell
> >   perf vendor events: Update Intel haswellx
> >   perf vendor events: Update Intel icelake
> >   perf vendor events: Update Intel icelakex
> >   perf vendor events: Update Intel ivybridge
> >   perf vendor events: Update Intel ivytown
> >   perf vendor events: Update Intel jaketown
> >   perf vendor events: Update Intel knightslanding
> >   perf vendor events: Add Intel meteorlake
> >   perf vendor events: Update Intel nehalemep
> >   perf vendor events: Update Intel nehalemex
> >   perf vendor events: Update Intel sandybridge
> >   perf vendor events: Update Intel sapphirerapids
> >   perf vendor events: Update Intel silvermont
> >   perf vendor events: Update Intel skylake
> >   perf vendor events: Update Intel skylakex
> >   perf vendor events: Update Intel snowridgex
> >   perf vendor events: Update Intel tigerlake
> >   perf vendor events: Update Intel westmereep-dp
> >   perf vendor events: Update Intel westmereep-sp
> >   perf vendor events: Update Intel westmereex
> >
> >  .../arch/x86/alderlake/adl-metrics.json       |     4 +-
> >  .../pmu-events/arch/x86/alderlake/cache.json  |   178 +-
> >  .../arch/x86/alderlake/floating-point.json    |    19 +-
> >  .../arch/x86/alderlake/frontend.json          |    38 +-
> >  .../pmu-events/arch/x86/alderlake/memory.json |    40 +-
> >  .../pmu-events/arch/x86/alderlake/other.json  |    97 +-
> >  .../arch/x86/alderlake/pipeline.json          |   507 +-
> >  .../arch/x86/alderlake/uncore-other.json      |     2 +-
> >  .../arch/x86/alderlake/virtual-memory.json    |    63 +-
> >  .../pmu-events/arch/x86/bonnell/cache.json    |     2 +-
> >  .../arch/x86/bonnell/floating-point.json      |     2 +-
> >  .../pmu-events/arch/x86/bonnell/frontend.json |     2 +-
> >  .../pmu-events/arch/x86/bonnell/memory.json   |     2 +-
> >  .../pmu-events/arch/x86/bonnell/other.json    |     2 +-
> >  .../pmu-events/arch/x86/bonnell/pipeline.json |     2 +-
> >  .../arch/x86/bonnell/virtual-memory.json      |     2 +-
> >  .../arch/x86/broadwell/bdw-metrics.json       |   130 +-
> >  .../pmu-events/arch/x86/broadwell/cache.json  |     2 +-
> >  .../arch/x86/broadwell/floating-point.json    |     2 +-
> >  .../arch/x86/broadwell/frontend.json          |     2 +-
> >  .../pmu-events/arch/x86/broadwell/memory.json |     2 +-
> >  .../pmu-events/arch/x86/broadwell/other.json  |     2 +-
> >  .../arch/x86/broadwell/pipeline.json          |     2 +-
> >  .../arch/x86/broadwell/uncore-cache.json      |   152 +
> >  .../arch/x86/broadwell/uncore-other.json      |    82 +
> >  .../pmu-events/arch/x86/broadwell/uncore.json |   278 -
> >  .../arch/x86/broadwell/virtual-memory.json    |     2 +-
> >  .../arch/x86/broadwellde/bdwde-metrics.json   |   136 +-
> >  .../arch/x86/broadwellde/cache.json           |     2 +-
> >  .../arch/x86/broadwellde/floating-point.json  |     2 +-
> >  .../arch/x86/broadwellde/frontend.json        |     2 +-
> >  .../arch/x86/broadwellde/memory.json          |     2 +-
> >  .../arch/x86/broadwellde/other.json           |     2 +-
> >  .../arch/x86/broadwellde/pipeline.json        |     2 +-
> >  .../arch/x86/broadwellde/uncore-cache.json    |  3818 ++-
> >  .../arch/x86/broadwellde/uncore-memory.json   |  2867 +-
> >  .../arch/x86/broadwellde/uncore-other.json    |  1246 +
> >  .../arch/x86/broadwellde/uncore-power.json    |   492 +-
> >  .../arch/x86/broadwellde/virtual-memory.json  |     2 +-
> >  .../arch/x86/broadwellx/bdx-metrics.json      |   570 +-
> >  .../pmu-events/arch/x86/broadwellx/cache.json |    22 +-
> >  .../arch/x86/broadwellx/floating-point.json   |     9 +-
> >  .../arch/x86/broadwellx/frontend.json         |     2 +-
> >  .../arch/x86/broadwellx/memory.json           |    39 +-
> >  .../pmu-events/arch/x86/broadwellx/other.json |     2 +-
> >  .../arch/x86/broadwellx/pipeline.json         |     4 +-
> >  .../arch/x86/broadwellx/uncore-cache.json     |  3788 ++-
> >  .../x86/broadwellx/uncore-interconnect.json   |  1438 +-
> >  .../arch/x86/broadwellx/uncore-memory.json    |  2849 +-
> >  .../arch/x86/broadwellx/uncore-other.json     |  3252 ++
> >  .../arch/x86/broadwellx/uncore-power.json     |   437 +-
> >  .../arch/x86/broadwellx/virtual-memory.json   |     2 +-
> >  .../arch/x86/cascadelakex/cache.json          |     8 +-
> >  .../arch/x86/cascadelakex/clx-metrics.json    |   724 +-
> >  .../arch/x86/cascadelakex/floating-point.json |     2 +-
> >  .../arch/x86/cascadelakex/frontend.json       |     2 +-
> >  .../arch/x86/cascadelakex/other.json          |    63 +
> >  .../arch/x86/cascadelakex/pipeline.json       |    11 +
> >  .../arch/x86/cascadelakex/uncore-memory.json  |     9 +
> >  .../arch/x86/cascadelakex/uncore-other.json   |   697 +-
> >  .../arch/x86/cascadelakex/virtual-memory.json |     2 +-
> >  .../arch/x86/elkhartlake/cache.json           |   956 +-
> >  .../arch/x86/elkhartlake/floating-point.json  |    19 +-
> >  .../arch/x86/elkhartlake/frontend.json        |    34 +-
> >  .../arch/x86/elkhartlake/memory.json          |   388 +-
> >  .../arch/x86/elkhartlake/other.json           |   527 +-
> >  .../arch/x86/elkhartlake/pipeline.json        |   203 +-
> >  .../arch/x86/elkhartlake/virtual-memory.json  |   151 +-
> >  .../pmu-events/arch/x86/goldmont/cache.json   |     2 +-
> >  .../arch/x86/goldmont/floating-point.json     |     2 +-
> >  .../arch/x86/goldmont/frontend.json           |     2 +-
> >  .../pmu-events/arch/x86/goldmont/memory.json  |     2 +-
> >  .../arch/x86/goldmont/pipeline.json           |     2 +-
> >  .../arch/x86/goldmont/virtual-memory.json     |     2 +-
> >  .../arch/x86/goldmontplus/cache.json          |     2 +-
> >  .../arch/x86/goldmontplus/floating-point.json |     2 +-
> >  .../arch/x86/goldmontplus/frontend.json       |     2 +-
> >  .../arch/x86/goldmontplus/memory.json         |     2 +-
> >  .../arch/x86/goldmontplus/pipeline.json       |     2 +-
> >  .../arch/x86/goldmontplus/virtual-memory.json |     2 +-
> >  .../pmu-events/arch/x86/haswell/cache.json    |    78 +-
> >  .../arch/x86/haswell/floating-point.json      |     2 +-
> >  .../pmu-events/arch/x86/haswell/frontend.json |     2 +-
> >  .../arch/x86/haswell/hsw-metrics.json         |    85 +-
> >  .../pmu-events/arch/x86/haswell/memory.json   |    75 +-
> >  .../pmu-events/arch/x86/haswell/other.json    |     2 +-
> >  .../pmu-events/arch/x86/haswell/pipeline.json |     9 +-
> >  .../arch/x86/haswell/uncore-other.json        |     7 +-
> >  .../arch/x86/haswell/virtual-memory.json      |     2 +-
> >  .../pmu-events/arch/x86/haswellx/cache.json   |    44 +-
> >  .../arch/x86/haswellx/floating-point.json     |     2 +-
> >  .../arch/x86/haswellx/frontend.json           |     2 +-
> >  .../arch/x86/haswellx/hsx-metrics.json        |    85 +-
> >  .../pmu-events/arch/x86/haswellx/memory.json  |    52 +-
> >  .../pmu-events/arch/x86/haswellx/other.json   |     2 +-
> >  .../arch/x86/haswellx/pipeline.json           |     9 +-
> >  .../arch/x86/haswellx/uncore-cache.json       |  3779 ++-
> >  .../x86/haswellx/uncore-interconnect.json     |  1430 +-
> >  .../arch/x86/haswellx/uncore-memory.json      |  2839 +-
> >  .../arch/x86/haswellx/uncore-other.json       |  3170 ++
> >  .../arch/x86/haswellx/uncore-power.json       |   477 +-
> >  .../arch/x86/haswellx/virtual-memory.json     |     2 +-
> >  .../pmu-events/arch/x86/icelake/cache.json    |     8 +-
> >  .../arch/x86/icelake/floating-point.json      |     2 +-
> >  .../pmu-events/arch/x86/icelake/frontend.json |     2 +-
> >  .../arch/x86/icelake/icl-metrics.json         |   126 +-
> >  .../arch/x86/icelake/uncore-other.json        |    31 +
> >  .../arch/x86/icelake/virtual-memory.json      |     2 +-
> >  .../pmu-events/arch/x86/icelakex/cache.json   |    28 +-
> >  .../arch/x86/icelakex/floating-point.json     |     2 +-
> >  .../arch/x86/icelakex/frontend.json           |     2 +-
> >  .../arch/x86/icelakex/icx-metrics.json        |   691 +-
> >  .../pmu-events/arch/x86/icelakex/memory.json  |     6 +-
> >  .../pmu-events/arch/x86/icelakex/other.json   |    51 +-
> >  .../arch/x86/icelakex/pipeline.json           |    12 +
> >  .../arch/x86/icelakex/virtual-memory.json     |     2 +-
> >  .../pmu-events/arch/x86/ivybridge/cache.json  |     2 +-
> >  .../arch/x86/ivybridge/floating-point.json    |     2 +-
> >  .../arch/x86/ivybridge/frontend.json          |     2 +-
> >  .../arch/x86/ivybridge/ivb-metrics.json       |    94 +-
> >  .../pmu-events/arch/x86/ivybridge/memory.json |     2 +-
> >  .../pmu-events/arch/x86/ivybridge/other.json  |     2 +-
> >  .../arch/x86/ivybridge/pipeline.json          |     4 +-
> >  .../arch/x86/ivybridge/uncore-other.json      |     2 +-
> >  .../arch/x86/ivybridge/virtual-memory.json    |     2 +-
> >  .../pmu-events/arch/x86/ivytown/cache.json    |     2 +-
> >  .../arch/x86/ivytown/floating-point.json      |     2 +-
> >  .../pmu-events/arch/x86/ivytown/frontend.json |     2 +-
> >  .../arch/x86/ivytown/ivt-metrics.json         |    94 +-
> >  .../pmu-events/arch/x86/ivytown/memory.json   |     2 +-
> >  .../pmu-events/arch/x86/ivytown/other.json    |     2 +-
> >  .../arch/x86/ivytown/uncore-cache.json        |  3495 ++-
> >  .../arch/x86/ivytown/uncore-interconnect.json |  1750 +-
> >  .../arch/x86/ivytown/uncore-memory.json       |  1775 +-
> >  .../arch/x86/ivytown/uncore-other.json        |  2411 ++
> >  .../arch/x86/ivytown/uncore-power.json        |   696 +-
> >  .../arch/x86/ivytown/virtual-memory.json      |     2 +-
> >  .../pmu-events/arch/x86/jaketown/cache.json   |     2 +-
> >  .../arch/x86/jaketown/floating-point.json     |     2 +-
> >  .../arch/x86/jaketown/frontend.json           |     2 +-
> >  .../arch/x86/jaketown/jkt-metrics.json        |    11 +-
> >  .../pmu-events/arch/x86/jaketown/memory.json  |     2 +-
> >  .../pmu-events/arch/x86/jaketown/other.json   |     2 +-
> >  .../arch/x86/jaketown/pipeline.json           |    16 +-
> >  .../arch/x86/jaketown/uncore-cache.json       |  1960 +-
> >  .../x86/jaketown/uncore-interconnect.json     |   824 +-
> >  .../arch/x86/jaketown/uncore-memory.json      |   445 +-
> >  .../arch/x86/jaketown/uncore-other.json       |  1551 +
> >  .../arch/x86/jaketown/uncore-power.json       |   362 +-
> >  .../arch/x86/jaketown/virtual-memory.json     |     2 +-
> >  .../arch/x86/knightslanding/cache.json        |     2 +-
> >  .../x86/knightslanding/floating-point.json    |     2 +-
> >  .../arch/x86/knightslanding/frontend.json     |     2 +-
> >  .../arch/x86/knightslanding/memory.json       |     2 +-
> >  .../arch/x86/knightslanding/pipeline.json     |     2 +-
> >  .../x86/knightslanding/uncore-memory.json     |    42 -
> >  .../arch/x86/knightslanding/uncore-other.json |  3890 +++
> >  .../x86/knightslanding/virtual-memory.json    |     2 +-
> >  tools/perf/pmu-events/arch/x86/mapfile.csv    |    74 +-
> >  .../pmu-events/arch/x86/meteorlake/cache.json |   262 +
> >  .../arch/x86/meteorlake/frontend.json         |    24 +
> >  .../arch/x86/meteorlake/memory.json           |   185 +
> >  .../pmu-events/arch/x86/meteorlake/other.json |    46 +
> >  .../arch/x86/meteorlake/pipeline.json         |   254 +
> >  .../arch/x86/meteorlake/virtual-memory.json   |    46 +
> >  .../pmu-events/arch/x86/nehalemep/cache.json  |    14 +-
> >  .../arch/x86/nehalemep/floating-point.json    |     2 +-
> >  .../arch/x86/nehalemep/frontend.json          |     2 +-
> >  .../pmu-events/arch/x86/nehalemep/memory.json |     6 +-
> >  .../arch/x86/nehalemep/virtual-memory.json    |     2 +-
> >  .../pmu-events/arch/x86/nehalemex/cache.json  |  2974 +-
> >  .../arch/x86/nehalemex/floating-point.json    |   182 +-
> >  .../arch/x86/nehalemex/frontend.json          |    20 +-
> >  .../pmu-events/arch/x86/nehalemex/memory.json |   672 +-
> >  .../pmu-events/arch/x86/nehalemex/other.json  |   170 +-
> >  .../arch/x86/nehalemex/pipeline.json          |   830 +-
> >  .../arch/x86/nehalemex/virtual-memory.json    |    92 +-
> >  .../arch/x86/sandybridge/cache.json           |     2 +-
> >  .../arch/x86/sandybridge/floating-point.json  |     2 +-
> >  .../arch/x86/sandybridge/frontend.json        |     4 +-
> >  .../arch/x86/sandybridge/memory.json          |     2 +-
> >  .../arch/x86/sandybridge/other.json           |     2 +-
> >  .../arch/x86/sandybridge/pipeline.json        |    10 +-
> >  .../arch/x86/sandybridge/snb-metrics.json     |    11 +-
> >  .../arch/x86/sandybridge/uncore-other.json    |     2 +-
> >  .../arch/x86/sandybridge/virtual-memory.json  |     2 +-
> >  .../arch/x86/sapphirerapids/cache.json        |   135 +-
> >  .../x86/sapphirerapids/floating-point.json    |     6 +
> > [remainder of diffstat identical to the copy quoted earlier in this thread; trimmed]


* Re: [PATCH v1 02/31] perf vendor events: Update Intel broadwellx
       [not found]     ` <CAP-5=fV65fiadnaAmebYS1CjxwuFy4oKxV88v6oHdVPCc=n+Ow@mail.gmail.com>
@ 2022-07-26  1:25       ` Xing Zhengjun
  2022-07-26  4:49         ` Ian Rogers
  0 siblings, 1 reply; 25+ messages in thread
From: Xing Zhengjun @ 2022-07-26  1:25 UTC (permalink / raw)
  To: Ian Rogers, Arnaldo Carvalho de Melo
  Cc: Taylor, Perry, Biggers, Caleb, Bopardikar, Kshipra, Kan Liang,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, LKML, linux-perf-users,
	Sedat Dilek, Stephane Eranian


Hi Arnaldo,

On 7/25/2022 9:06 PM, Ian Rogers wrote:
> 
> 
> On Sun, Jul 24, 2022, 6:34 PM Xing Zhengjun 
> <zhengjun.xing@linux.intel.com <mailto:zhengjun.xing@linux.intel.com>> 
> wrote:
> 
>     Hi Ian,
> 
>     On 7/23/2022 6:32 AM, Ian Rogers wrote:
>      > Use script at:
>      >
>     https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
>     <https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py>
>      >
> 
>     It is better to add the event JSON file version and TMA version for
>     future tracking. For example, the event list is based on the broadwellx
>     JSON file v19, and the metrics are based on TMA 4.4-full.
> 
> 
> Thanks Xing,
> 
> I'll add that in v2. I'd skipped it this time as I'd been adding it to 
> the mapfile. I'll also break apart the tremontx to snowridgex change to 
> match yours. I also will rebase to see if that'll fix the git am issue. 
> Apologies in advance to everyone's inboxes :-)
> 
Hi Arnaldo,

Besides Snowridgex, I also posted the SPR/ADL/HSX/BDX event lists last 
month. Can these be merged in, or do I need to update them? Thanks.

https://lore.kernel.org/all/20220609094222.2030167-1-zhengjun.xing@linux.intel.com/
https://lore.kernel.org/all/20220609094222.2030167-2-zhengjun.xing@linux.intel.com/

https://lore.kernel.org/all/20220607092749.1976878-1-zhengjun.xing@linux.intel.com/
https://lore.kernel.org/all/20220607092749.1976878-2-zhengjun.xing@linux.intel.com/
https://lore.kernel.org/all/20220614145019.2177071-1-zhengjun.xing@linux.intel.com/
https://lore.kernel.org/all/20220614145019.2177071-2-zhengjun.xing@linux.intel.com/

> Thanks,
> Ian
> 
> 
>      > to download and generate the latest events and metrics. Manually copy
>      > the broadwellx files into perf and update mapfile.csv.
>      >
>      > Tested with 'perf test':
>      >   10: PMU events                                          :
>      >   10.1: PMU event table sanity                            : Ok
>      >   10.2: PMU event map aliases                             : Ok
>      >   10.3: Parsing of PMU event table metrics                : Ok
>      >   10.4: Parsing of PMU event table metrics with fake PMUs : Ok
>      >   90: perf all metricgroups test                          : Ok
>      >   91: perf all metrics test                               : Skip
>      >   93: perf all PMU test                                   : Ok
>      >
>      > Signed-off-by: Ian Rogers <irogers@google.com
>     <mailto:irogers@google.com>>
>      > ---
>      >   .../arch/x86/broadwellx/bdx-metrics.json      |  570 ++-
>      >   .../pmu-events/arch/x86/broadwellx/cache.json |   22 +-
>      >   .../arch/x86/broadwellx/floating-point.json   |    9 +-
>      >   .../arch/x86/broadwellx/frontend.json         |    2 +-
>      >   .../arch/x86/broadwellx/memory.json           |   39 +-
>      >   .../pmu-events/arch/x86/broadwellx/other.json |    2 +-
>      >   .../arch/x86/broadwellx/pipeline.json         |    4 +-
>      >   .../arch/x86/broadwellx/uncore-cache.json     | 3788
>     ++++++++++++++++-
>      >   .../x86/broadwellx/uncore-interconnect.json   | 1438 ++++++-
>      >   .../arch/x86/broadwellx/uncore-memory.json    | 2849 ++++++++++++-
>      >   .../arch/x86/broadwellx/uncore-other.json     | 3252 ++++++++++++++
>      >   .../arch/x86/broadwellx/uncore-power.json     |  437 +-
>      >   .../arch/x86/broadwellx/virtual-memory.json   |    2 +-
>      >   tools/perf/pmu-events/arch/x86/mapfile.csv    |    2 +-
>      >   14 files changed, 12103 insertions(+), 313 deletions(-)
>      >   create mode 100644
>     tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
>      >
>      > diff --git
>     a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>     b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>      > index b055947c0afe..720ee7c9332d 100644
>      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>      > @@ -74,12 +74,6 @@
>      >           "MetricGroup": "Branches;Fed;FetchBW",
>      >           "MetricName": "UpTB"
>      >       },
>      > -    {
>      > -        "BriefDescription": "Cycles Per Instruction (per Logical
>     Processor)",
>      > -        "MetricExpr": "1 / (INST_RETIRED.ANY /
>     CPU_CLK_UNHALTED.THREAD)",
>      > -        "MetricGroup": "Pipeline;Mem",
>      > -        "MetricName": "CPI"
>      > -    },
>      >       {
>      >           "BriefDescription": "Per-Logical Processor actual
>     clocks when the Logical Processor is active.",
>      >           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>      > @@ -130,43 +124,25 @@
>      >           "MetricName": "FLOPc_SMT"
>      >       },
>      >       {
>      > -        "BriefDescription": "Actual per-core usage of the
>     Floating Point execution units (regardless of the vector width)",
>      > +        "BriefDescription": "Actual per-core usage of the
>     Floating Point non-X87 execution units (regardless of precision or
>     vector-width)",
>      >           "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) +
>     (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
>     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 *
>     CPU_CLK_UNHALTED.THREAD )",
>      >           "MetricGroup": "Cor;Flops;HPC",
>      >           "MetricName": "FP_Arith_Utilization",
>      > -        "PublicDescription": "Actual per-core usage of the
>     Floating Point execution units (regardless of the vector width).
>     Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
>      > +        "PublicDescription": "Actual per-core usage of the
>     Floating Point non-X87 execution units (regardless of precision or
>     vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply
>     Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar
>     or 128/256-bit vectors - less common)."
>      >       },
>      >       {
>      > -        "BriefDescription": "Actual per-core usage of the
>     Floating Point execution units (regardless of the vector width). SMT
>     version; use when SMT is enabled and measuring per logical CPU.",
>      > +        "BriefDescription": "Actual per-core usage of the
>     Floating Point non-X87 execution units (regardless of precision or
>     vector-width). SMT version; use when SMT is enabled and measuring
>     per logical CPU.",
>      >           "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) +
>     (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
>     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 * ( (
>     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) )",
>      >           "MetricGroup": "Cor;Flops;HPC_SMT",
>      >           "MetricName": "FP_Arith_Utilization_SMT",
>      > -        "PublicDescription": "Actual per-core usage of the
>     Floating Point execution units (regardless of the vector width).
>     Values > 1 are possible due to Fused-Multiply Add (FMA) counting.
>     SMT version; use when SMT is enabled and measuring per logical CPU."
>      > +        "PublicDescription": "Actual per-core usage of the
>     Floating Point non-X87 execution units (regardless of precision or
>     vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply
>     Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar
>     or 128/256-bit vectors - less common). SMT version; use when SMT is
>     enabled and measuring per logical CPU."
>      >       },
>      >       {
>      > -        "BriefDescription": "Instruction-Level-Parallelism
>     (average number of uops executed when there is at least 1 uop
>     executed)",
>      > +        "BriefDescription": "Instruction-Level-Parallelism
>     (average number of uops executed when there is execution) per-core",
>      >           "MetricExpr": "UOPS_EXECUTED.THREAD / ((
>     cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else
>     UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>      >           "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
>      >           "MetricName": "ILP"
>      >       },
>      > -    {
>      > -        "BriefDescription": "Branch Misprediction Cost: Fraction
>     of TMA slots wasted per non-speculative branch misprediction
>     (retired JEClear)",
>      > -        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *
>     INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>     CPU_CLK_UNHALTED.THREAD)) * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>     CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) /
>     BR_MISP_RETIRED.ALL_BRANCHES",
>      > -        "MetricGroup": "Bad;BrMispredicts",
>      > -        "MetricName": "Branch_Misprediction_Cost"
>      > -    },
>      > -    {
>      > -        "BriefDescription": "Branch Misprediction Cost: Fraction
>     of TMA slots wasted per non-speculative branch misprediction
>     (retired JEClear)",
>      > -        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( (
>     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )
>     )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (
>     ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>     * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( (
>     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>     ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))
>     / BR_MISP_RETIRED.ALL_BRANCHES",
>      > -        "MetricGroup": "Bad;BrMispredicts_SMT",
>      > -        "MetricName": "Branch_Misprediction_Cost_SMT"
>      > -    },
>      > -    {
>      > -        "BriefDescription": "Number of Instructions per
>     non-speculative Branch Misprediction (JEClear)",
>      > -        "MetricExpr": "INST_RETIRED.ANY /
>     BR_MISP_RETIRED.ALL_BRANCHES",
>      > -        "MetricGroup": "Bad;BadSpec;BrMispredicts",
>      > -        "MetricName": "IpMispredict"
>      > -    },
>      >       {
>      >           "BriefDescription": "Core actual clocks when any
>     Logical Processor is active on the Physical Core",
>      >           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1
>     + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>      > @@ -256,6 +232,18 @@
>      >           "MetricGroup": "Summary;TmaL1",
>      >           "MetricName": "Instructions"
>      >       },
>      > +    {
>      > +        "BriefDescription": "Average number of Uops retired in
>     cycles where at least one uop has retired.",
>      > +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS /
>     cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
>      > +        "MetricGroup": "Pipeline;Ret",
>      > +        "MetricName": "Retire"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "",
>      > +        "MetricExpr": "UOPS_EXECUTED.THREAD /
>     cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
>      > +        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
>      > +        "MetricName": "Execute"
>      > +    },
>      >       {
>      >           "BriefDescription": "Fraction of Uops delivered by the
>     DSB (aka Decoded ICache; or Uop Cache)",
>      >           "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS +
>     LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>      > @@ -263,35 +251,34 @@
>      >           "MetricName": "DSB_Coverage"
>      >       },
>      >       {
>      > -        "BriefDescription": "Actual Average Latency for L1
>     data-cache miss demand load instructions (in core cycles)",
>      > -        "MetricExpr": "L1D_PEND_MISS.PENDING / (
>     MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>      > -        "MetricGroup": "Mem;MemoryBound;MemoryLat",
>      > -        "MetricName": "Load_Miss_Real_Latency",
>      > -        "PublicDescription": "Actual Average Latency for L1
>     data-cache miss demand load instructions (in core cycles). Latency
>     may be overestimated for multi-load instructions - e.g. repeat strings."
>      > +        "BriefDescription": "Number of Instructions per
>     non-speculative Branch Misprediction (JEClear) (lower number means
>     higher occurrence rate)",
>      > +        "MetricExpr": "INST_RETIRED.ANY /
>     BR_MISP_RETIRED.ALL_BRANCHES",
>      > +        "MetricGroup": "Bad;BadSpec;BrMispredicts",
>      > +        "MetricName": "IpMispredict"
>      >       },
>      >       {
>      > -        "BriefDescription": "Memory-Level-Parallelism (average
>     number of L1 miss demand load when there is at least one such miss.
>     Per-Logical Processor)",
>      > -        "MetricExpr": "L1D_PEND_MISS.PENDING /
>     L1D_PEND_MISS.PENDING_CYCLES",
>      > -        "MetricGroup": "Mem;MemoryBound;MemoryBW",
>      > -        "MetricName": "MLP"
>      > +        "BriefDescription": "Branch Misprediction Cost: Fraction
>     of TMA slots wasted per non-speculative branch misprediction
>     (retired JEClear)",
>      > +        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *
>     INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>     CPU_CLK_UNHALTED.THREAD)) * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>     CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) /
>     BR_MISP_RETIRED.ALL_BRANCHES",
>      > +        "MetricGroup": "Bad;BrMispredicts",
>      > +        "MetricName": "Branch_Misprediction_Cost"
>      >       },
>      >       {
>      > -        "BriefDescription": "Average data fill bandwidth to the
>     L1 data cache [GB / sec]",
>      > -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 /
>     duration_time",
>      > -        "MetricGroup": "Mem;MemoryBW",
>      > -        "MetricName": "L1D_Cache_Fill_BW"
>      > +        "BriefDescription": "Branch Misprediction Cost: Fraction
>     of TMA slots wasted per non-speculative branch misprediction
>     (retired JEClear)",
>      > +        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( (
>     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )
>     )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (
>     ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>     * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( (
>     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>     ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))
>     / BR_MISP_RETIRED.ALL_BRANCHES",
>      > +        "MetricGroup": "Bad;BrMispredicts_SMT",
>      > +        "MetricName": "Branch_Misprediction_Cost_SMT"
>      >       },
>      >       {
>      > -        "BriefDescription": "Average data fill bandwidth to the
>     L2 cache [GB / sec]",
>      > -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 /
>     duration_time",
>      > -        "MetricGroup": "Mem;MemoryBW",
>      > -        "MetricName": "L2_Cache_Fill_BW"
>      > +        "BriefDescription": "Actual Average Latency for L1
>     data-cache miss demand load operations (in core cycles)",
>      > +        "MetricExpr": "L1D_PEND_MISS.PENDING / (
>     MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>      > +        "MetricGroup": "Mem;MemoryBound;MemoryLat",
>      > +        "MetricName": "Load_Miss_Real_Latency"
>      >       },
>      >       {
>      > -        "BriefDescription": "Average per-core data fill
>     bandwidth to the L3 cache [GB / sec]",
>      > -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000
>     / duration_time",
>      > -        "MetricGroup": "Mem;MemoryBW",
>      > -        "MetricName": "L3_Cache_Fill_BW"
>      > +        "BriefDescription": "Memory-Level-Parallelism (average
>     number of L1 miss demand load when there is at least one such miss.
>     Per-Logical Processor)",
>      > +        "MetricExpr": "L1D_PEND_MISS.PENDING /
>     L1D_PEND_MISS.PENDING_CYCLES",
>      > +        "MetricGroup": "Mem;MemoryBound;MemoryBW",
>      > +        "MetricName": "MLP"
>      >       },
>      >       {
>      >           "BriefDescription": "L1 cache true misses per kilo
>     instruction for retired demand loads",
>      > @@ -306,13 +293,13 @@
>      >           "MetricName": "L2MPKI"
>      >       },
>      >       {
>      > -        "BriefDescription": "L2 cache misses per kilo
>     instruction for all request types (including speculative)",
>      > +        "BriefDescription": "L2 cache ([RKL+] true) misses per
>     kilo instruction for all request types (including speculative)",
>      >           "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>      >           "MetricGroup": "Mem;CacheMisses;Offcore",
>      >           "MetricName": "L2MPKI_All"
>      >       },
>      >       {
>      > -        "BriefDescription": "L2 cache misses per kilo
>     instruction for all demand loads  (including speculative)",
>      > +        "BriefDescription": "L2 cache ([RKL+] true) misses per
>     kilo instruction for all demand loads  (including speculative)",
>      >           "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS /
>     INST_RETIRED.ANY",
>      >           "MetricGroup": "Mem;CacheMisses",
>      >           "MetricName": "L2MPKI_Load"
>      > @@ -348,6 +335,48 @@
>      >           "MetricGroup": "Mem;MemoryTLB_SMT",
>      >           "MetricName": "Page_Walks_Utilization_SMT"
>      >       },
>      > +    {
>      > +        "BriefDescription": "Average per-core data fill
>     bandwidth to the L1 data cache [GB / sec]",
>      > +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 /
>     duration_time",
>      > +        "MetricGroup": "Mem;MemoryBW",
>      > +        "MetricName": "L1D_Cache_Fill_BW"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average per-core data fill
>     bandwidth to the L2 cache [GB / sec]",
>      > +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 /
>     duration_time",
>      > +        "MetricGroup": "Mem;MemoryBW",
>      > +        "MetricName": "L2_Cache_Fill_BW"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average per-core data fill
>     bandwidth to the L3 cache [GB / sec]",
>      > +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000
>     / duration_time",
>      > +        "MetricGroup": "Mem;MemoryBW",
>      > +        "MetricName": "L3_Cache_Fill_BW"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average per-thread data fill
>     bandwidth to the L1 data cache [GB / sec]",
>      > +        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 /
>     duration_time)",
>      > +        "MetricGroup": "Mem;MemoryBW",
>      > +        "MetricName": "L1D_Cache_Fill_BW_1T"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average per-thread data fill
>     bandwidth to the L2 cache [GB / sec]",
>      > +        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 /
>     duration_time)",
>      > +        "MetricGroup": "Mem;MemoryBW",
>      > +        "MetricName": "L2_Cache_Fill_BW_1T"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average per-thread data fill
>     bandwidth to the L3 cache [GB / sec]",
>      > +        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000
>     / duration_time)",
>      > +        "MetricGroup": "Mem;MemoryBW",
>      > +        "MetricName": "L3_Cache_Fill_BW_1T"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average per-thread data access
>     bandwidth to the L3 cache [GB / sec]",
>      > +        "MetricExpr": "0",
>      > +        "MetricGroup": "Mem;MemoryBW;Offcore",
>      > +        "MetricName": "L3_Cache_Access_BW_1T"
>      > +    },
>      >       {
>      >           "BriefDescription": "Average CPU Utilization",
>      >           "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>      > @@ -364,7 +393,8 @@
>      >           "BriefDescription": "Giga Floating Point Operations Per
>     Second",
>      >           "MetricExpr": "( ( 1 * (
>     FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 *
>     FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * (
>     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 *
>     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / 1000000000 ) /
>     duration_time",
>      >           "MetricGroup": "Cor;Flops;HPC",
>      > -        "MetricName": "GFLOPs"
>      > +        "MetricName": "GFLOPs",
>      > +        "PublicDescription": "Giga Floating Point Operations Per
>     Second. Aggregate across all supported options of: FP precisions,
>     scalar and vector instructions, vector-width and AMX engine."
>      >       },
>      >       {
>      >           "BriefDescription": "Average Frequency Utilization
>     relative nominal frequency",
>      > @@ -461,5 +491,439 @@
>      >           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@)
>     * 100",
>      >           "MetricGroup": "Power",
>      >           "MetricName": "C7_Pkg_Residency"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "CPU operating frequency (in GHz)",
>      > +        "MetricExpr": "( CPU_CLK_UNHALTED.THREAD /
>     CPU_CLK_UNHALTED.REF_TSC * #SYSTEM_TSC_FREQ ) / 1000000000",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "cpu_operating_frequency",
>      > +        "ScaleUnit": "1GHz"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Cycles per instruction retired;
>     indicating how much time each executed instruction took; in units of
>     cycles.",
>      > +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "cpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "The ratio of number of completed
>     memory load instructions to the total number completed instructions",
>      > +        "MetricExpr": "MEM_UOPS_RETIRED.ALL_LOADS /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "loads_per_instr",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "The ratio of number of completed
>     memory store instructions to the total number completed instructions",
>      > +        "MetricExpr": "MEM_UOPS_RETIRED.ALL_STORES /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "stores_per_instr",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of requests missing
>     L1 data cache (includes data+rfo w/ prefetches) to the total number
>     of completed instructions",
>      > +        "MetricExpr": "L1D.REPLACEMENT / INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "l1d_mpi_includes_data_plus_rfo_with_prefetches",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of demand load
>     requests hitting in L1 data cache to the total number of completed
>     instructions",
>      > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L1_HIT /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "l1d_demand_data_read_hits_per_instr",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of code read
>     requests missing in L1 instruction cache (includes prefetches) to
>     the total number of completed instructions",
>      > +        "MetricExpr": "L2_RQSTS.ALL_CODE_RD / INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "l1_i_code_read_misses_with_prefetches_per_instr",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of completed demand
>     load requests hitting in L2 cache to the total number of completed
>     instructions",
>      > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L2_HIT /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "l2_demand_data_read_hits_per_instr",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of requests missing
>     L2 cache (includes code+data+rfo w/ prefetches) to the total number
>     of completed instructions",
>      > +        "MetricExpr": "L2_LINES_IN.ALL / INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "l2_mpi_includes_code_plus_data_plus_rfo_with_prefetches",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of completed data
>     read request missing L2 cache to the total number of completed
>     instructions",
>      > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L2_MISS /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "l2_demand_data_read_mpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of code read
>     request missing L2 cache to the total number of completed instructions",
>      > +        "MetricExpr": "L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "l2_demand_code_mpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of data read
>     requests missing last level core cache (includes demand w/
>     prefetches) to the total number of completed instructions",
>      > +        "MetricExpr": "(
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ +
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x192@ ) /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "llc_data_read_mpi_demand_plus_prefetch",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of code read
>     requests missing last level core cache (includes demand w/
>     prefetches) to the total number of completed instructions",
>      > +        "MetricExpr": "(
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x181@ +
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x191@ ) /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "llc_code_read_mpi_demand_plus_prefetch",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average latency of a last level
>     cache (LLC) demand and prefetch data read miss (read memory access)
>     in nano seconds",
>      > +        "MetricExpr": "( 1000000000 * (
>     cbox@UNC_C_TOR_OCCUPANCY.MISS_OPCODE\\,filter_opc\\=0x182@ /
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
>     UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
>     ) ) ) * duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "llc_data_read_demand_plus_prefetch_miss_latency",
>      > +        "ScaleUnit": "1ns"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average latency of a last level
>     cache (LLC) demand and prefetch data read miss (read memory access)
>     addressed to local memory in nano seconds",
>      > +        "MetricExpr": "( 1000000000 * (
>     cbox@UNC_C_TOR_OCCUPANCY.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ /
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
>     UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
>     ) ) ) * duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "llc_data_read_demand_plus_prefetch_miss_latency_for_local_requests",
>      > +        "ScaleUnit": "1ns"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Average latency of a last level
>     cache (LLC) demand and prefetch data read miss (read memory access)
>     addressed to remote memory in nano seconds",
>      > +        "MetricExpr": "( 1000000000 * (
>     cbox@UNC_C_TOR_OCCUPANCY.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ /
>     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
>     UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
>     ) ) ) * duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests",
>      > +        "ScaleUnit": "1ns"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of completed page
>     walks (for all page sizes) caused by a code fetch to the total
>     number of completed instructions. This implies it missed in the ITLB
>     (Instruction TLB) and further levels of TLB.",
>      > +        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "itlb_mpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of completed page
>     walks (for 2 megabyte and 4 megabyte page sizes) caused by a code
>     fetch to the total number of completed instructions. This implies it
>     missed in the Instruction Translation Lookaside Buffer (ITLB) and
>     further levels of TLB.",
>      > +        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED_2M_4M /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "itlb_large_page_mpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of completed page
>     walks (for all page sizes) caused by demand data loads to the total
>     number of completed instructions. This implies it missed in the DTLB
>     and further levels of TLB.",
>      > +        "MetricExpr": "DTLB_LOAD_MISSES.WALK_COMPLETED /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "dtlb_load_mpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Ratio of number of completed page
>     walks (for all page sizes) caused by demand data stores to the total
>     number of completed instructions. This implies it missed in the DTLB
>     and further levels of TLB.",
>      > +        "MetricExpr": "DTLB_STORE_MISSES.WALK_COMPLETED /
>     INST_RETIRED.ANY",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "dtlb_store_mpi",
>      > +        "ScaleUnit": "1per_instr"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Memory reads that miss the last
>     level cache (LLC) addressed to local DRAM as a percentage of total
>     memory read accesses, does not include LLC prefetches.",
>      > +        "MetricExpr": "100 *
>     cbox@UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ / (
>     cbox@UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ +
>     cbox@UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ )",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "numa_percent_reads_addressed_to_local_dram",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Memory reads that miss the last
>     level cache (LLC) addressed to remote DRAM as a percentage of total
>     memory read accesses, does not include LLC prefetches.",
>      > +        "MetricExpr": "100 *
>     cbox@UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ / (
>     cbox@UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ +
>     cbox@UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ )",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "numa_percent_reads_addressed_to_remote_dram",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Uncore operating frequency in GHz",
>      > +        "MetricExpr": "UNC_C_CLOCKTICKS / (
>     source_count(UNC_C_CLOCKTICKS) * #num_packages ) / 1000000000",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "uncore_frequency",
>      > +        "ScaleUnit": "1GHz"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Intel(R) Quick Path Interconnect
>     (QPI) data transmit bandwidth (MB/sec)",
>      > +        "MetricExpr": "( UNC_Q_TxL_FLITS_G0.DATA * 8 / 1000000)
>     / duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "qpi_data_transmit_bw_only_data",
>      > +        "ScaleUnit": "1MB/s"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "DDR memory read bandwidth (MB/sec)",
>      > +        "MetricExpr": "( UNC_M_CAS_COUNT.RD * 64 / 1000000) /
>     duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "memory_bandwidth_read",
>      > +        "ScaleUnit": "1MB/s"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "DDR memory write bandwidth (MB/sec)",
>      > +        "MetricExpr": "( UNC_M_CAS_COUNT.WR * 64 / 1000000) /
>     duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "memory_bandwidth_write",
>      > +        "ScaleUnit": "1MB/s"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "DDR memory bandwidth (MB/sec)",
>      > +        "MetricExpr": "(( UNC_M_CAS_COUNT.RD +
>     UNC_M_CAS_COUNT.WR ) * 64 / 1000000) / duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "memory_bandwidth_total",
>      > +        "ScaleUnit": "1MB/s"
>      > +    },
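The three DDR bandwidth metrics count DRAM CAS commands, each of which moves one 64-byte cache line; dividing by duration_time and scaling to decimal megabytes yields MB/s. The arithmetic, again with hypothetical counts:

```python
# Sketch of the memory_bandwidth_{read,write,total} formulas above.
# Counter values are hypothetical, not taken from the patch.
CACHE_LINE_BYTES = 64  # each CAS command transfers one cache line

def ddr_bandwidth_mb_s(cas_rd, cas_wr, duration_time_s):
    """Read, write, and total DDR bandwidth in MB/s (1 MB = 1e6 bytes)."""
    read = cas_rd * CACHE_LINE_BYTES / 1e6 / duration_time_s
    write = cas_wr * CACHE_LINE_BYTES / 1e6 / duration_time_s
    return read, write, read + write

# 1e9 read CAS and 0.5e9 write CAS over a 2-second interval
rd, wr, total = ddr_bandwidth_mb_s(1_000_000_000, 500_000_000, 2.0)
# rd = 32000.0 MB/s, wr = 16000.0 MB/s, total = 48000.0 MB/s
```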
>      > +    {
>      > +        "BriefDescription": "Bandwidth of IO reads that are
>     initiated by end device controllers that are requesting memory from
>     the CPU.",
>      > +        "MetricExpr": "(
>     cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x19e@ * 64 / 1000000)
>     / duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "io_bandwidth_read",
>      > +        "ScaleUnit": "1MB/s"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Bandwidth of IO writes that are
>     initiated by end device controllers that are writing memory to the
>     CPU.",
>      > +        "MetricExpr": "((
>     cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x1c8\\,filter_tid\\=0x3e@
>     +
>     cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x180\\,filter_tid\\=0x3e@
>     ) * 64 / 1000000) / duration_time",
>      > +        "MetricGroup": "",
>      > +        "MetricName": "io_bandwidth_write",
>      > +        "ScaleUnit": "1MB/s"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Uops delivered from decoded
>     instruction cache (decoded stream buffer or DSB) as a percent of
>     total uops delivered to Instruction Decode Queue",
>      > +        "MetricExpr": "100 * ( IDQ.DSB_UOPS / UOPS_ISSUED.ANY )",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "percent_uops_delivered_from_decoded_icache_dsb",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Uops delivered from legacy decode
>     pipeline (Micro-instruction Translation Engine or MITE) as a percent
>     of total uops delivered to Instruction Decode Queue",
>      > +        "MetricExpr": "100 * ( IDQ.MITE_UOPS / UOPS_ISSUED.ANY )",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "percent_uops_delivered_from_legacy_decode_pipeline_mite",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Uops delivered from microcode
>     sequencer (MS) as a percent of total uops delivered to Instruction
>     Decode Queue",
>      > +        "MetricExpr": "100 * ( IDQ.MS_UOPS / UOPS_ISSUED.ANY )",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "percent_uops_delivered_from_microcode_sequencer_ms",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "Uops delivered from loop stream
>     detector (LSD) as a percent of total uops delivered to Instruction
>     Decode Queue",
>      > +        "MetricExpr": "100 * ( LSD.UOPS / UOPS_ISSUED.ANY )",
>      > +        "MetricGroup": "",
>      > +        "MetricName":
>     "percent_uops_delivered_from_loop_stream_detector_lsd",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This category represents fraction
>     of slots where the processor's Frontend undersupplies its Backend.
>     Frontend denotes the first part of the processor core responsible to
>     fetch operations that are executed later on by the Backend part.
>     Within the Frontend; a branch predictor predicts the next address to
>     fetch; cache-lines are fetched from the memory subsystem; parsed
>     into instructions; and lastly decoded into micro-operations (uops).
>     Ideally the Frontend can issue Machine_Width uops every cycle to the
>     Backend. Frontend Bound denotes unutilized issue-slots when there is
>     no Backend stall; i.e. bubbles where Frontend delivered no uops
>     while Backend could have accepted them. For example; stalls due to
>     instruction-cache misses would be categorized under Frontend Bound.",
>      > +        "MetricExpr": "100 * ( IDQ_UOPS_NOT_DELIVERED.CORE / ( (
>     4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) )",
>      > +        "MetricGroup": "TmaL1;PGO",
>      > +        "MetricName": "tma_frontend_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots the CPU was stalled due to Frontend latency issues.  For
>     example; instruction-cache misses; iTLB misses or fetch stalls after
>     a branch misprediction are categorized under Frontend Latency. In
>     such cases; the Frontend eventually delivers no uops for some period.",
>      > +        "MetricExpr": "100 * ( ( 4 ) *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) )",
>      > +        "MetricGroup":
>     "Frontend;TmaL2;m_tma_frontend_bound_percent",
>      > +        "MetricName": "tma_fetch_latency_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     cycles the CPU was stalled due to instruction cache misses.",
>      > +        "MetricExpr": "100 * ( ICACHE.IFDATA_STALL / (
>     CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "BigFoot;FetchLat;IcMiss;TmaL3;m_tma_fetch_latency_percent",
>      > +        "MetricName": "tma_icache_misses_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     cycles the CPU was stalled due to Instruction TLB (ITLB) misses.",
>      > +        "MetricExpr": "100 * ( ( 14 * ITLB_MISSES.STLB_HIT +
>     cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=0x1@ + 7 *
>     ITLB_MISSES.WALK_COMPLETED ) / ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "BigFoot;FetchLat;MemoryTLB;TmaL3;m_tma_fetch_latency_percent",
>      > +        "MetricName": "tma_itlb_misses_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     cycles the CPU was stalled due to Branch Resteers. Branch Resteers
>     estimates the Frontend delay in fetching operations from corrected
>     path; following all sorts of mispredicted branches. For example;
>     branchy code with lots of mispredictions might get categorized
>     under Branch Resteers. Note the value of this node may overlap with
>     its siblings.",
>      > +        "MetricExpr": "100 * ( ( 12 ) * (
>     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>     / ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
>      > +        "MetricName": "tma_branch_resteers_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     cycles the CPU was stalled due to switches from DSB to MITE
>     pipelines. The DSB (decoded i-cache) is a Uop Cache where the
>     front-end directly delivers Uops (micro operations) avoiding heavy
>     x86 decoding. The DSB pipeline has shorter latency and delivered
>     higher bandwidth than the MITE (legacy instruction decode pipeline).
>     Switching between the two pipelines can cause penalties hence this
>     metric measures the exposed penalty.",
>      > +        "MetricExpr": "100 * ( DSB2MITE_SWITCHES.PENALTY_CYCLES
>     / ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "DSBmiss;FetchLat;TmaL3;m_tma_fetch_latency_percent",
>      > +        "MetricName": "tma_dsb_switches_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using
>     proper compiler flags or Intel Compiler by default will certainly
>     avoid this. #Link: Optimization Guide about LCP BKMs.",
>      > +        "MetricExpr": "100 * ( ILD_STALL.LCP / (
>     CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
>      > +        "MetricName": "tma_lcp_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates the fraction
>     of cycles when the CPU was stalled due to switches of uop delivery
>     to the Microcode Sequencer (MS). Commonly used instructions are
>     optimized for delivery by the DSB (decoded i-cache) or MITE (legacy
>     instruction decode) pipelines. Certain operations cannot be handled
>     natively by the execution pipeline; and must be performed by
>     microcode (small programs injected into the execution stream).
>     Switching to the MS too often can negatively impact performance. The
>     MS is designated to deliver long uop flows required by CISC
>     instructions like CPUID; or uncommon conditions like Floating Point
>     Assists when dealing with Denormals.",
>      > +        "MetricExpr": "100 * ( ( 2 ) * IDQ.MS_SWITCHES / (
>     CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "FetchLat;MicroSeq;TmaL3;m_tma_fetch_latency_percent",
>      > +        "MetricName": "tma_ms_switches_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots the CPU was stalled due to Frontend bandwidth issues.  For
>     example; inefficiencies at the instruction decoders; or restrictions
>     for caching in the DSB (decoded uops cache) are categorized under
>     Fetch Bandwidth. In such cases; the Frontend typically delivers
>     a suboptimal amount of uops to the Backend.",
>      > +        "MetricExpr": "100 * ( ( IDQ_UOPS_NOT_DELIVERED.CORE / (
>     ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( 4 ) *
>     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
>      > +        "MetricGroup":
>     "FetchBW;Frontend;TmaL2;m_tma_frontend_bound_percent",
>      > +        "MetricName": "tma_fetch_bandwidth_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents Core
>     fraction of cycles in which CPU was likely limited due to the MITE
>     pipeline (the legacy decode pipeline). This pipeline is used for
>     code that was not pre-cached in the DSB or LSD. For example;
>     inefficiencies due to asymmetric decoders; use of long immediate or
>     LCP can manifest as MITE fetch bandwidth bottleneck.",
>      > +        "MetricExpr": "100 * ( ( IDQ.ALL_MITE_CYCLES_ANY_UOPS -
>     IDQ.ALL_MITE_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 )
>     if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) / 2 )",
>      > +        "MetricGroup":
>     "DSBmiss;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
>      > +        "MetricName": "tma_mite_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents Core
>     fraction of cycles in which CPU was likely limited due to DSB
>     (decoded uop cache) fetch pipeline.  For example; inefficient
>     utilization of the DSB cache structure or bank conflict when reading
>     from it; are categorized here.",
>      > +        "MetricExpr": "100 * ( ( IDQ.ALL_DSB_CYCLES_ANY_UOPS -
>     IDQ.ALL_DSB_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 )
>     if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) / 2 )",
>      > +        "MetricGroup":
>     "DSB;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
>      > +        "MetricName": "tma_dsb_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This category represents fraction
>     of slots wasted due to incorrect speculations. This includes slots
>     used to issue uops that do not eventually get retired and slots for
>     which the issue-pipeline was blocked due to recovery from earlier
>     incorrect speculation. For example; wasted work due to
>     mispredicted branches are categorized under Bad Speculation
>     category. Incorrect data speculation followed by Memory Ordering
>     Nukes is another example.",
>      > +        "MetricExpr": "100 * ( ( UOPS_ISSUED.ANY - (
>     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) )",
>      > +        "MetricGroup": "TmaL1",
>      > +        "MetricName": "tma_bad_speculation_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots the CPU has wasted due to Branch Misprediction.  These slots
>     are either wasted by uops fetched from an incorrectly speculated
>     program path; or stalls when the out-of-order part of the machine
>     needs to recover its state from a speculative path.",
>      > +        "MetricExpr": "100 * ( ( BR_MISP_RETIRED.ALL_BRANCHES /
>     ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( (
>     UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
>      > +        "MetricGroup":
>     "BadSpec;BrMispredicts;TmaL2;m_tma_bad_speculation_percent",
>      > +        "MetricName": "tma_branch_mispredicts_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots the CPU has wasted due to Machine Clears.  These slots are
>     either wasted by uops fetched prior to the clear; or stalls the
>     out-of-order portion of the machine needs to recover its state after
>     the clear. For example; this can happen due to memory ordering Nukes
>     (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes.",
>      > +        "MetricExpr": "100 * ( ( ( UOPS_ISSUED.ANY - (
>     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( BR_MISP_RETIRED.ALL_BRANCHES /
>     ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( (
>     UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
>      > +        "MetricGroup":
>     "BadSpec;MachineClears;TmaL2;m_tma_bad_speculation_percent",
>      > +        "MetricName": "tma_machine_clears_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This category represents fraction
>     of slots where no uops are being delivered due to a lack of required
>     resources for accepting new uops in the Backend. Backend is the
>     portion of the processor core where the out-of-order scheduler
>     dispatches ready uops into their respective execution units; and
>     once completed these uops get retired according to program order.
>     For example; stalls due to data-cache misses or stalls due to the
>     divider unit being overloaded are both categorized under Backend
>     Bound. Backend Bound is further divided into two main categories:
>     Memory Bound and Core Bound.",
>      > +        "MetricExpr": "100 * ( 1 - ( (
>     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
>      > +        "MetricGroup": "TmaL1",
>      > +        "MetricName": "tma_backend_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
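The four TmaL1 metrics (tma_frontend_bound_percent, tma_bad_speculation_percent, tma_backend_bound_percent, tma_retiring_percent) partition the machine's issue slots, 4 per core clock on this generation, with Backend Bound computed as the residual of the other three, so the four always total 100%. A simplified sketch for a single non-SMT thread (the #SMT_on halving of CPU_CLK_UNHALTED.THREAD_ANY dropped), with hypothetical counts:

```python
# Simplified sketch of the four top-level (TmaL1) metrics above for a
# non-SMT thread, so slots = 4 * clocks. Counter values are hypothetical.
def tma_level1(uops_issued, uops_retired, recovery_cycles,
               not_delivered, clocks):
    slots = 4 * clocks
    frontend = not_delivered / slots
    # issued-but-not-retired uops plus 4 slots per recovery cycle
    bad_spec = (uops_issued - uops_retired + 4 * recovery_cycles) / slots
    retiring = uops_retired / slots
    backend = 1 - (frontend + bad_spec + retiring)  # defined as the residual
    return [100 * x for x in (frontend, bad_spec, backend, retiring)]

fe, bs, be, ret = tma_level1(uops_issued=3_000_000, uops_retired=2_800_000,
                             recovery_cycles=10_000, not_delivered=400_000,
                             clocks=1_000_000)
```

Because Backend Bound is the residual, any measurement noise in the other three categories lands there; the four percentages still sum to 100.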
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots the Memory subsystem within the Backend was a bottleneck. 
>     Memory Bound estimates fraction of slots where pipeline is likely
>     stalled due to demand load or store instructions. This accounts
>     mainly for (1) non-completed in-flight memory demand loads which
>     coincides with execution units starvation; in addition to (2) cases
>     where stores could impose backpressure on the pipeline when many of
>     them get buffered at the same time (less common out of the two).",
>      > +        "MetricExpr": "100 * ( ( ( CYCLE_ACTIVITY.STALLS_MEM_ANY
>     + RESOURCE_STALLS.SB ) / ( (
>     CYCLE_ACTIVITY.STALLS_TOTAL + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
>     UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
>     CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
>     UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
>     ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
>     ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
>     RESOURCE_STALLS.SB ) ) ) * ( 1 - ( (
>     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) )",
>      > +        "MetricGroup": "Backend;TmaL2;m_tma_backend_bound_percent",
>      > +        "MetricName": "tma_memory_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates how often the
>     CPU was stalled without loads missing the L1 data cache.  The L1
>     data cache typically has the shortest latency.  However; in certain
>     cases like loads blocked on older stores; a load might suffer due to
>     high latency even though it is being satisfied by the L1. Another
>     example is loads that miss in the TLB. These cases are characterized
>     by execution unit stalls; while some non-completed demand load lives
>     in the machine without having that demand load missing the L1 cache.",
>      > +        "MetricExpr": "100 * ( max( (
>     CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / (
>     CPU_CLK_UNHALTED.THREAD ) , 0 ) )",
>      > +        "MetricGroup":
>     "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>      > +        "MetricName": "tma_l1_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates how often the
>     CPU was stalled due to L2 cache accesses by loads.  Avoiding cache
>     misses (i.e. L1 misses/L2 hits) can improve the latency and increase
>     performance.",
>      > +        "MetricExpr": "100 * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS
>     - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>      > +        "MetricName": "tma_l2_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates how often the
>     CPU was stalled due to load accesses to the L3 cache or contention with
>     a sibling Core.  Avoiding cache misses (i.e. L2 misses/L3 hits) can
>     improve the latency and increase performance.",
>      > +        "MetricExpr": "100 * ( ( MEM_LOAD_UOPS_RETIRED.L3_HIT /
>     ( MEM_LOAD_UOPS_RETIRED.L3_HIT + ( 7 ) *
>     MEM_LOAD_UOPS_RETIRED.L3_MISS ) ) * CYCLE_ACTIVITY.STALLS_L2_MISS /
>     ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>      > +        "MetricName": "tma_l3_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates how often the
>     CPU was stalled on accesses to external memory (DRAM) by loads.
>     Better caching can improve the latency and increase performance.",
>      > +        "MetricExpr": "100 * ( min( ( ( 1 - (
>     MEM_LOAD_UOPS_RETIRED.L3_HIT / ( MEM_LOAD_UOPS_RETIRED.L3_HIT + ( 7
>     ) * MEM_LOAD_UOPS_RETIRED.L3_MISS ) ) ) *
>     CYCLE_ACTIVITY.STALLS_L2_MISS / ( CPU_CLK_UNHALTED.THREAD ) ) , ( 1
>     ) ) )",
>      > +        "MetricGroup":
>     "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>      > +        "MetricName": "tma_dram_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates how often CPU
>     was stalled due to RFO store memory accesses; RFO stores issue a
>     read-for-ownership request before the write. Even though store
>     accesses do not typically stall out-of-order CPUs; there are few
>     cases where stores can lead to actual stalls. This metric will be
>     flagged should RFO stores be a bottleneck.",
>      > +        "MetricExpr": "100 * ( RESOURCE_STALLS.SB / ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup":
>     "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>      > +        "MetricName": "tma_store_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots where Core non-memory issues were the bottleneck.  Shortage
>     in hardware compute resources; or dependencies in software's
>     instructions are both categorized under Core Bound. Hence it may
>     indicate the machine ran out of an out-of-order resource; certain
>     execution units are overloaded or dependencies in program's data- or
>     instruction-flow are limiting the performance (e.g. FP-chained
>     long-latency arithmetic operations).",
>      > +        "MetricExpr": "100 * ( ( 1 - ( (
>     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) - ( ( (
>     CYCLE_ACTIVITY.STALLS_MEM_ANY + RESOURCE_STALLS.SB ) / ( ( CYCLE_ACTIVITY.STALLS_TOTAL +
>     UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
>     UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
>     CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
>     UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
>     ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
>     ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
>     RESOURCE_STALLS.SB ) ) ) * ( 1 - ( (
>     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) ) )",
>      > +        "MetricGroup":
>     "Backend;TmaL2;Compute;m_tma_backend_bound_percent",
>      > +        "MetricName": "tma_core_bound_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     cycles where the Divider unit was active. Divide and square root
>     instructions are performed by the Divider unit and can take
>     considerably longer latency than integer or Floating Point addition;
>     subtraction; or multiplication.",
>      > +        "MetricExpr": "100 * ( ARITH.FPU_DIV_ACTIVE / ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) )",
>      > +        "MetricGroup": "TmaL3;m_tma_core_bound_percent",
>      > +        "MetricName": "tma_divider_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric estimates fraction of
>     cycles the CPU performance was potentially limited due to Core
>     computation issues (non divider-related).  Two distinct categories
>     can be attributed into this metric: (1) heavy data-dependency among
>     contiguous instructions would manifest in this metric - such cases
>     are often referred to as low Instruction Level Parallelism (ILP).
>     (2) Contention on some hardware execution unit other than Divider.
>     For example; when there are too many multiply operations.",
>      > +        "MetricExpr": "100 * ( ( ( ( CYCLE_ACTIVITY.STALLS_TOTAL
>     + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
>     UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
>     CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
>     UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
>     ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
>     ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
>     RESOURCE_STALLS.SB ) ) -
>     RESOURCE_STALLS.SB -
>     CYCLE_ACTIVITY.STALLS_MEM_ANY ) / ( CPU_CLK_UNHALTED.THREAD ) )",
>      > +        "MetricGroup": "PortsUtil;TmaL3;m_tma_core_bound_percent",
>      > +        "MetricName": "tma_ports_utilization_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This category represents fraction
>     of slots utilized by useful work i.e. issued uops that eventually
>     get retired. Ideally; all pipeline slots would be attributed to the
>     Retiring category.  Retiring of 100% would indicate the maximum
>     Pipeline_Width throughput was achieved.  Maximizing Retiring
>     typically increases the Instructions-per-cycle (see IPC metric).
>     Note that a high Retiring value does not necessary mean there is no
>     room for more performance.  For example; Heavy-operations or
>     Microcode Assists are categorized under Retiring. They often
>     indicate suboptimal performance and can often be optimized or
>     avoided. ",
>      > +        "MetricExpr": "100 * ( ( UOPS_RETIRED.RETIRE_SLOTS ) / (
>     ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) )",
>      > +        "MetricGroup": "TmaL1",
>      > +        "MetricName": "tma_retiring_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots where the CPU was retiring light-weight operations --
>     instructions that require no more than one uop (micro-operation).
>     This correlates with total number of instructions used by the
>     program. A uops-per-instruction (see UPI metric) ratio of 1 or less
>     should be expected for decently optimized software running on Intel
>     Core/Xeon products. While this often indicates efficient X86
>     instructions were executed; high value does not necessarily mean
>     better performance cannot be achieved.",
>      > +        "MetricExpr": "100 * ( ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( ( ( UOPS_RETIRED.RETIRE_SLOTS
>     ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
>      > +        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
>      > +        "MetricName": "tma_light_operations_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents overall
>     arithmetic floating-point (FP) operations fraction the CPU has
>     executed (retired). Note this metric's value may exceed its parent
>     due to use of \"Uops\" CountDomain and FMA double-counting.",
>      > +        "MetricExpr": "100 * ( ( INST_RETIRED.X87 * ( (
>     UOPS_RETIRED.RETIRE_SLOTS ) / INST_RETIRED.ANY ) / (
>     UOPS_RETIRED.RETIRE_SLOTS ) ) + ( (
>     FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) / ( UOPS_RETIRED.RETIRE_SLOTS
>     ) ) + ( min( ( ( FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
>     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
>     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / (
>     UOPS_RETIRED.RETIRE_SLOTS ) ) , ( 1 ) ) ) )",
>      > +        "MetricGroup": "HPC;TmaL3;m_tma_light_operations_percent",
>      > +        "MetricName": "tma_fp_arith_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots where the CPU was retiring heavy-weight operations --
>     instructions that require two or more uops or microcoded sequences.
>     This highly-correlates with the uop length of these
>     instructions/sequences.",
>      > +        "MetricExpr": "100 * ( ( ( ( UOPS_RETIRED.RETIRE_SLOTS )
>     / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
>      > +        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
>      > +        "MetricName": "tma_heavy_operations_percent",
>      > +        "ScaleUnit": "1%"
>      > +    },
>      > +    {
>      > +        "BriefDescription": "This metric represents fraction of
>     slots the CPU was retiring uops fetched by the Microcode Sequencer
>     (MS) unit.  The MS is used for CISC instructions not supported by
>     the default decoders (like repeat move strings; or CPUID); or by
>     microcode assists used to address some operation modes (like in
>     Floating Point assists). These cases can often be avoided.",
>      > +        "MetricExpr": "100 * ( ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>     UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
>     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>     CPU_CLK_UNHALTED.THREAD ) ) ) )",
>      > +        "MetricGroup":
>     "MicroSeq;TmaL3;m_tma_heavy_operations_percent",
>      > +        "MetricName": "tma_microcode_sequencer_percent",
>      > +        "ScaleUnit": "1%"
>      >       }
>      >   ]
>      > diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>     b/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>      > index 127abe08362f..2efc4c0ee740 100644
>      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>      > @@ -814,9 +814,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_CODE_RD.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x04003C0244",
>      > +        "MSRValue": "0x4003C0244",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch code
>     reads hit in the L3 and the snoops to sibling cores hit in either
>     E/S state and the line is not forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -829,7 +828,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x10003C0091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads hit in the L3 and the snoop to one of the sibling cores hits
>     the line in M state and the line is forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -840,9 +838,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x04003C0091",
>      > +        "MSRValue": "0x4003C0091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads hit in the L3 and the snoops to sibling cores hit in either
>     E/S state and the line is not forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -855,7 +852,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x10003C07F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) hit in the L3 and the snoop to one of the
>     sibling cores hits the line in M state and the line is forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -866,9 +862,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_READS.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x04003C07F7",
>      > +        "MSRValue": "0x4003C07F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) hit in the L3 and the snoops to sibling cores
>     hit in either E/S state and the line is not forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -881,7 +876,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3F803C8FFF",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all requests hit in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -894,7 +888,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x10003C0122",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch RFOs
>     hit in the L3 and the snoop to one of the sibling cores hits the
>     line in M state and the line is forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -905,9 +898,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_RFO.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x04003C0122",
>      > +        "MSRValue": "0x4003C0122",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch RFOs
>     hit in the L3 and the snoops to sibling cores hit in either E/S
>     state and the line is not forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -920,7 +912,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3F803C0002",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand data writes
>     (RFOs) hit in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -933,7 +924,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x10003C0002",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand data writes
>     (RFOs) hit in the L3 and the snoop to one of the sibling cores hits
>     the line in M state and the line is forwarded",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -946,7 +936,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3F803C0200",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts prefetch (that bring data
>     to LLC only) code reads hit in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -959,7 +948,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3F803C0100",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all prefetch (that bring
>     data to LLC only) RFOs hit in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -973,4 +961,4 @@
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x10"
>      >       }
>      > -]
>      > \ No newline at end of file
>      > +]
>      > diff --git
>     a/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>     b/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>      > index 9ad37dddb354..93bbc8600321 100644
>      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>      > @@ -5,6 +5,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     128-bit packed double precision floating-point instructions retired;
>     some instructions will count twice as noted below.  Each count
>     represents 2 computation operations, one for each element.  Applies
>     to SSE* and AVX* packed double precision floating-point
>     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP
>     FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they
>     perform 2 calculations per element. The DAZ and FTZ flags in the
>     MXCSR register need to be set when using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x4"
>      >       },
>      > @@ -14,6 +15,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     128-bit packed single precision floating-point instructions retired;
>     some instructions will count twice as noted below.  Each count
>     represents 4 computation operations, one for each element.  Applies
>     to SSE* and AVX* packed single precision floating-point
>     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT
>     RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice
>     as they perform 2 calculations per element. The DAZ and FTZ flags in
>     the MXCSR register need to be set when using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x8"
>      >       },
>      > @@ -23,6 +25,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     256-bit packed double precision floating-point instructions retired;
>     some instructions will count twice as noted below.  Each count
>     represents 4 computation operations, one for each element.  Applies
>     to SSE* and AVX* packed double precision floating-point
>     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT
>     FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform
>     2 calculations per element. The DAZ and FTZ flags in the MXCSR
>     register need to be set when using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x10"
>      >       },
>      > @@ -32,6 +35,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     256-bit packed single precision floating-point instructions retired;
>     some instructions will count twice as noted below.  Each count
>     represents 8 computation operations, one for each element.  Applies
>     to SSE* and AVX* packed single precision floating-point
>     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT
>     RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice
>     as they perform 2 calculations per element. The DAZ and FTZ flags in
>     the MXCSR register need to be set when using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x20"
>      >       },
>      > @@ -59,6 +63,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     scalar single precision and double precision floating-point
>     instructions retired; some instructions will count twice as noted
>     below.  Each count represents 1 computational operation. Applies to
>     SSE* and AVX* scalar single precision floating-point instructions:
>     ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB
>     instructions count twice as they perform 2 calculations per element.
>     The DAZ and FTZ flags in the MXCSR register need to be set when
>     using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x3"
>      >       },
>      > @@ -68,6 +73,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     scalar double precision floating-point instructions retired; some
>     instructions will count twice as noted below.  Each count represents
>     1 computational operation. Applies to SSE* and AVX* scalar double
>     precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT
>     FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform
>     2 calculations per element. The DAZ and FTZ flags in the MXCSR
>     register need to be set when using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -77,6 +83,7 @@
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xc7",
>      >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
>      > +        "PublicDescription": "Number of SSE/AVX computational
>     scalar single precision floating-point instructions retired; some
>     instructions will count twice as noted below.  Each count represents
>     1 computational operation. Applies to SSE* and AVX* scalar single
>     precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT
>     RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as
>     they perform 2 calculations per element. The DAZ and FTZ flags in
>     the MXCSR register need to be set when using these events.",
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x2"
>      >       },
>      > @@ -190,4 +197,4 @@
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x3"
>      >       }
>      > -]
>      > \ No newline at end of file
>      > +]
>      > diff --git
>     a/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>     b/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>      > index f0bcb945ff76..37ce8034b2ed 100644
>      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>      > @@ -292,4 +292,4 @@
>      >           "SampleAfterValue": "2000003",
>      >           "UMask": "0x1"
>      >       }
>      > -]
>      > \ No newline at end of file
>      > +]
>      > diff --git
>     a/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>     b/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>      > index cce993b197e3..545f61f691b9 100644
>      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>      > @@ -247,7 +247,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3FBFC00244",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch code
>     reads miss in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -258,9 +257,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_CODE_RD.LLC_MISS.LOCAL_DRAM",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x0604000244",
>      > +        "MSRValue": "0x604000244",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch code
>     reads miss the L3 and the data is returned from local dram",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -273,7 +271,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3FBFC00091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads miss in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -284,9 +281,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.LOCAL_DRAM",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x0604000091",
>      > +        "MSRValue": "0x604000091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads miss the L3 and the data is returned from local dram",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -297,9 +293,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.REMOTE_DRAM",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x063BC00091",
>      > +        "MSRValue": "0x63BC00091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads miss the L3 and the data is returned from remote dram",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -312,7 +307,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x103FC00091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads miss the L3 and the modified data is transferred from remote
>     cache",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -323,9 +317,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.REMOTE_HIT_FORWARD",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x087FC00091",
>      > +        "MSRValue": "0x87FC00091",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch data
>     reads miss the L3 and clean or shared data is transferred from
>     remote cache",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -338,20 +331,18 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3FBFC007F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) miss in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      >       {
>      > -        "BriefDescription": "Counts all data/code/rfo reads
>     (demand & prefetch)miss the L3 and the data is returned from local
>     dram",
>      > +        "BriefDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) miss the L3 and the data is returned from local
>     dram",
>      >           "Counter": "0,1,2,3",
>      >           "CounterHTOff": "0,1,2,3",
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.LOCAL_DRAM",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x06040007F7",
>      > +        "MSRValue": "0x6040007F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch)miss the L3 and the data is returned from local
>     dram",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -362,9 +353,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_DRAM",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x063BC007F7",
>      > +        "MSRValue": "0x63BC007F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) miss the L3 and the data is returned from remote
>     dram",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -377,7 +367,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x103FC007F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) miss the L3 and the modified data is transferred
>     from remote cache",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -388,9 +377,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_HIT_FORWARD",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x087FC007F7",
>      > +        "MSRValue": "0x87FC007F7",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all data/code/rfo reads
>     (demand & prefetch) miss the L3 and clean or shared data is
>     transferred from remote cache",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -403,7 +391,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3FBFC08FFF",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all requests miss in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -416,7 +403,6 @@
>      >           "MSRIndex": "0x1a6,0x1a7",
>      >           "MSRValue": "0x3FBFC00122",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & prefetch RFOs
>     miss in the L3",
>      >           "SampleAfterValue": "100003",
>      >           "UMask": "0x1"
>      >       },
>      > @@ -427,9 +413,8 @@
>      >           "EventCode": "0xB7, 0xBB",
>      >           "EventName":
>     "OFFCORE_RESPONSE.ALL_RFO.LLC_MISS.LOCAL_DRAM",
>      >           "MSRIndex": "0x1a6,0x1a7",
>      > -        "MSRValue": "0x0604000122",
>      > +        "MSRValue": "0x604000122",
>      >           "Offcore": "1",
>      > -        "PublicDescription": "Counts all demand & pref
> 

-- 
Zhengjun Xing

^ permalink raw reply	[flat|nested] 25+ messages in thread
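[Editorial note: the diff hunks quoted above repeatedly normalize MSRValue fields by dropping a leading zero after the "0x" prefix (e.g. "0x04003C0244" becomes "0x4003C0244"). A minimal Python sketch of that canonicalization is below; the helper name is illustrative and not taken from download_and_gen.py.]

```python
def normalize_msr_value(value: str) -> str:
    """Canonicalize a hex MSRValue string by stripping leading zeros
    after the '0x' prefix, e.g. '0x04003C0244' -> '0x4003C0244'.
    Illustrative helper only; not code from the generator script."""
    if not value.startswith("0x"):
        return value
    # Strip leading zeros from the hex digits, keeping at least one digit.
    digits = value[2:].lstrip("0") or "0"
    return "0x" + digits

# Examples drawn from the hunks above:
assert normalize_msr_value("0x04003C0244") == "0x4003C0244"
assert normalize_msr_value("0x0604000091") == "0x604000091"
assert normalize_msr_value("0x10003C0091") == "0x10003C0091"  # already canonical
```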

* Re: [PATCH v1 02/31] perf vendor events: Update Intel broadwellx
  2022-07-26  1:25       ` [PATCH v1 02/31] perf vendor events: Update Intel broadwellx Xing Zhengjun
@ 2022-07-26  4:49         ` Ian Rogers
  2022-07-26  5:19           ` Xing Zhengjun
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Rogers @ 2022-07-26  4:49 UTC (permalink / raw)
  To: Xing Zhengjun
  Cc: Arnaldo Carvalho de Melo, Taylor, Perry, Biggers, Caleb,
	Bopardikar, Kshipra, Kan Liang, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Maxime Coquelin, Alexandre Torgue, Andi Kleen, James Clark,
	John Garry, LKML, linux-perf-users, Sedat Dilek, Stephane Eranian

On Mon, Jul 25, 2022 at 6:25 PM Xing Zhengjun
<zhengjun.xing@linux.intel.com> wrote:
>
>
> HI Arnaldo,
>
> On 7/25/2022 9:06 PM, Ian Rogers wrote:
> >
> >
> > On Sun, Jul 24, 2022, 6:34 PM Xing Zhengjun
> > <zhengjun.xing@linux.intel.com>
> > wrote:
> >
> >     Hi Ian,
> >
> >     On 7/23/2022 6:32 AM, Ian Rogers wrote:
> >      > Use script at:
> >      >
> >     https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> >      >
> >
> >     It would be better to add the event JSON file version and the TMA
> >     version for future tracking. For example: the event list is based on
> >     the broadwellx JSON file v19, and the metrics are based on TMA 4.4-full.
> >
> >
> > Thanks Xing,
> >
> > I'll add that in v2. I'd skipped it this time as I'd been adding it to
> > the mapfile. I'll also break apart the tremontx to snowridgex change to
> > match yours. I also will rebase to see if that'll fix the git am issue.
> > Apologies in advance to everyone's inboxes :-)
> >
> Hi Arnaldo,
>
> Besides Snowridgex, I also posted the SPR/ADL/HSX/BDX event lists last
> month. Can these be merged in, or do I need to update them?  Thanks.
>
> https://lore.kernel.org/all/20220609094222.2030167-1-zhengjun.xing@linux.intel.com/
> https://lore.kernel.org/all/20220609094222.2030167-2-zhengjun.xing@linux.intel.com/
>
> https://lore.kernel.org/all/20220607092749.1976878-1-zhengjun.xing@linux.intel.com/
> https://lore.kernel.org/all/20220607092749.1976878-2-zhengjun.xing@linux.intel.com/
> https://lore.kernel.org/all/20220614145019.2177071-1-zhengjun.xing@linux.intel.com/
> https://lore.kernel.org/all/20220614145019.2177071-2-zhengjun.xing@linux.intel.com/

Thanks Zhengjun,

I think those patches are all stale compared to what is posted here.
The particular issues are:
 - fixing the files to only contain ascii characters
 - updating the version information in mapfile.csv
 - for BDX and SPR (HSX isn't posted yet, although I've done some
private testing) adding in the metrics from:
https://github.com/intel/perfmon-metrics/tree/main/BDX/metrics/perf
https://github.com/intel/perfmon-metrics/tree/main/SPR/metrics/perf
 - the event data on 01.org was updated. ADL, HSX and SPR were all
updated last Friday prior to my v1 patches.
 - I also tested all the patches on the respective architectures,
which is how I discovered the fake-event parsing fix in patch 1.

As stated in the cover letter, the goal is to make what is in the
Linux tree exactly match the download_and_gen.py output; however, some
changes I made are in this pull request:
https://github.com/intel/event-converter-for-linux-perf/pull/15

Given the size of the new metrics, and particularly the previously
missed uncore data, it'd be nice to land the "compression" in:
https://lore.kernel.org/lkml/20220715063653.3203761-1-irogers@google.com/
which, prior to this change, reduced the x86 binary size by 12.5% and
saved over a megabyte's worth of dirty pages for relocated data.

Thanks,
Ian
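[Editorial note: the TMA metrics being added in this series are slot-based percentages. For instance, the tma_retiring_percent MetricExpr quoted earlier reduces to 100 * UOPS_RETIRED.RETIRE_SLOTS / (4 * core clocks), where core clocks are CPU_CLK_UNHALTED.THREAD_ANY / 2 with SMT on, else CPU_CLK_UNHALTED.THREAD (pipeline width 4 on broadwellx). A minimal Python sketch of that evaluation follows; the counter values used are illustrative, not measured data.]

```python
def tma_retiring_percent(retire_slots: float, thread_clocks: float,
                         thread_any_clocks: float, smt_on: bool) -> float:
    """Evaluate the tma_retiring_percent expression from the metrics above:
    100 * RETIRE_SLOTS / (4 * core clocks). Core clocks are
    CPU_CLK_UNHALTED.THREAD_ANY / 2 when SMT is on, else
    CPU_CLK_UNHALTED.THREAD. Sketch only, with hypothetical inputs."""
    core_clocks = thread_any_clocks / 2 if smt_on else thread_clocks
    return 100 * retire_slots / (4 * core_clocks)

# Illustrative counter values (not measured data):
print(tma_retiring_percent(2_000_000, 1_000_000, 0, smt_on=False))  # 50.0
```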

> > Thanks,
> > Ian
> >
> >
> >      > to download and generate the latest events and metrics. Manually copy
> >      > the broadwellx files into perf and update mapfile.csv.
> >      >
> >      > Tested with 'perf test':
> >      >   10: PMU events
> >          :
> >      >   10.1: PMU event table sanity
> >          : Ok
> >      >   10.2: PMU event map aliases
> >           : Ok
> >      >   10.3: Parsing of PMU event table metrics
> >          : Ok
> >      >   10.4: Parsing of PMU event table metrics with fake PMUs
> >           : Ok
> >      >   90: perf all metricgroups test
> >          : Ok
> >      >   91: perf all metrics test
> >           : Skip
> >      >   93: perf all PMU test
> >           : Ok
> >      >
> >      > Signed-off-by: Ian Rogers <irogers@google.com>
> >      > ---
> >      >   .../arch/x86/broadwellx/bdx-metrics.json      |  570 ++-
> >      >   .../pmu-events/arch/x86/broadwellx/cache.json |   22 +-
> >      >   .../arch/x86/broadwellx/floating-point.json   |    9 +-
> >      >   .../arch/x86/broadwellx/frontend.json         |    2 +-
> >      >   .../arch/x86/broadwellx/memory.json           |   39 +-
> >      >   .../pmu-events/arch/x86/broadwellx/other.json |    2 +-
> >      >   .../arch/x86/broadwellx/pipeline.json         |    4 +-
> >      >   .../arch/x86/broadwellx/uncore-cache.json     | 3788
> >     ++++++++++++++++-
> >      >   .../x86/broadwellx/uncore-interconnect.json   | 1438 ++++++-
> >      >   .../arch/x86/broadwellx/uncore-memory.json    | 2849 ++++++++++++-
> >      >   .../arch/x86/broadwellx/uncore-other.json     | 3252 ++++++++++++++
> >      >   .../arch/x86/broadwellx/uncore-power.json     |  437 +-
> >      >   .../arch/x86/broadwellx/virtual-memory.json   |    2 +-
> >      >   tools/perf/pmu-events/arch/x86/mapfile.csv    |    2 +-
> >      >   14 files changed, 12103 insertions(+), 313 deletions(-)
> >      >   create mode 100644
> >     tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
> >      >
> >      > diff --git
> >     a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
> >     b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
> >      > index b055947c0afe..720ee7c9332d 100644
> >      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
> >      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
> >      > @@ -74,12 +74,6 @@
> >      >           "MetricGroup": "Branches;Fed;FetchBW",
> >      >           "MetricName": "UpTB"
> >      >       },
> >      > -    {
> >      > -        "BriefDescription": "Cycles Per Instruction (per Logical
> >     Processor)",
> >      > -        "MetricExpr": "1 / (INST_RETIRED.ANY /
> >     CPU_CLK_UNHALTED.THREAD)",
> >      > -        "MetricGroup": "Pipeline;Mem",
> >      > -        "MetricName": "CPI"
> >      > -    },
> >      >       {
> >      >           "BriefDescription": "Per-Logical Processor actual
> >     clocks when the Logical Processor is active.",
> >      >           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
> >      > @@ -130,43 +124,25 @@
> >      >           "MetricName": "FLOPc_SMT"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Actual per-core usage of the
> >     Floating Point execution units (regardless of the vector width)",
> >      > +        "BriefDescription": "Actual per-core usage of the
> >     Floating Point non-X87 execution units (regardless of precision or
> >     vector-width)",
> >      >           "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
> >     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) +
> >     (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
> >     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 *
> >     CPU_CLK_UNHALTED.THREAD )",
> >      >           "MetricGroup": "Cor;Flops;HPC",
> >      >           "MetricName": "FP_Arith_Utilization",
> >      > -        "PublicDescription": "Actual per-core usage of the
> >     Floating Point execution units (regardless of the vector width).
> >     Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
> >      > +        "PublicDescription": "Actual per-core usage of the
> >     Floating Point non-X87 execution units (regardless of precision or
> >     vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply
> >     Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar
> >     or 128/256-bit vectors - less common)."
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Actual per-core usage of the
> >     Floating Point execution units (regardless of the vector width). SMT
> >     version; use when SMT is enabled and measuring per logical CPU.",
> >      > +        "BriefDescription": "Actual per-core usage of the
> >     Floating Point non-X87 execution units (regardless of precision or
> >     vector-width). SMT version; use when SMT is enabled and measuring
> >     per logical CPU.",
> >      >           "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
> >     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) +
> >     (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
> >     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 * ( (
> >     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) )",
> >      >           "MetricGroup": "Cor;Flops;HPC_SMT",
> >      >           "MetricName": "FP_Arith_Utilization_SMT",
> >      > -        "PublicDescription": "Actual per-core usage of the
> >     Floating Point execution units (regardless of the vector width).
> >     Values > 1 are possible due to Fused-Multiply Add (FMA) counting.
> >     SMT version; use when SMT is enabled and measuring per logical CPU."
> >      > +        "PublicDescription": "Actual per-core usage of the
> >     Floating Point non-X87 execution units (regardless of precision or
> >     vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply
> >     Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar
> >     or 128/256-bit vectors - less common). SMT version; use when SMT is
> >     enabled and measuring per logical CPU."
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Instruction-Level-Parallelism
> >     (average number of uops executed when there is at least 1 uop
> >     executed)",
> >      > +        "BriefDescription": "Instruction-Level-Parallelism
> >     (average number of uops executed when there is execution) per-core",
> >      >           "MetricExpr": "UOPS_EXECUTED.THREAD / ((
> >     cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else
> >     UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
> >      >           "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
> >      >           "MetricName": "ILP"
> >      >       },
> >      > -    {
> >      > -        "BriefDescription": "Branch Misprediction Cost: Fraction
> >     of TMA slots wasted per non-speculative branch misprediction
> >     (retired JEClear)",
> >      > -        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
> >     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *
> >     INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
> >     CPU_CLK_UNHALTED.THREAD)) * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
> >     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
> >     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
> >     CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) /
> >     BR_MISP_RETIRED.ALL_BRANCHES",
> >      > -        "MetricGroup": "Bad;BrMispredicts",
> >      > -        "MetricName": "Branch_Misprediction_Cost"
> >      > -    },
> >      > -    {
> >      > -        "BriefDescription": "Branch Misprediction Cost: Fraction
> >     of TMA slots wasted per non-speculative branch misprediction
> >     (retired JEClear)",
> >      > -        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
> >     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( (
> >     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )
> >     )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (
> >     ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
> >     * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
> >     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
> >     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( (
> >     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
> >     ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))
> >     / BR_MISP_RETIRED.ALL_BRANCHES",
> >      > -        "MetricGroup": "Bad;BrMispredicts_SMT",
> >      > -        "MetricName": "Branch_Misprediction_Cost_SMT"
> >      > -    },
> >      > -    {
> >      > -        "BriefDescription": "Number of Instructions per
> >     non-speculative Branch Misprediction (JEClear)",
> >      > -        "MetricExpr": "INST_RETIRED.ANY /
> >     BR_MISP_RETIRED.ALL_BRANCHES",
> >      > -        "MetricGroup": "Bad;BadSpec;BrMispredicts",
> >      > -        "MetricName": "IpMispredict"
> >      > -    },
> >      >       {
> >      >           "BriefDescription": "Core actual clocks when any
> >     Logical Processor is active on the Physical Core",
> >      >           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1
> >     + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
> >      > @@ -256,6 +232,18 @@
> >      >           "MetricGroup": "Summary;TmaL1",
> >      >           "MetricName": "Instructions"
> >      >       },
> >      > +    {
> >      > +        "BriefDescription": "Average number of Uops retired in
> >     cycles where at least one uop has retired.",
> >      > +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS /
> >     cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
> >      > +        "MetricGroup": "Pipeline;Ret",
> >      > +        "MetricName": "Retire"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "",
> >      > +        "MetricExpr": "UOPS_EXECUTED.THREAD /
> >     cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
> >      > +        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
> >      > +        "MetricName": "Execute"
> >      > +    },
> >      >       {
> >      >           "BriefDescription": "Fraction of Uops delivered by the
> >     DSB (aka Decoded ICache; or Uop Cache)",
> >      >           "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS +
> >     LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
> >      > @@ -263,35 +251,34 @@
> >      >           "MetricName": "DSB_Coverage"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Actual Average Latency for L1
> >     data-cache miss demand load instructions (in core cycles)",
> >      > -        "MetricExpr": "L1D_PEND_MISS.PENDING / (
> >     MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
> >      > -        "MetricGroup": "Mem;MemoryBound;MemoryLat",
> >      > -        "MetricName": "Load_Miss_Real_Latency",
> >      > -        "PublicDescription": "Actual Average Latency for L1
> >     data-cache miss demand load instructions (in core cycles). Latency
> >     may be overestimated for multi-load instructions - e.g. repeat strings."
> >      > +        "BriefDescription": "Number of Instructions per
> >     non-speculative Branch Misprediction (JEClear) (lower number means
> >     higher occurrence rate)",
> >      > +        "MetricExpr": "INST_RETIRED.ANY /
> >     BR_MISP_RETIRED.ALL_BRANCHES",
> >      > +        "MetricGroup": "Bad;BadSpec;BrMispredicts",
> >      > +        "MetricName": "IpMispredict"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Memory-Level-Parallelism (average
> >     number of L1 miss demand load when there is at least one such miss.
> >     Per-Logical Processor)",
> >      > -        "MetricExpr": "L1D_PEND_MISS.PENDING /
> >     L1D_PEND_MISS.PENDING_CYCLES",
> >      > -        "MetricGroup": "Mem;MemoryBound;MemoryBW",
> >      > -        "MetricName": "MLP"
> >      > +        "BriefDescription": "Branch Misprediction Cost: Fraction
> >     of TMA slots wasted per non-speculative branch misprediction
> >     (retired JEClear)",
> >      > +        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
> >     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *
> >     INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
> >     CPU_CLK_UNHALTED.THREAD)) * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
> >     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
> >     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
> >     CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) /
> >     BR_MISP_RETIRED.ALL_BRANCHES",
> >      > +        "MetricGroup": "Bad;BrMispredicts",
> >      > +        "MetricName": "Branch_Misprediction_Cost"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Average data fill bandwidth to the
> >     L1 data cache [GB / sec]",
> >      > -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 /
> >     duration_time",
> >      > -        "MetricGroup": "Mem;MemoryBW",
> >      > -        "MetricName": "L1D_Cache_Fill_BW"
> >      > +        "BriefDescription": "Branch Misprediction Cost: Fraction
> >     of TMA slots wasted per non-speculative branch misprediction
> >     (retired JEClear)",
> >      > +        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
> >     UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( (
> >     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )
> >     )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (
> >     ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
> >     * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
> >     / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
> >     MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( (
> >     CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
> >     ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
> >     CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))
> >     / BR_MISP_RETIRED.ALL_BRANCHES",
> >      > +        "MetricGroup": "Bad;BrMispredicts_SMT",
> >      > +        "MetricName": "Branch_Misprediction_Cost_SMT"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Average data fill bandwidth to the
> >     L2 cache [GB / sec]",
> >      > -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 /
> >     duration_time",
> >      > -        "MetricGroup": "Mem;MemoryBW",
> >      > -        "MetricName": "L2_Cache_Fill_BW"
> >      > +        "BriefDescription": "Actual Average Latency for L1
> >     data-cache miss demand load operations (in core cycles)",
> >      > +        "MetricExpr": "L1D_PEND_MISS.PENDING / (
> >     MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
> >      > +        "MetricGroup": "Mem;MemoryBound;MemoryLat",
> >      > +        "MetricName": "Load_Miss_Real_Latency"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Average per-core data fill
> >     bandwidth to the L3 cache [GB / sec]",
> >      > -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000
> >     / duration_time",
> >      > -        "MetricGroup": "Mem;MemoryBW",
> >      > -        "MetricName": "L3_Cache_Fill_BW"
> >      > +        "BriefDescription": "Memory-Level-Parallelism (average
> >     number of L1 miss demand load when there is at least one such miss.
> >     Per-Logical Processor)",
> >      > +        "MetricExpr": "L1D_PEND_MISS.PENDING /
> >     L1D_PEND_MISS.PENDING_CYCLES",
> >      > +        "MetricGroup": "Mem;MemoryBound;MemoryBW",
> >      > +        "MetricName": "MLP"
> >      >       },
> >      >       {
> >      >           "BriefDescription": "L1 cache true misses per kilo
> >     instruction for retired demand loads",
> >      > @@ -306,13 +293,13 @@
> >      >           "MetricName": "L2MPKI"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "L2 cache misses per kilo
> >     instruction for all request types (including speculative)",
> >      > +        "BriefDescription": "L2 cache ([RKL+] true) misses per
> >     kilo instruction for all request types (including speculative)",
> >      >           "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
> >      >           "MetricGroup": "Mem;CacheMisses;Offcore",
> >      >           "MetricName": "L2MPKI_All"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "L2 cache misses per kilo
> >     instruction for all demand loads  (including speculative)",
> >      > +        "BriefDescription": "L2 cache ([RKL+] true) misses per
> >     kilo instruction for all demand loads  (including speculative)",
> >      >           "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS /
> >     INST_RETIRED.ANY",
> >      >           "MetricGroup": "Mem;CacheMisses",
> >      >           "MetricName": "L2MPKI_Load"
> >      > @@ -348,6 +335,48 @@
> >      >           "MetricGroup": "Mem;MemoryTLB_SMT",
> >      >           "MetricName": "Page_Walks_Utilization_SMT"
> >      >       },
> >      > +    {
> >      > +        "BriefDescription": "Average per-core data fill
> >     bandwidth to the L1 data cache [GB / sec]",
> >      > +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 /
> >     duration_time",
> >      > +        "MetricGroup": "Mem;MemoryBW",
> >      > +        "MetricName": "L1D_Cache_Fill_BW"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average per-core data fill
> >     bandwidth to the L2 cache [GB / sec]",
> >      > +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 /
> >     duration_time",
> >      > +        "MetricGroup": "Mem;MemoryBW",
> >      > +        "MetricName": "L2_Cache_Fill_BW"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average per-core data fill
> >     bandwidth to the L3 cache [GB / sec]",
> >      > +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000
> >     / duration_time",
> >      > +        "MetricGroup": "Mem;MemoryBW",
> >      > +        "MetricName": "L3_Cache_Fill_BW"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average per-thread data fill
> >     bandwidth to the L1 data cache [GB / sec]",
> >      > +        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 /
> >     duration_time)",
> >      > +        "MetricGroup": "Mem;MemoryBW",
> >      > +        "MetricName": "L1D_Cache_Fill_BW_1T"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average per-thread data fill
> >     bandwidth to the L2 cache [GB / sec]",
> >      > +        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 /
> >     duration_time)",
> >      > +        "MetricGroup": "Mem;MemoryBW",
> >      > +        "MetricName": "L2_Cache_Fill_BW_1T"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average per-thread data fill
> >     bandwidth to the L3 cache [GB / sec]",
> >      > +        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000
> >     / duration_time)",
> >      > +        "MetricGroup": "Mem;MemoryBW",
> >      > +        "MetricName": "L3_Cache_Fill_BW_1T"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average per-thread data access
> >     bandwidth to the L3 cache [GB / sec]",
> >      > +        "MetricExpr": "0",
> >      > +        "MetricGroup": "Mem;MemoryBW;Offcore",
> >      > +        "MetricName": "L3_Cache_Access_BW_1T"
> >      > +    },
> >      >       {
> >      >           "BriefDescription": "Average CPU Utilization",
> >      >           "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
> >      > @@ -364,7 +393,8 @@
> >      >           "BriefDescription": "Giga Floating Point Operations Per
> >     Second",
> >      >           "MetricExpr": "( ( 1 * (
> >     FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
> >     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 *
> >     FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * (
> >     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 *
> >     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / 1000000000 ) /
> >     duration_time",
> >      >           "MetricGroup": "Cor;Flops;HPC",
> >      > -        "MetricName": "GFLOPs"
> >      > +        "MetricName": "GFLOPs",
> >      > +        "PublicDescription": "Giga Floating Point Operations Per
> >     Second. Aggregate across all supported options of: FP precisions,
> >     scalar and vector instructions, vector-width and AMX engine."
> >      >       },
> >      >       {
> >      >           "BriefDescription": "Average Frequency Utilization
> >     relative nominal frequency",
> >      > @@ -461,5 +491,439 @@
> >      >           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@)
> >     * 100",
> >      >           "MetricGroup": "Power",
> >      >           "MetricName": "C7_Pkg_Residency"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "CPU operating frequency (in GHz)",
> >      > +        "MetricExpr": "( CPU_CLK_UNHALTED.THREAD /
> >     CPU_CLK_UNHALTED.REF_TSC * #SYSTEM_TSC_FREQ ) / 1000000000",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "cpu_operating_frequency",
> >      > +        "ScaleUnit": "1GHz"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Cycles per instruction retired;
> >     indicating how much time each executed instruction took; in units of
> >     cycles.",
> >      > +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "cpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "The ratio of number of completed
> >     memory load instructions to the total number completed instructions",
> >      > +        "MetricExpr": "MEM_UOPS_RETIRED.ALL_LOADS /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "loads_per_instr",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "The ratio of number of completed
> >     memory store instructions to the total number completed instructions",
> >      > +        "MetricExpr": "MEM_UOPS_RETIRED.ALL_STORES /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "stores_per_instr",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of requests missing
> >     L1 data cache (includes data+rfo w/ prefetches) to the total number
> >     of completed instructions",
> >      > +        "MetricExpr": "L1D.REPLACEMENT / INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "l1d_mpi_includes_data_plus_rfo_with_prefetches",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of demand load
> >     requests hitting in L1 data cache to the total number of completed
> >     instructions",
> >      > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L1_HIT /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "l1d_demand_data_read_hits_per_instr",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of code read
> >     requests missing in L1 instruction cache (includes prefetches) to
> >     the total number of completed instructions",
> >      > +        "MetricExpr": "L2_RQSTS.ALL_CODE_RD / INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "l1_i_code_read_misses_with_prefetches_per_instr",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of completed demand
> >     load requests hitting in L2 cache to the total number of completed
> >     instructions",
> >      > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L2_HIT /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "l2_demand_data_read_hits_per_instr",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of requests missing
> >     L2 cache (includes code+data+rfo w/ prefetches) to the total number
> >     of completed instructions",
> >      > +        "MetricExpr": "L2_LINES_IN.ALL / INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "l2_mpi_includes_code_plus_data_plus_rfo_with_prefetches",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of completed data
> >     read request missing L2 cache to the total number of completed
> >     instructions",
> >      > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L2_MISS /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "l2_demand_data_read_mpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of code read
> >     request missing L2 cache to the total number of completed instructions",
> >      > +        "MetricExpr": "L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "l2_demand_code_mpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of data read
> >     requests missing last level core cache (includes demand w/
> >     prefetches) to the total number of completed instructions",
> >      > +        "MetricExpr": "(
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ +
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x192@ ) /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "llc_data_read_mpi_demand_plus_prefetch",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of code read
> >     requests missing last level core cache (includes demand w/
> >     prefetches) to the total number of completed instructions",
> >      > +        "MetricExpr": "(
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x181@ +
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x191@ ) /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "llc_code_read_mpi_demand_plus_prefetch",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average latency of a last level
> >     cache (LLC) demand and prefetch data read miss (read memory access)
> >     in nano seconds",
> >      > +        "MetricExpr": "( 1000000000 * (
> >     cbox@UNC_C_TOR_OCCUPANCY.MISS_OPCODE\\,filter_opc\\=0x182@ /
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
> >     UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
> >     ) ) ) * duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "llc_data_read_demand_plus_prefetch_miss_latency",
> >      > +        "ScaleUnit": "1ns"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average latency of a last level
> >     cache (LLC) demand and prefetch data read miss (read memory access)
> >     addressed to local memory in nano seconds",
> >      > +        "MetricExpr": "( 1000000000 * (
> >     cbox@UNC_C_TOR_OCCUPANCY.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ /
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
> >     UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
> >     ) ) ) * duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "llc_data_read_demand_plus_prefetch_miss_latency_for_local_requests",
> >      > +        "ScaleUnit": "1ns"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Average latency of a last level
> >     cache (LLC) demand and prefetch data read miss (read memory access)
> >     addressed to remote memory in nano seconds",
> >      > +        "MetricExpr": "( 1000000000 * (
> >     cbox@UNC_C_TOR_OCCUPANCY.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ /
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
> >     UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
> >     ) ) ) * duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests",
> >      > +        "ScaleUnit": "1ns"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of completed page
> >     walks (for all page sizes) caused by a code fetch to the total
> >     number of completed instructions. This implies it missed in the ITLB
> >     (Instruction TLB) and further levels of TLB.",
> >      > +        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "itlb_mpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of completed page
> >     walks (for 2 megabyte and 4 megabyte page sizes) caused by a code
> >     fetch to the total number of completed instructions. This implies it
> >     missed in the Instruction Translation Lookaside Buffer (ITLB) and
> >     further levels of TLB.",
> >      > +        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED_2M_4M /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "itlb_large_page_mpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of completed page
> >     walks (for all page sizes) caused by demand data loads to the total
> >     number of completed instructions. This implies it missed in the DTLB
> >     and further levels of TLB.",
> >      > +        "MetricExpr": "DTLB_LOAD_MISSES.WALK_COMPLETED /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "dtlb_load_mpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Ratio of number of completed page
> >     walks (for all page sizes) caused by demand data stores to the total
> >     number of completed instructions. This implies it missed in the DTLB
> >     and further levels of TLB.",
> >      > +        "MetricExpr": "DTLB_STORE_MISSES.WALK_COMPLETED /
> >     INST_RETIRED.ANY",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "dtlb_store_mpi",
> >      > +        "ScaleUnit": "1per_instr"
> >      > +    },
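The four *_mpi metrics above share one shape: completed page walks divided by retired instructions, reported with the "1per_instr" scale unit. A minimal sketch of that computation (the counter values below are made up, not real readings):

```python
def mpi(walks_completed: int, inst_retired: int) -> float:
    """Misses per instruction: completed page walks divided by
    INST_RETIRED.ANY, matching the MetricExpr shape above."""
    return walks_completed / inst_retired

# Hypothetical counter readings from a perf stat run.
itlb_mpi = mpi(walks_completed=1_200, inst_retired=4_000_000)
print(f"itlb_mpi = {itlb_mpi} per_instr")  # 0.0003
```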
> >      > +    {
> >      > +        "BriefDescription": "Memory reads that miss the last
> >     level cache (LLC) addressed to local DRAM as a percentage of total
> >     memory read accesses; does not include LLC prefetches.",
> >      > +        "MetricExpr": "100 *
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ / (
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ +
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ )",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "numa_percent_reads_addressed_to_local_dram",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Memory reads that miss the last
> >     level cache (LLC) addressed to remote DRAM as a percentage of total
> >     memory read accesses; does not include LLC prefetches.",
> >      > +        "MetricExpr": "100 *
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ / (
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ +
> >     cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ )",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "numa_percent_reads_addressed_to_remote_dram",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Uncore operating frequency in GHz",
> >      > +        "MetricExpr": "UNC_C_CLOCKTICKS / (
> >     source_count(UNC_C_CLOCKTICKS) * #num_packages ) / 1000000000",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "uncore_frequency",
> >      > +        "ScaleUnit": "1GHz"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Intel(R) Quick Path Interconnect
> >     (QPI) data transmit bandwidth (MB/sec)",
> >      > +        "MetricExpr": "( UNC_Q_TxL_FLITS_G0.DATA * 8 / 1000000)
> >     / duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "qpi_data_transmit_bw_only_data",
> >      > +        "ScaleUnit": "1MB/s"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "DDR memory read bandwidth (MB/sec)",
> >      > +        "MetricExpr": "( UNC_M_CAS_COUNT.RD * 64 / 1000000) /
> >     duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "memory_bandwidth_read",
> >      > +        "ScaleUnit": "1MB/s"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "DDR memory write bandwidth (MB/sec)",
> >      > +        "MetricExpr": "( UNC_M_CAS_COUNT.WR * 64 / 1000000) /
> >     duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "memory_bandwidth_write",
> >      > +        "ScaleUnit": "1MB/s"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "DDR memory bandwidth (MB/sec)",
> >      > +        "MetricExpr": "(( UNC_M_CAS_COUNT.RD +
> >     UNC_M_CAS_COUNT.WR ) * 64 / 1000000) / duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "memory_bandwidth_total",
> >      > +        "ScaleUnit": "1MB/s"
> >      > +    },
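Each DDR CAS command transfers one 64-byte cache line, so the three memory bandwidth metrics above reduce to a byte count over the collection interval. A sketch, with illustrative counter values:

```python
def ddr_bw_mb_s(cas_rd: int, cas_wr: int, duration_s: float) -> float:
    """Total DDR bandwidth in MB/s: (RD + WR) CAS counts * 64 bytes,
    scaled to MB and divided by duration_time, per the MetricExpr above."""
    return (cas_rd + cas_wr) * 64 / 1_000_000 / duration_s

# Hypothetical: 1.0M read CAS + 0.5M write CAS over a 1-second interval.
print(ddr_bw_mb_s(1_000_000, 500_000, 1.0), "MB/s")  # 96.0 MB/s
```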
> >      > +    {
> >      > +        "BriefDescription": "Bandwidth of IO reads that are
> >     initiated by end device controllers that are requesting memory from
> >     the CPU.",
> >      > +        "MetricExpr": "(
> >     cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x19e@ * 64 / 1000000)
> >     / duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "io_bandwidth_read",
> >      > +        "ScaleUnit": "1MB/s"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Bandwidth of IO writes that are
> >     initiated by end device controllers that are writing memory to the
> >     CPU.",
> >      > +        "MetricExpr": "((
> >     cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x1c8\\,filter_tid\\=0x3e@
> >     +
> >     cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x180\\,filter_tid\\=0x3e@
> >     ) * 64 / 1000000) / duration_time",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName": "io_bandwidth_write",
> >      > +        "ScaleUnit": "1MB/s"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Uops delivered from decoded
> >     instruction cache (decoded stream buffer or DSB) as a percent of
> >     total uops delivered to Instruction Decode Queue",
> >      > +        "MetricExpr": "100 * ( IDQ.DSB_UOPS / UOPS_ISSUED.ANY )",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "percent_uops_delivered_from_decoded_icache_dsb",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Uops delivered from legacy decode
> >     pipeline (Micro-instruction Translation Engine or MITE) as a percent
> >     of total uops delivered to Instruction Decode Queue",
> >      > +        "MetricExpr": "100 * ( IDQ.MITE_UOPS / UOPS_ISSUED.ANY )",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "percent_uops_delivered_from_legacy_decode_pipeline_mite",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Uops delivered from microcode
> >     sequencer (MS) as a percent of total uops delivered to Instruction
> >     Decode Queue",
> >      > +        "MetricExpr": "100 * ( IDQ.MS_UOPS / UOPS_ISSUED.ANY )",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "percent_uops_delivered_from_microcode_sequencer_ms",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "Uops delivered from loop stream
> >     detector (LSD) as a percent of total uops delivered to Instruction
> >     Decode Queue",
> >      > +        "MetricExpr": "100 * ( LSD.UOPS / UOPS_ISSUED.ANY )",
> >      > +        "MetricGroup": "",
> >      > +        "MetricName":
> >     "percent_uops_delivered_from_loop_stream_detector_lsd",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
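The four uop-source metrics above (DSB, MITE, MS, LSD) are each a simple share of issued uops; together they describe where the Instruction Decode Queue's uops came from. A sketch with made-up counts:

```python
def uop_source_percent(source_uops: int, uops_issued: int) -> float:
    """Percent of issued uops delivered by one front-end source,
    matching "100 * ( IDQ.<SRC>_UOPS / UOPS_ISSUED.ANY )"."""
    return 100 * source_uops / uops_issued

# Hypothetical readings: most uops came from the DSB.
shares = {src: uop_source_percent(n, 1_000)
          for src, n in {"dsb": 700, "mite": 200, "ms": 50, "lsd": 50}.items()}
print(shares)  # {'dsb': 70.0, 'mite': 20.0, 'ms': 5.0, 'lsd': 5.0}
```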
> >      > +    {
> >      > +        "BriefDescription": "This category represents fraction
> >     of slots where the processor's Frontend undersupplies its Backend.
> >     Frontend denotes the first part of the processor core responsible to
> >     fetch operations that are executed later on by the Backend part.
> >     Within the Frontend; a branch predictor predicts the next address to
> >     fetch; cache-lines are fetched from the memory subsystem; parsed
> >     into instructions; and lastly decoded into micro-operations (uops).
> >     Ideally the Frontend can issue Machine_Width uops every cycle to the
> >     Backend. Frontend Bound denotes unutilized issue-slots when there is
> >     no Backend stall; i.e. bubbles where Frontend delivered no uops
> >     while Backend could have accepted them. For example; stalls due to
> >     instruction-cache misses would be categorized under Frontend Bound.",
> >      > +        "MetricExpr": "100 * ( IDQ_UOPS_NOT_DELIVERED.CORE / ( (
> >     4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) )",
> >      > +        "MetricGroup": "TmaL1;PGO",
> >      > +        "MetricName": "tma_frontend_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots the CPU was stalled due to Frontend latency issues.  For
> >     example; instruction-cache misses; iTLB misses or fetch stalls after
> >     a branch misprediction are categorized under Frontend Latency. In
> >     such cases; the Frontend eventually delivers no uops for some period.",
> >      > +        "MetricExpr": "100 * ( ( 4 ) *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) )",
> >      > +        "MetricGroup":
> >     "Frontend;TmaL2;m_tma_frontend_bound_percent",
> >      > +        "MetricName": "tma_fetch_latency_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     cycles the CPU was stalled due to instruction cache misses.",
> >      > +        "MetricExpr": "100 * ( ICACHE.IFDATA_STALL / (
> >     CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "BigFoot;FetchLat;IcMiss;TmaL3;m_tma_fetch_latency_percent",
> >      > +        "MetricName": "tma_icache_misses_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     cycles the CPU was stalled due to Instruction TLB (ITLB) misses.",
> >      > +        "MetricExpr": "100 * ( ( 14 * ITLB_MISSES.STLB_HIT +
> >     cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=0x1@ + 7 *
> >     ITLB_MISSES.WALK_COMPLETED ) / ( CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "BigFoot;FetchLat;MemoryTLB;TmaL3;m_tma_fetch_latency_percent",
> >      > +        "MetricName": "tma_itlb_misses_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     cycles the CPU was stalled due to Branch Resteers. Branch Resteers
> >     estimates the Frontend delay in fetching operations from corrected
> >     path; following all sorts of miss-predicted branches. For example;
> >     branchy code with lots of miss-predictions might get categorized
> >     under Branch Resteers. Note the value of this node may overlap with
> >     its siblings.",
> >      > +        "MetricExpr": "100 * ( ( 12 ) * (
> >     BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
> >     / ( CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
> >      > +        "MetricName": "tma_branch_resteers_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     cycles the CPU was stalled due to switches from DSB to MITE
> >     pipelines. The DSB (decoded i-cache) is a Uop Cache where the
> >     front-end directly delivers Uops (micro operations) avoiding heavy
> >     x86 decoding. The DSB pipeline has shorter latency and delivered
> >     higher bandwidth than the MITE (legacy instruction decode pipeline).
> >     Switching between the two pipelines can cause penalties hence this
> >     metric measures the exposed penalty.",
> >      > +        "MetricExpr": "100 * ( DSB2MITE_SWITCHES.PENALTY_CYCLES
> >     / ( CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "DSBmiss;FetchLat;TmaL3;m_tma_fetch_latency_percent",
> >      > +        "MetricName": "tma_dsb_switches_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using
> >     proper compiler flags or Intel Compiler by default will certainly
> >     avoid this. #Link: Optimization Guide about LCP BKMs.",
> >      > +        "MetricExpr": "100 * ( ILD_STALL.LCP / (
> >     CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
> >      > +        "MetricName": "tma_lcp_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates the fraction
> >     of cycles when the CPU was stalled due to switches of uop delivery
> >     to the Microcode Sequencer (MS). Commonly used instructions are
> >     optimized for delivery by the DSB (decoded i-cache) or MITE (legacy
> >     instruction decode) pipelines. Certain operations cannot be handled
> >     natively by the execution pipeline; and must be performed by
> >     microcode (small programs injected into the execution stream).
> >     Switching to the MS too often can negatively impact performance. The
> >     MS is designated to deliver long uop flows required by CISC
> >     instructions like CPUID; or uncommon conditions like Floating Point
> >     Assists when dealing with Denormals.",
> >      > +        "MetricExpr": "100 * ( ( 2 ) * IDQ.MS_SWITCHES / (
> >     CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "FetchLat;MicroSeq;TmaL3;m_tma_fetch_latency_percent",
> >      > +        "MetricName": "tma_ms_switches_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
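Several of the fetch-latency children above are simple "penalty cycles over core clocks" ratios; the MS-switches metric, for instance, charges a fixed 2-cycle penalty per switch. A sketch (inputs are illustrative):

```python
def penalty_percent(events: int, penalty_cycles: int, clks: int) -> float:
    """Percent of core clocks lost to a fixed per-event penalty, e.g.
    "100 * ( 2 * IDQ.MS_SWITCHES / CPU_CLK_UNHALTED.THREAD )"."""
    return 100 * penalty_cycles * events / clks

# Hypothetical: 500 MS switches at 2 cycles each over 100k clocks.
print(penalty_percent(events=500, penalty_cycles=2, clks=100_000))  # 1.0
```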
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots the CPU was stalled due to Frontend bandwidth issues.  For
> >     example; inefficiencies at the instruction decoders; or restrictions
> >     for caching in the DSB (decoded uops cache) are categorized under
> >     Fetch Bandwidth. In such cases; the Frontend typically delivers
> >     suboptimal amount of uops to the Backend.",
> >      > +        "MetricExpr": "100 * ( ( IDQ_UOPS_NOT_DELIVERED.CORE / (
> >     ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( 4 ) *
> >     IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
> >      > +        "MetricGroup":
> >     "FetchBW;Frontend;TmaL2;m_tma_frontend_bound_percent",
> >      > +        "MetricName": "tma_fetch_bandwidth_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents Core
> >     fraction of cycles in which CPU was likely limited due to the MITE
> >     pipeline (the legacy decode pipeline). This pipeline is used for
> >     code that was not pre-cached in the DSB or LSD. For example;
> >     inefficiencies due to asymmetric decoders; use of long immediate or
> >     LCP can manifest as MITE fetch bandwidth bottleneck.",
> >      > +        "MetricExpr": "100 * ( ( IDQ.ALL_MITE_CYCLES_ANY_UOPS -
> >     IDQ.ALL_MITE_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 )
> >     if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) / 2 )",
> >      > +        "MetricGroup":
> >     "DSBmiss;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
> >      > +        "MetricName": "tma_mite_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents Core
> >     fraction of cycles in which CPU was likely limited due to DSB
> >     (decoded uop cache) fetch pipeline.  For example; inefficient
> >     utilization of the DSB cache structure or bank conflict when reading
> >     from it; are categorized here.",
> >      > +        "MetricExpr": "100 * ( ( IDQ.ALL_DSB_CYCLES_ANY_UOPS -
> >     IDQ.ALL_DSB_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 )
> >     if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) / 2 )",
> >      > +        "MetricGroup":
> >     "DSB;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
> >      > +        "MetricName": "tma_dsb_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This category represents fraction
> >     of slots wasted due to incorrect speculations. This include slots
> >     used to issue uops that do not eventually get retired and slots for
> >     which the issue-pipeline was blocked due to recovery from earlier
> >     incorrect speculation. For example; wasted work due to
> >     miss-predicted branches are categorized under Bad Speculation
> >     category. Incorrect data speculation followed by Memory Ordering
> >     Nukes is another example.",
> >      > +        "MetricExpr": "100 * ( ( UOPS_ISSUED.ANY - (
> >     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) )",
> >      > +        "MetricGroup": "TmaL1",
> >      > +        "MetricName": "tma_bad_speculation_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots the CPU has wasted due to Branch Misprediction.  These slots
> >     are either wasted by uops fetched from an incorrectly speculated
> >     program path; or stalls when the out-of-order part of the machine
> >     needs to recover its state from a speculative path.",
> >      > +        "MetricExpr": "100 * ( ( BR_MISP_RETIRED.ALL_BRANCHES /
> >     ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( (
> >     UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
> >      > +        "MetricGroup":
> >     "BadSpec;BrMispredicts;TmaL2;m_tma_bad_speculation_percent",
> >      > +        "MetricName": "tma_branch_mispredicts_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots the CPU has wasted due to Machine Clears.  These slots are
> >     either wasted by uops fetched prior to the clear; or stalls the
> >     out-of-order portion of the machine needs to recover its state after
> >     the clear. For example; this can happen due to memory ordering Nukes
> >     (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes.",
> >      > +        "MetricExpr": "100 * ( ( ( UOPS_ISSUED.ANY - (
> >     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( BR_MISP_RETIRED.ALL_BRANCHES /
> >     ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( (
> >     UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
> >      > +        "MetricGroup":
> >     "BadSpec;MachineClears;TmaL2;m_tma_bad_speculation_percent",
> >      > +        "MetricName": "tma_machine_clears_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
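The two metrics above split the Bad Speculation slots between branch mispredicts and machine clears in proportion to their event counts, with machine clears taking the remainder. A sketch of that attribution (inputs are illustrative):

```python
def bad_spec_split(br_misp: int, clears: int, bad_spec_pct: float):
    """Attribute Bad Speculation between branch mispredicts and machine
    clears using the count ratio, as the two MetricExprs above do."""
    frac = br_misp / (br_misp + clears)
    return bad_spec_pct * frac, bad_spec_pct * (1 - frac)

misp_pct, clears_pct = bad_spec_split(br_misp=90, clears=10, bad_spec_pct=20.0)
print(misp_pct, clears_pct)
```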
> >      > +    {
> >      > +        "BriefDescription": "This category represents fraction
> >     of slots where no uops are being delivered due to a lack of required
> >     resources for accepting new uops in the Backend. Backend is the
> >     portion of the processor core where the out-of-order scheduler
> >     dispatches ready uops into their respective execution units; and
> >     once completed these uops get retired according to program order.
> >     For example; stalls due to data-cache misses or stalls due to the
> >     divider unit being overloaded are both categorized under Backend
> >     Bound. Backend Bound is further divided into two main categories:
> >     Memory Bound and Core Bound.",
> >      > +        "MetricExpr": "100 * ( 1 - ( (
> >     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
> >     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
> >     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
> >      > +        "MetricGroup": "TmaL1",
> >      > +        "MetricName": "tma_backend_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
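Because Backend Bound is defined as 100% minus the other three Level-1 categories, the four top-level TMA metrics sum to 100% by construction. A sketch in terms of slot counts rather than the raw events (all values illustrative):

```python
def tma_level1(slots: float, fe_slots: float, bad_spec_slots: float,
               retire_slots: float) -> dict:
    """Top-level TMA breakdown in percent; Backend Bound is the remainder."""
    fe = fe_slots / slots
    bs = bad_spec_slots / slots
    ret = retire_slots / slots
    return {"frontend_bound": 100 * fe,
            "bad_speculation": 100 * bs,
            "retiring": 100 * ret,
            "backend_bound": 100 * (1 - fe - bs - ret)}

print(tma_level1(slots=400, fe_slots=40, bad_spec_slots=20, retire_slots=200))
```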
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots the Memory subsystem within the Backend was a bottleneck.
> >     Memory Bound estimates fraction of slots where pipeline is likely
> >     stalled due to demand load or store instructions. This accounts
> >     mainly for (1) non-completed in-flight memory demand loads which
> >     coincides with execution units starvation; in addition to (2) cases
> >     where stores could impose backpressure on the pipeline when many of
> >     them get buffered at the same time (less common out of the two).",
> >      > +        "MetricExpr": "100 * ( ( ( CYCLE_ACTIVITY.STALLS_MEM_ANY
> >     + RESOURCE_STALLS.SB ) / ( (
> >     CYCLE_ACTIVITY.STALLS_TOTAL + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
> >     UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
> >     CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
> >     UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
> >     ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
> >     ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
> >     RESOURCE_STALLS.SB ) ) ) * ( 1 - ( (
> >     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
> >     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
> >     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) )",
> >      > +        "MetricGroup": "Backend;TmaL2;m_tma_backend_bound_percent",
> >      > +        "MetricName": "tma_memory_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates how often the
> >     CPU was stalled without loads missing the L1 data cache.  The L1
> >     data cache typically has the shortest latency.  However; in certain
> >     cases like loads blocked on older stores; a load might suffer due to
> >     high latency even though it is being satisfied by the L1. Another
> >     example is loads who miss in the TLB. These cases are characterized
> >     by execution unit stalls; while some non-completed demand load lives
> >     in the machine without having that demand load missing the L1 cache.",
> >      > +        "MetricExpr": "100 * ( max( (
> >     CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / (
> >     CPU_CLK_UNHALTED.THREAD ) , 0 ) )",
> >      > +        "MetricGroup":
> >     "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
> >      > +        "MetricName": "tma_l1_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates how often the
> >     CPU was stalled due to L2 cache accesses by loads.  Avoiding cache
> >     misses (i.e. L1 misses/L2 hits) can improve the latency and increase
> >     performance.",
> >      > +        "MetricExpr": "100 * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS
> >     - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
> >      > +        "MetricName": "tma_l2_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates how often the
> >     CPU was stalled due to loads accesses to L3 cache or contended with
> >     a sibling Core.  Avoiding cache misses (i.e. L2 misses/L3 hits) can
> >     improve the latency and increase performance.",
> >      > +        "MetricExpr": "100 * ( ( MEM_LOAD_UOPS_RETIRED.L3_HIT /
> >     ( MEM_LOAD_UOPS_RETIRED.L3_HIT + ( 7 ) *
> >     MEM_LOAD_UOPS_RETIRED.L3_MISS ) ) * CYCLE_ACTIVITY.STALLS_L2_MISS /
> >     ( CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
> >      > +        "MetricName": "tma_l3_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates how often the
> >     CPU was stalled on accesses to external memory (DRAM) by loads.
> >     Better caching can improve the latency and increase performance.",
> >      > +        "MetricExpr": "100 * ( min( ( ( 1 - (
> >     MEM_LOAD_UOPS_RETIRED.L3_HIT / ( MEM_LOAD_UOPS_RETIRED.L3_HIT + ( 7
> >     ) * MEM_LOAD_UOPS_RETIRED.L3_MISS ) ) ) *
> >     CYCLE_ACTIVITY.STALLS_L2_MISS / ( CPU_CLK_UNHALTED.THREAD ) ) , ( 1
> >     ) ) )",
> >      > +        "MetricGroup":
> >     "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
> >      > +        "MetricName": "tma_dram_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
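The L3 Bound and DRAM Bound metrics above split the same pool of L2-miss stall cycles using the retired-load L3 hit fraction, with misses weighted 7x for their longer latency and the DRAM term clamped to 1. A sketch with illustrative values:

```python
def l3_vs_dram_bound(l3_hit: int, l3_miss: int,
                     stalls_l2_miss: int, clks: int):
    """Split L2-miss stall cycles between L3 Bound and DRAM Bound
    following the two MetricExprs above (7x latency weight on misses)."""
    hit_frac = l3_hit / (l3_hit + 7 * l3_miss)
    l3_bound = 100 * hit_frac * stalls_l2_miss / clks
    dram_bound = 100 * min((1 - hit_frac) * stalls_l2_miss / clks, 1.0)
    return l3_bound, dram_bound

print(l3_vs_dram_bound(l3_hit=70, l3_miss=10, stalls_l2_miss=100, clks=1000))
```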
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates how often CPU
> >     was stalled due to RFO store memory accesses; RFO stores issue a
> >     read-for-ownership request before the write. Even though store
> >     accesses do not typically stall out-of-order CPUs; there are few
> >     cases where stores can lead to actual stalls. This metric will be
> >     flagged should RFO stores be a bottleneck.",
> >      > +        "MetricExpr": "100 * ( RESOURCE_STALLS.SB / (
> >     CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup":
> >     "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
> >      > +        "MetricName": "tma_store_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots where Core non-memory issues were of a bottleneck.  Shortage
> >     in hardware compute resources; or dependencies in software's
> >     instructions are both categorized under Core Bound. Hence it may
> >     indicate the machine ran out of an out-of-order resource; certain
> >     execution units are overloaded or dependencies in program's data- or
> >     instruction-flow are limiting the performance (e.g. FP-chained
> >     long-latency arithmetic operations).",
> >      > +        "MetricExpr": "100 * ( ( 1 - ( (
> >     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
> >     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
> >     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) - ( ( (
> >     CYCLE_ACTIVITY.STALLS_MEM_ANY + RESOURCE_STALLS.SB
> >     ) / ( ( CYCLE_ACTIVITY.STALLS_TOTAL +
> >     UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
> >     UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
> >     CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
> >     UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
> >     ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
> >     ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
> >     RESOURCE_STALLS.SB ) ) ) * ( 1 - ( (
> >     IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
> >     UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
> >     INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
> >     INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
> >     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) ) )",
> >      > +        "MetricGroup":
> >     "Backend;TmaL2;Compute;m_tma_backend_bound_percent",
> >      > +        "MetricName": "tma_core_bound_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     cycles where the Divider unit was active. Divide and square root
> >     instructions are performed by the Divider unit and can take
> >     considerably longer latency than integer or Floating Point addition;
> >     subtraction; or multiplication.",
> >      > +        "MetricExpr": "100 * ( ARITH.FPU_DIV_ACTIVE / ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) )",
> >      > +        "MetricGroup": "TmaL3;m_tma_core_bound_percent",
> >      > +        "MetricName": "tma_divider_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric estimates fraction of
> >     cycles the CPU performance was potentially limited due to Core
> >     computation issues (non divider-related).  Two distinct categories
> >     can be attributed into this metric: (1) heavy data-dependency among
> >     contiguous instructions would manifest in this metric - such cases
> >     are often referred to as low Instruction Level Parallelism (ILP).
> >     (2) Contention on some hardware execution unit other than Divider.
> >     For example; when there are too many multiply operations.",
> >      > +        "MetricExpr": "100 * ( ( ( ( CYCLE_ACTIVITY.STALLS_TOTAL
> >     + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
> >     UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
> >     CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
> >     UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
> >     ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
> >     ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
> >     RESOURCE_STALLS.SB ) ) -
> >     RESOURCE_STALLS.SB -
> >     CYCLE_ACTIVITY.STALLS_MEM_ANY ) / ( CPU_CLK_UNHALTED.THREAD ) )",
> >      > +        "MetricGroup": "PortsUtil;TmaL3;m_tma_core_bound_percent",
> >      > +        "MetricName": "tma_ports_utilization_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This category represents fraction
> >     of slots utilized by useful work i.e. issued uops that eventually
> >     get retired. Ideally; all pipeline slots would be attributed to the
> >     Retiring category.  Retiring of 100% would indicate the maximum
> >     Pipeline_Width throughput was achieved.  Maximizing Retiring
> >     typically increases the Instructions-per-cycle (see IPC metric).
> >     Note that a high Retiring value does not necessary mean there is no
> >     room for more performance.  For example; Heavy-operations or
> >     Microcode Assists are categorized under Retiring. They often
> >     indicate suboptimal performance and can often be optimized or
> >     avoided. ",
> >      > +        "MetricExpr": "100 * ( ( UOPS_RETIRED.RETIRE_SLOTS ) / (
> >     ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) )",
> >      > +        "MetricGroup": "TmaL1",
> >      > +        "MetricName": "tma_retiring_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots where the CPU was retiring light-weight operations --
> >     instructions that require no more than one uop (micro-operation).
> >     This correlates with total number of instructions used by the
> >     program. A uops-per-instruction (see UPI metric) ratio of 1 or less
> >     should be expected for decently optimized software running on Intel
> >     Core/Xeon products. While this often indicates efficient X86
> >     instructions were executed; high value does not necessarily mean
> >     better performance cannot be achieved.",
> >      > +        "MetricExpr": "100 * ( ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
> >     ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( ( ( UOPS_RETIRED.RETIRE_SLOTS
> >     ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
> >      > +        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
> >      > +        "MetricName": "tma_light_operations_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents overall
> >     arithmetic floating-point (FP) operations fraction the CPU has
> >     executed (retired). Note this metric's value may exceed its parent
> >     due to use of \"Uops\" CountDomain and FMA double-counting.",
> >      > +        "MetricExpr": "100 * ( ( INST_RETIRED.X87 * ( (
> >     UOPS_RETIRED.RETIRE_SLOTS ) / INST_RETIRED.ANY ) / (
> >     UOPS_RETIRED.RETIRE_SLOTS ) ) + ( (
> >     FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
> >     FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) / ( UOPS_RETIRED.RETIRE_SLOTS
> >     ) ) + ( min( ( ( FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
> >     FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
> >     FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / (
> >     UOPS_RETIRED.RETIRE_SLOTS ) ) , ( 1 ) ) ) )",
> >      > +        "MetricGroup": "HPC;TmaL3;m_tma_light_operations_percent",
> >      > +        "MetricName": "tma_fp_arith_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots where the CPU was retiring heavy-weight operations --
> >     instructions that require two or more uops or microcoded sequences.
> >     This highly-correlates with the uop length of these
> >     instructions/sequences.",
> >      > +        "MetricExpr": "100 * ( ( ( ( UOPS_RETIRED.RETIRE_SLOTS )
> >     / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
> >      > +        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
> >      > +        "MetricName": "tma_heavy_operations_percent",
> >      > +        "ScaleUnit": "1%"
> >      > +    },
> >      > +    {
> >      > +        "BriefDescription": "This metric represents fraction of
> >     slots the CPU was retiring uops fetched by the Microcode Sequencer
> >     (MS) unit.  The MS is used for CISC instructions not supported by
> >     the default decoders (like repeat move strings; or CPUID); or by
> >     microcode assists used to address some operation modes (like in
> >     Floating Point assists). These cases can often be avoided.",
> >      > +        "MetricExpr": "100 * ( ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
> >     UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
> >     CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
> >     CPU_CLK_UNHALTED.THREAD ) ) ) )",
> >      > +        "MetricGroup":
> >     "MicroSeq;TmaL3;m_tma_heavy_operations_percent",
> >      > +        "MetricName": "tma_microcode_sequencer_percent",
> >      > +        "ScaleUnit": "1%"
> >      >       }
> >      >   ]
> >      > diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
> >     b/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
> >      > index 127abe08362f..2efc4c0ee740 100644
> >      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
> >      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
> >      > @@ -814,9 +814,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_CODE_RD.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x04003C0244",
> >      > +        "MSRValue": "0x4003C0244",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch code
> >     reads hit in the L3 and the snoops to sibling cores hit in either
> >     E/S state and the line is not forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -829,7 +828,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x10003C0091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads hit in the L3 and the snoop to one of the sibling cores hits
> >     the line in M state and the line is forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -840,9 +838,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x04003C0091",
> >      > +        "MSRValue": "0x4003C0091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads hit in the L3 and the snoops to sibling cores hit in either
> >     E/S state and the line is not forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -855,7 +852,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x10003C07F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) hit in the L3 and the snoop to one of the
> >     sibling cores hits the line in M state and the line is forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -866,9 +862,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_READS.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x04003C07F7",
> >      > +        "MSRValue": "0x4003C07F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) hit in the L3 and the snoops to sibling cores
> >     hit in either E/S state and the line is not forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -881,7 +876,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3F803C8FFF",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all requests hit in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -894,7 +888,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x10003C0122",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch RFOs
> >     hit in the L3 and the snoop to one of the sibling cores hits the
> >     line in M state and the line is forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -905,9 +898,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_RFO.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x04003C0122",
> >      > +        "MSRValue": "0x4003C0122",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch RFOs
> >     hit in the L3 and the snoops to sibling cores hit in either E/S
> >     state and the line is not forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -920,7 +912,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3F803C0002",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand data writes
> >     (RFOs) hit in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -933,7 +924,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x10003C0002",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand data writes
> >     (RFOs) hit in the L3 and the snoop to one of the sibling cores hits
> >     the line in M state and the line is forwarded",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -946,7 +936,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3F803C0200",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts prefetch (that bring data
> >     to LLC only) code reads hit in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -959,7 +948,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3F803C0100",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all prefetch (that bring
> >     data to LLC only) RFOs hit in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -973,4 +961,4 @@
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x10"
> >      >       }
> >      > -]
> >      > \ No newline at end of file
> >      > +]
> >      > diff --git
> >     a/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
> >     b/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
> >      > index 9ad37dddb354..93bbc8600321 100644
> >      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
> >      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
> >      > @@ -5,6 +5,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     128-bit packed double precision floating-point instructions retired;
> >     some instructions will count twice as noted below.  Each count
> >     represents 2 computation operations, one for each element.  Applies
> >     to SSE* and AVX* packed double precision floating-point
> >     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP
> >     FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they
> >     perform 2 calculations per element. The DAZ and FTZ flags in the
> >     MXCSR register need to be set when using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x4"
> >      >       },
> >      > @@ -14,6 +15,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     128-bit packed single precision floating-point instructions retired;
> >     some instructions will count twice as noted below.  Each count
> >     represents 4 computation operations, one for each element.  Applies
> >     to SSE* and AVX* packed single precision floating-point
> >     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT
> >     RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice
> >     as they perform 2 calculations per element. The DAZ and FTZ flags in
> >     the MXCSR register need to be set when using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x8"
> >      >       },
> >      > @@ -23,6 +25,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     256-bit packed double precision floating-point instructions retired;
> >     some instructions will count twice as noted below.  Each count
> >     represents 4 computation operations, one for each element.  Applies
> >     to SSE* and AVX* packed double precision floating-point
> >     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT
> >     FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform
> >     2 calculations per element. The DAZ and FTZ flags in the MXCSR
> >     register need to be set when using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x10"
> >      >       },
> >      > @@ -32,6 +35,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     256-bit packed single precision floating-point instructions retired;
> >     some instructions will count twice as noted below.  Each count
> >     represents 8 computation operations, one for each element.  Applies
> >     to SSE* and AVX* packed single precision floating-point
> >     instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT
> >     RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice
> >     as they perform 2 calculations per element. The DAZ and FTZ flags in
> >     the MXCSR register need to be set when using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x20"
> >      >       },
> >      > @@ -59,6 +63,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     scalar single precision and double precision floating-point
> >     instructions retired; some instructions will count twice as noted
> >     below.  Each count represents 1 computational operation. Applies to
> >     SSE* and AVX* scalar single precision floating-point instructions:
> >     ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB
> >     instructions count twice as they perform 2 calculations per element.
> >     The DAZ and FTZ flags in the MXCSR register need to be set when
> >     using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x3"
> >      >       },
> >      > @@ -68,6 +73,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     scalar double precision floating-point instructions retired; some
> >     instructions will count twice as noted below.  Each count represents
> >     1 computational operation. Applies to SSE* and AVX* scalar double
> >     precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT
> >     FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform
> >     2 calculations per element. The DAZ and FTZ flags in the MXCSR
> >     register need to be set when using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -77,6 +83,7 @@
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xc7",
> >      >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
> >      > +        "PublicDescription": "Number of SSE/AVX computational
> >     scalar single precision floating-point instructions retired; some
> >     instructions will count twice as noted below.  Each count represents
> >     1 computational operation. Applies to SSE* and AVX* scalar single
> >     precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT
> >     RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as
> >     they perform 2 calculations per element. The DAZ and FTZ flags in
> >     the MXCSR register need to be set when using these events.",
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x2"
> >      >       },
> >      > @@ -190,4 +197,4 @@
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x3"
> >      >       }
> >      > -]
> >      > \ No newline at end of file
> >      > +]
> >      > diff --git
> >     a/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
> >     b/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
> >      > index f0bcb945ff76..37ce8034b2ed 100644
> >      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
> >      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
> >      > @@ -292,4 +292,4 @@
> >      >           "SampleAfterValue": "2000003",
> >      >           "UMask": "0x1"
> >      >       }
> >      > -]
> >      > \ No newline at end of file
> >      > +]
> >      > diff --git
> >     a/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
> >     b/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
> >      > index cce993b197e3..545f61f691b9 100644
> >      > --- a/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
> >      > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
> >      > @@ -247,7 +247,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3FBFC00244",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch code
> >     reads miss in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -258,9 +257,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_CODE_RD.LLC_MISS.LOCAL_DRAM",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x0604000244",
> >      > +        "MSRValue": "0x604000244",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch code
> >     reads miss the L3 and the data is returned from local dram",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -273,7 +271,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3FBFC00091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads miss in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -284,9 +281,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.LOCAL_DRAM",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x0604000091",
> >      > +        "MSRValue": "0x604000091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads miss the L3 and the data is returned from local dram",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -297,9 +293,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.REMOTE_DRAM",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x063BC00091",
> >      > +        "MSRValue": "0x63BC00091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads miss the L3 and the data is returned from remote dram",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -312,7 +307,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x103FC00091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads miss the L3 and the modified data is transferred from remote
> >     cache",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -323,9 +317,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.REMOTE_HIT_FORWARD",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x087FC00091",
> >      > +        "MSRValue": "0x87FC00091",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch data
> >     reads miss the L3 and clean or shared data is transferred from
> >     remote cache",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -338,20 +331,18 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3FBFC007F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) miss in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      >       {
> >      > -        "BriefDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch)miss the L3 and the data is returned from local
> >     dram",
> >      > +        "BriefDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) miss the L3 and the data is returned from local
> >     dram",
> >      >           "Counter": "0,1,2,3",
> >      >           "CounterHTOff": "0,1,2,3",
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.LOCAL_DRAM",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x06040007F7",
> >      > +        "MSRValue": "0x6040007F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch)miss the L3 and the data is returned from local
> >     dram",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -362,9 +353,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_DRAM",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x063BC007F7",
> >      > +        "MSRValue": "0x63BC007F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) miss the L3 and the data is returned from remote
> >     dram",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -377,7 +367,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x103FC007F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) miss the L3 and the modified data is transferred
> >     from remote cache",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -388,9 +377,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_HIT_FORWARD",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x087FC007F7",
> >      > +        "MSRValue": "0x87FC007F7",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all data/code/rfo reads
> >     (demand & prefetch) miss the L3 and clean or shared data is
> >     transferred from remote cache",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -403,7 +391,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3FBFC08FFF",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all requests miss in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -416,7 +403,6 @@
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      >           "MSRValue": "0x3FBFC00122",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & prefetch RFOs
> >     miss in the L3",
> >      >           "SampleAfterValue": "100003",
> >      >           "UMask": "0x1"
> >      >       },
> >      > @@ -427,9 +413,8 @@
> >      >           "EventCode": "0xB7, 0xBB",
> >      >           "EventName":
> >     "OFFCORE_RESPONSE.ALL_RFO.LLC_MISS.LOCAL_DRAM",
> >      >           "MSRIndex": "0x1a6,0x1a7",
> >      > -        "MSRValue": "0x0604000122",
> >      > +        "MSRValue": "0x604000122",
> >      >           "Offcore": "1",
> >      > -        "PublicDescription": "Counts all demand & pref
> >
>
> --
> Zhengjun Xing


* Re: [PATCH v1 02/31] perf vendor events: Update Intel broadwellx
  2022-07-26  4:49         ` Ian Rogers
@ 2022-07-26  5:19           ` Xing Zhengjun
  0 siblings, 0 replies; 25+ messages in thread
From: Xing Zhengjun @ 2022-07-26  5:19 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Taylor, Perry, Biggers, Caleb,
	Bopardikar, Kshipra, Kan Liang, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Maxime Coquelin, Alexandre Torgue, Andi Kleen, James Clark,
	John Garry, LKML, linux-perf-users, Sedat Dilek, Stephane Eranian



On 7/26/2022 12:49 PM, Ian Rogers wrote:
> On Mon, Jul 25, 2022 at 6:25 PM Xing Zhengjun
> <zhengjun.xing@linux.intel.com> wrote:
>>
>>
>> Hi Arnaldo,
>>
>> On 7/25/2022 9:06 PM, Ian Rogers wrote:
>>>
>>>
>>> On Sun, Jul 24, 2022, 6:34 PM Xing Zhengjun
>>> <zhengjun.xing@linux.intel.com <mailto:zhengjun.xing@linux.intel.com>>
>>> wrote:
>>>
>>>      Hi Ian,
>>>
>>>      On 7/23/2022 6:32 AM, Ian Rogers wrote:
>>>       > Use script at:
>>>       >
>>>      https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
>>>      <https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py>
>>>       >
>>>
>>>      It would be better to record the event JSON file version and the TMA
>>>      version for future tracking. For example: the event list is based on
>>>      the broadwellx JSON file v19, and the metrics are based on TMA 4.4-full.
>>>
>>>
>>> Thanks Xing,
>>>
>>> I'll add that in v2. I'd skipped it this time as I'd been adding it to
>>> the mapfile. I'll also break apart the tremontx to snowridgex change to
>>> match yours. I also will rebase to see if that'll fix the git am issue.
>>> Apologies in advance to everyone's inboxes :-)
>>>
>> Hi Arnaldo,
>>
>> Besides Snowridgex, I also posted the SPR/ADL/HSX/BDX event lists last
>> month. Can these be merged in, or do I need to update them?  Thanks.
>>
>> https://lore.kernel.org/all/20220609094222.2030167-1-zhengjun.xing@linux.intel.com/
>> https://lore.kernel.org/all/20220609094222.2030167-2-zhengjun.xing@linux.intel.com/
>>
>> https://lore.kernel.org/all/20220607092749.1976878-1-zhengjun.xing@linux.intel.com/
>> https://lore.kernel.org/all/20220607092749.1976878-2-zhengjun.xing@linux.intel.com/
>> https://lore.kernel.org/all/20220614145019.2177071-1-zhengjun.xing@linux.intel.com/
>> https://lore.kernel.org/all/20220614145019.2177071-2-zhengjun.xing@linux.intel.com/
> 
> Thanks Zhengjun,
> 
> I think those patches are all stale over what is posted here.

I think the intermediate versions are sometimes also useful when we need 
to trace back issues.

> Particular issues are:
>   - fixing the files to only contain ascii characters
>   - updating the version information in mapfile.csv
>   - for BDX and SPR (HSX isn't posted yet, although I've done some
> private testing) adding in the metrics from:
> https://github.com/intel/perfmon-metrics/tree/main/BDX/metrics/perf
> https://github.com/intel/perfmon-metrics/tree/main/SPR/metrics/perf
>   - the event data on 01.org was updated. ADL, HSX and SPR were all
> updated last Friday prior to my v1 patches.
>   - I also tested all the patches on the respective architectures,
> hence discovering the fake event parsing fix that is in patch 1.
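
The non-ASCII cleanup mentioned in the first point can be sketched as a
small normalization pass over the downloaded json text. This is only a
hedged illustration of the idea; the replacement table and exact behavior
of the real download_and_gen.py script may differ.

```python
# Replace the (R) and (TM) symbols that appear in the upstream event
# json with plain-ASCII equivalents, then verify nothing non-ASCII
# remains. The replacement table here is an illustrative assumption.
REPLACEMENTS = {"\u00ae": "(R)", "\u2122": "(TM)"}

def to_ascii(text: str) -> str:
    for ch, repl in REPLACEMENTS.items():
        text = text.replace(ch, repl)
    # Fail loudly on any other non-ASCII character left behind.
    text.encode("ascii")
    return text

print(to_ascii("Intel\u00ae Xeon\u2122"))  # Intel(R) Xeon(TM)
```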
> 
> As stated in the cover letter, the goal is to make it so that what is
> in the Linux tree exactly matches the download_and_gen.py output,
> however, there were some changes I made that are in this pull request:
> https://github.com/intel/event-converter-for-linux-perf/pull/15
> 
> Given the size of the new metrics and particularly the previously
> missed uncore data, it'd be nice to land the "compression" in:
> https://lore.kernel.org/lkml/20220715063653.3203761-1-irogers@google.com/
> which prior to this change reduced the binary size for x86 by 12.5%
> and saved over a megabyte's worth of dirty pages for relocated data.
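> 
> The saving behind that "compression" comes from replacing per-field
> string pointers (each needing a load-time relocation) with integer
> offsets into one shared, deduplicated string table. A minimal sketch
> of the idea, with illustrative names rather than the actual perf
> implementation:

```python
# Build one concatenated, NUL-separated string blob and hand out
# integer offsets into it. Identical strings are stored once, and the
# resulting table contains no pointers, so it needs no relocations.
def build_string_table(strings):
    """Deduplicate strings into one blob; return (blob, offsets)."""
    blob = []
    offsets = {}
    pos = 0
    for s in strings:
        if s not in offsets:  # identical strings share one copy
            offsets[s] = pos
            blob.append(s + "\0")
            pos += len(s) + 1
    return "".join(blob), offsets

def lookup(blob, offset):
    """Recover a string from the blob given its offset."""
    end = blob.index("\0", offset)
    return blob[offset:end]

names = ["inst_retired.any", "cpu_clk_unhalted.thread", "inst_retired.any"]
blob, offsets = build_string_table(names)
# Three event names, only two stored thanks to deduplication.
print(len(offsets))  # 2
```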
> 
> Thanks,
> Ian
> 
>>> Thanks,
>>> Ian
>>>
>>>
>>>       > to download and generate the latest events and metrics. Manually copy
>>>       > the broadwellx files into perf and update mapfile.csv.
>>>       >
>>>       > Tested with 'perf test':
>>>       >   10: PMU events                                          :
>>>       >   10.1: PMU event table sanity                            : Ok
>>>       >   10.2: PMU event map aliases                             : Ok
>>>       >   10.3: Parsing of PMU event table metrics                : Ok
>>>       >   10.4: Parsing of PMU event table metrics with fake PMUs : Ok
>>>       >   90: perf all metricgroups test                          : Ok
>>>       >   91: perf all metrics test                               : Skip
>>>       >   93: perf all PMU test                                   : Ok
>>>       >
>>>       > Signed-off-by: Ian Rogers <irogers@google.com>
>>>       > ---
>>>       >   .../arch/x86/broadwellx/bdx-metrics.json      |  570 ++-
>>>       >   .../pmu-events/arch/x86/broadwellx/cache.json |   22 +-
>>>       >   .../arch/x86/broadwellx/floating-point.json   |    9 +-
>>>       >   .../arch/x86/broadwellx/frontend.json         |    2 +-
>>>       >   .../arch/x86/broadwellx/memory.json           |   39 +-
>>>       >   .../pmu-events/arch/x86/broadwellx/other.json |    2 +-
>>>       >   .../arch/x86/broadwellx/pipeline.json         |    4 +-
>>>       >   .../arch/x86/broadwellx/uncore-cache.json     | 3788
>>>      ++++++++++++++++-
>>>       >   .../x86/broadwellx/uncore-interconnect.json   | 1438 ++++++-
>>>       >   .../arch/x86/broadwellx/uncore-memory.json    | 2849 ++++++++++++-
>>>       >   .../arch/x86/broadwellx/uncore-other.json     | 3252 ++++++++++++++
>>>       >   .../arch/x86/broadwellx/uncore-power.json     |  437 +-
>>>       >   .../arch/x86/broadwellx/virtual-memory.json   |    2 +-
>>>       >   tools/perf/pmu-events/arch/x86/mapfile.csv    |    2 +-
>>>       >   14 files changed, 12103 insertions(+), 313 deletions(-)
>>>       >   create mode 100644
>>>      tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
>>>       >
>>>       > diff --git
>>>      a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>>>      b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>>>       > index b055947c0afe..720ee7c9332d 100644
>>>       > --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>>>       > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>>>       > @@ -74,12 +74,6 @@
>>>       >           "MetricGroup": "Branches;Fed;FetchBW",
>>>       >           "MetricName": "UpTB"
>>>       >       },
>>>       > -    {
>>>       > -        "BriefDescription": "Cycles Per Instruction (per Logical
>>>      Processor)",
>>>       > -        "MetricExpr": "1 / (INST_RETIRED.ANY /
>>>      CPU_CLK_UNHALTED.THREAD)",
>>>       > -        "MetricGroup": "Pipeline;Mem",
>>>       > -        "MetricName": "CPI"
>>>       > -    },
>>>       >       {
>>>       >           "BriefDescription": "Per-Logical Processor actual
>>>      clocks when the Logical Processor is active.",
>>>       >           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>>>       > @@ -130,43 +124,25 @@
>>>       >           "MetricName": "FLOPc_SMT"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Actual per-core usage of the
>>>      Floating Point execution units (regardless of the vector width)",
>>>       > +        "BriefDescription": "Actual per-core usage of the
>>>      Floating Point non-X87 execution units (regardless of precision or
>>>      vector-width)",
>>>       >           "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>>>      FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) +
>>>      (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
>>>      FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 *
>>>      CPU_CLK_UNHALTED.THREAD )",
>>>       >           "MetricGroup": "Cor;Flops;HPC",
>>>       >           "MetricName": "FP_Arith_Utilization",
>>>       > -        "PublicDescription": "Actual per-core usage of the
>>>      Floating Point execution units (regardless of the vector width).
>>>      Values > 1 are possible due to Fused-Multiply Add (FMA) counting."
>>>       > +        "PublicDescription": "Actual per-core usage of the
>>>      Floating Point non-X87 execution units (regardless of precision or
>>>      vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply
>>>      Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar
>>>      or 128/256-bit vectors - less common)."
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Actual per-core usage of the
>>>      Floating Point execution units (regardless of the vector width). SMT
>>>      version; use when SMT is enabled and measuring per logical CPU.",
>>>       > +        "BriefDescription": "Actual per-core usage of the
>>>      Floating Point non-X87 execution units (regardless of precision or
>>>      vector-width). SMT version; use when SMT is enabled and measuring
>>>      per logical CPU.",
>>>       >           "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>>>      FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) +
>>>      (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
>>>      FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) ) / ( 2 * ( (
>>>      CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) )",
>>>       >           "MetricGroup": "Cor;Flops;HPC_SMT",
>>>       >           "MetricName": "FP_Arith_Utilization_SMT",
>>>       > -        "PublicDescription": "Actual per-core usage of the
>>>      Floating Point execution units (regardless of the vector width).
>>>      Values > 1 are possible due to Fused-Multiply Add (FMA) counting.
>>>      SMT version; use when SMT is enabled and measuring per logical CPU."
>>>       > +        "PublicDescription": "Actual per-core usage of the
>>>      Floating Point non-X87 execution units (regardless of precision or
>>>      vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply
>>>      Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar
>>>      or 128/256-bit vectors - less common). SMT version; use when SMT is
>>>      enabled and measuring per logical CPU."
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Instruction-Level-Parallelism
>>>      (average number of uops executed when there is at least 1 uop
>>>      executed)",
>>>       > +        "BriefDescription": "Instruction-Level-Parallelism
>>>      (average number of uops executed when there is execution) per-core",
>>>       >           "MetricExpr": "UOPS_EXECUTED.THREAD / ((
>>>      cpu@UOPS_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else
>>>      UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>>>       >           "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
>>>       >           "MetricName": "ILP"
>>>       >       },
>>>       > -    {
>>>       > -        "BriefDescription": "Branch Misprediction Cost: Fraction
>>>      of TMA slots wasted per non-speculative branch misprediction
>>>      (retired JEClear)",
>>>       > -        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>>>      UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *
>>>      INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>>>      CPU_CLK_UNHALTED.THREAD)) * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>>>      / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>>>      MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>>>      CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) /
>>>      BR_MISP_RETIRED.ALL_BRANCHES",
>>>       > -        "MetricGroup": "Bad;BrMispredicts",
>>>       > -        "MetricName": "Branch_Misprediction_Cost"
>>>       > -    },
>>>       > -    {
>>>       > -        "BriefDescription": "Branch Misprediction Cost: Fraction
>>>      of TMA slots wasted per non-speculative branch misprediction
>>>      (retired JEClear)",
>>>       > -        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>>>      UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( (
>>>      CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )
>>>      )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (
>>>      ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>>>      * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>>>      / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>>>      MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( (
>>>      CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>>>      ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))
>>>      / BR_MISP_RETIRED.ALL_BRANCHES",
>>>       > -        "MetricGroup": "Bad;BrMispredicts_SMT",
>>>       > -        "MetricName": "Branch_Misprediction_Cost_SMT"
>>>       > -    },
>>>       > -    {
>>>       > -        "BriefDescription": "Number of Instructions per
>>>      non-speculative Branch Misprediction (JEClear)",
>>>       > -        "MetricExpr": "INST_RETIRED.ANY /
>>>      BR_MISP_RETIRED.ALL_BRANCHES",
>>>       > -        "MetricGroup": "Bad;BadSpec;BrMispredicts",
>>>       > -        "MetricName": "IpMispredict"
>>>       > -    },
>>>       >       {
>>>       >           "BriefDescription": "Core actual clocks when any
>>>      Logical Processor is active on the Physical Core",
>>>       >           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1
>>>      + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>>>       > @@ -256,6 +232,18 @@
>>>       >           "MetricGroup": "Summary;TmaL1",
>>>       >           "MetricName": "Instructions"
>>>       >       },
>>>       > +    {
>>>       > +        "BriefDescription": "Average number of Uops retired in
>>>      cycles where at least one uop has retired.",
>>>       > +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS /
>>>      cpu@UOPS_RETIRED.RETIRE_SLOTS\\,cmask\\=1@",
>>>       > +        "MetricGroup": "Pipeline;Ret",
>>>       > +        "MetricName": "Retire"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "",
>>>       > +        "MetricExpr": "UOPS_EXECUTED.THREAD /
>>>      cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
>>>       > +        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
>>>       > +        "MetricName": "Execute"
>>>       > +    },
>>>       >       {
>>>       >           "BriefDescription": "Fraction of Uops delivered by the
>>>      DSB (aka Decoded ICache; or Uop Cache)",
>>>       >           "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS +
>>>      LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>>       > @@ -263,35 +251,34 @@
>>>       >           "MetricName": "DSB_Coverage"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Actual Average Latency for L1
>>>      data-cache miss demand load instructions (in core cycles)",
>>>       > -        "MetricExpr": "L1D_PEND_MISS.PENDING / (
>>>      MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>>       > -        "MetricGroup": "Mem;MemoryBound;MemoryLat",
>>>       > -        "MetricName": "Load_Miss_Real_Latency",
>>>       > -        "PublicDescription": "Actual Average Latency for L1
>>>      data-cache miss demand load instructions (in core cycles). Latency
>>>      may be overestimated for multi-load instructions - e.g. repeat strings."
>>>       > +        "BriefDescription": "Number of Instructions per
>>>      non-speculative Branch Misprediction (JEClear) (lower number means
>>>      higher occurrence rate)",
>>>       > +        "MetricExpr": "INST_RETIRED.ANY /
>>>      BR_MISP_RETIRED.ALL_BRANCHES",
>>>       > +        "MetricGroup": "Bad;BadSpec;BrMispredicts",
>>>       > +        "MetricName": "IpMispredict"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Memory-Level-Parallelism (average
>>>      number of L1 miss demand load when there is at least one such miss.
>>>      Per-Logical Processor)",
>>>       > -        "MetricExpr": "L1D_PEND_MISS.PENDING /
>>>      L1D_PEND_MISS.PENDING_CYCLES",
>>>       > -        "MetricGroup": "Mem;MemoryBound;MemoryBW",
>>>       > -        "MetricName": "MLP"
>>>       > +        "BriefDescription": "Branch Misprediction Cost: Fraction
>>>      of TMA slots wasted per non-speculative branch misprediction
>>>      (retired JEClear)",
>>>       > +        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>>>      UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *
>>>      INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) + (4 *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>>>      CPU_CLK_UNHALTED.THREAD)) * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>>>      / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>>>      MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 *
>>>      CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) /
>>>      BR_MISP_RETIRED.ALL_BRANCHES",
>>>       > +        "MetricGroup": "Bad;BrMispredicts",
>>>       > +        "MetricName": "Branch_Misprediction_Cost"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Average data fill bandwidth to the
>>>      L1 data cache [GB / sec]",
>>>       > -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 /
>>>      duration_time",
>>>       > -        "MetricGroup": "Mem;MemoryBW",
>>>       > -        "MetricName": "L1D_Cache_Fill_BW"
>>>       > +        "BriefDescription": "Branch Misprediction Cost: Fraction
>>>      of TMA slots wasted per non-speculative branch misprediction
>>>      (retired JEClear)",
>>>       > +        "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * ((
>>>      UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( (
>>>      CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )
>>>      )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (
>>>      ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>>>      * (BR_MISP_RETIRED.ALL_BRANCHES * (12 * (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>>>      / CPU_CLK_UNHALTED.THREAD) / ( BR_MISP_RETIRED.ALL_BRANCHES +
>>>      MACHINE_CLEARS.COUNT + BACLEARS.ANY )) / #(4 *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( (
>>>      CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))
>>>      ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
>>>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))
>>>      / BR_MISP_RETIRED.ALL_BRANCHES",
>>>       > +        "MetricGroup": "Bad;BrMispredicts_SMT",
>>>       > +        "MetricName": "Branch_Misprediction_Cost_SMT"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Average data fill bandwidth to the
>>>      L2 cache [GB / sec]",
>>>       > -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 /
>>>      duration_time",
>>>       > -        "MetricGroup": "Mem;MemoryBW",
>>>       > -        "MetricName": "L2_Cache_Fill_BW"
>>>       > +        "BriefDescription": "Actual Average Latency for L1
>>>      data-cache miss demand load operations (in core cycles)",
>>>       > +        "MetricExpr": "L1D_PEND_MISS.PENDING / (
>>>      MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>>       > +        "MetricGroup": "Mem;MemoryBound;MemoryLat",
>>>       > +        "MetricName": "Load_Miss_Real_Latency"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Average per-core data fill
>>>      bandwidth to the L3 cache [GB / sec]",
>>>       > -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000
>>>      / duration_time",
>>>       > -        "MetricGroup": "Mem;MemoryBW",
>>>       > -        "MetricName": "L3_Cache_Fill_BW"
>>>       > +        "BriefDescription": "Memory-Level-Parallelism (average
>>>      number of L1 miss demand load when there is at least one such miss.
>>>      Per-Logical Processor)",
>>>       > +        "MetricExpr": "L1D_PEND_MISS.PENDING /
>>>      L1D_PEND_MISS.PENDING_CYCLES",
>>>       > +        "MetricGroup": "Mem;MemoryBound;MemoryBW",
>>>       > +        "MetricName": "MLP"
>>>       >       },
>>>       >       {
>>>       >           "BriefDescription": "L1 cache true misses per kilo
>>>      instruction for retired demand loads",
>>>       > @@ -306,13 +293,13 @@
>>>       >           "MetricName": "L2MPKI"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "L2 cache misses per kilo
>>>      instruction for all request types (including speculative)",
>>>       > +        "BriefDescription": "L2 cache ([RKL+] true) misses per
>>>      kilo instruction for all request types (including speculative)",
>>>       >           "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>>>       >           "MetricGroup": "Mem;CacheMisses;Offcore",
>>>       >           "MetricName": "L2MPKI_All"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "L2 cache misses per kilo
>>>      instruction for all demand loads  (including speculative)",
>>>       > +        "BriefDescription": "L2 cache ([RKL+] true) misses per
>>>      kilo instruction for all demand loads  (including speculative)",
>>>       >           "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS /
>>>      INST_RETIRED.ANY",
>>>       >           "MetricGroup": "Mem;CacheMisses",
>>>       >           "MetricName": "L2MPKI_Load"
>>>       > @@ -348,6 +335,48 @@
>>>       >           "MetricGroup": "Mem;MemoryTLB_SMT",
>>>       >           "MetricName": "Page_Walks_Utilization_SMT"
>>>       >       },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-core data fill
>>>      bandwidth to the L1 data cache [GB / sec]",
>>>       > +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 /
>>>      duration_time",
>>>       > +        "MetricGroup": "Mem;MemoryBW",
>>>       > +        "MetricName": "L1D_Cache_Fill_BW"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-core data fill
>>>      bandwidth to the L2 cache [GB / sec]",
>>>       > +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 /
>>>      duration_time",
>>>       > +        "MetricGroup": "Mem;MemoryBW",
>>>       > +        "MetricName": "L2_Cache_Fill_BW"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-core data fill
>>>      bandwidth to the L3 cache [GB / sec]",
>>>       > +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000
>>>      / duration_time",
>>>       > +        "MetricGroup": "Mem;MemoryBW",
>>>       > +        "MetricName": "L3_Cache_Fill_BW"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-thread data fill
>>>      bandwidth to the L1 data cache [GB / sec]",
>>>       > +        "MetricExpr": "(64 * L1D.REPLACEMENT / 1000000000 /
>>>      duration_time)",
>>>       > +        "MetricGroup": "Mem;MemoryBW",
>>>       > +        "MetricName": "L1D_Cache_Fill_BW_1T"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-thread data fill
>>>      bandwidth to the L2 cache [GB / sec]",
>>>       > +        "MetricExpr": "(64 * L2_LINES_IN.ALL / 1000000000 /
>>>      duration_time)",
>>>       > +        "MetricGroup": "Mem;MemoryBW",
>>>       > +        "MetricName": "L2_Cache_Fill_BW_1T"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-thread data fill
>>>      bandwidth to the L3 cache [GB / sec]",
>>>       > +        "MetricExpr": "(64 * LONGEST_LAT_CACHE.MISS / 1000000000
>>>      / duration_time)",
>>>       > +        "MetricGroup": "Mem;MemoryBW",
>>>       > +        "MetricName": "L3_Cache_Fill_BW_1T"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average per-thread data access
>>>      bandwidth to the L3 cache [GB / sec]",
>>>       > +        "MetricExpr": "0",
>>>       > +        "MetricGroup": "Mem;MemoryBW;Offcore",
>>>       > +        "MetricName": "L3_Cache_Access_BW_1T"
>>>       > +    },
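All of the *_Cache_Fill_BW metrics above share one shape: 64-byte cache
lines per second. A toy restatement, with invented counter values rather
than a real measurement:

```python
# Toy restatement of the fill-bandwidth metrics above: cache lines are
# 64 bytes, so GB/s is 64 * fill-events / 1e9 / elapsed seconds.
# Inputs are invented sample values, not real counter reads.
def cache_fill_bw_gbps(fill_events, duration_s):
    return 64 * fill_events / 1e9 / duration_s
```

The per-core and per-thread (_1T) variants differ only in which
aggregation the fill-event count is taken over, not in the arithmetic.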
>>>       >       {
>>>       >           "BriefDescription": "Average CPU Utilization",
>>>       >           "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>>       > @@ -364,7 +393,8 @@
>>>       >           "BriefDescription": "Giga Floating Point Operations Per
>>>      Second",
>>>       >           "MetricExpr": "( ( 1 * (
>>>      FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>>>      FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 *
>>>      FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * (
>>>      FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 *
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / 1000000000 ) /
>>>      duration_time",
>>>       >           "MetricGroup": "Cor;Flops;HPC",
>>>       > -        "MetricName": "GFLOPs"
>>>       > +        "MetricName": "GFLOPs",
>>>       > +        "PublicDescription": "Giga Floating Point Operations Per
>>>      Second. Aggregate across all supported options of: FP precisions,
>>>      scalar and vector instructions, vector-width and AMX engine."
>>>       >       },
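The GFLOPs weighting above can be restated as a small sketch (invented
counter values, and the function is mine, not part of the patch): each
retired FP_ARITH event is scaled by the number of operations it
represents.

```python
# Toy restatement of the GFLOPs MetricExpr above: scalar = 1 op,
# 128-bit packed double = 2, 128-bit packed single and 256-bit packed
# double = 4, 256-bit packed single = 8. Counter values are invented.
def gflops(scalar, pd_128, ps_128, pd_256, ps_256, duration_s):
    flops = 1 * scalar + 2 * pd_128 + 4 * (ps_128 + pd_256) + 8 * ps_256
    return flops / 1e9 / duration_s
```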
>>>       >       {
>>>       >           "BriefDescription": "Average Frequency Utilization
>>>      relative nominal frequency",
>>>       > @@ -461,5 +491,439 @@
>>>       >           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@)
>>>      * 100",
>>>       >           "MetricGroup": "Power",
>>>       >           "MetricName": "C7_Pkg_Residency"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "CPU operating frequency (in GHz)",
>>>       > +        "MetricExpr": "( CPU_CLK_UNHALTED.THREAD /
>>>      CPU_CLK_UNHALTED.REF_TSC * #SYSTEM_TSC_FREQ ) / 1000000000",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "cpu_operating_frequency",
>>>       > +        "ScaleUnit": "1GHz"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Cycles per instruction retired;
>>>      indicating how much time each executed instruction took; in units of
>>>      cycles.",
>>>       > +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "cpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
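The cpi metric above is just unhalted core cycles divided by retired
instructions; as a toy check with invented values (not from a real run):

```python
# Toy restatement of the cpi MetricExpr above: CPU_CLK_UNHALTED.THREAD
# divided by INST_RETIRED.ANY. Inputs are invented sample counts.
def cpi(cpu_clk_unhalted_thread, inst_retired_any):
    return cpu_clk_unhalted_thread / inst_retired_any
```

E.g. 1.2e9 unhalted cycles retiring 1.0e9 instructions gives a CPI of
1.2, i.e. the inverse of the usual IPC view.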
>>>       > +    {
>>>       > +        "BriefDescription": "The ratio of number of completed
>>>      memory load instructions to the total number completed instructions",
>>>       > +        "MetricExpr": "MEM_UOPS_RETIRED.ALL_LOADS /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "loads_per_instr",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "The ratio of number of completed
>>>      memory store instructions to the total number completed instructions",
>>>       > +        "MetricExpr": "MEM_UOPS_RETIRED.ALL_STORES /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "stores_per_instr",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of requests missing
>>>      L1 data cache (includes data+rfo w/ prefetches) to the total number
>>>      of completed instructions",
>>>       > +        "MetricExpr": "L1D.REPLACEMENT / INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "l1d_mpi_includes_data_plus_rfo_with_prefetches",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of demand load
>>>      requests hitting in L1 data cache to the total number of completed
>>>      instructions",
>>>       > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L1_HIT /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "l1d_demand_data_read_hits_per_instr",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of code read
>>>      requests missing in L1 instruction cache (includes prefetches) to
>>>      the total number of completed instructions",
>>>       > +        "MetricExpr": "L2_RQSTS.ALL_CODE_RD / INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "l1_i_code_read_misses_with_prefetches_per_instr",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of completed demand
>>>      load requests hitting in L2 cache to the total number of completed
>>>      instructions",
>>>       > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L2_HIT /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "l2_demand_data_read_hits_per_instr",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of requests missing
>>>      L2 cache (includes code+data+rfo w/ prefetches) to the total number
>>>      of completed instructions",
>>>       > +        "MetricExpr": "L2_LINES_IN.ALL / INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "l2_mpi_includes_code_plus_data_plus_rfo_with_prefetches",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of completed data
>>>      read request missing L2 cache to the total number of completed
>>>      instructions",
>>>       > +        "MetricExpr": "MEM_LOAD_UOPS_RETIRED.L2_MISS /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "l2_demand_data_read_mpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of code read
>>>      request missing L2 cache to the total number of completed instructions",
>>>       > +        "MetricExpr": "L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "l2_demand_code_mpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of data read
>>>      requests missing last level core cache (includes demand w/
>>>      prefetches) to the total number of completed instructions",
>>>       > +        "MetricExpr": "(
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ +
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x192@ ) /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "llc_data_read_mpi_demand_plus_prefetch",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of code read
>>>      requests missing last level core cache (includes demand w/
>>>      prefetches) to the total number of completed instructions",
>>>       > +        "MetricExpr": "(
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x181@ +
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x191@ ) /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "llc_code_read_mpi_demand_plus_prefetch",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average latency of a last level
>>>      cache (LLC) demand and prefetch data read miss (read memory access)
>>>      in nano seconds",
>>>       > +        "MetricExpr": "( 1000000000 * (
>>>      cbox@UNC_C_TOR_OCCUPANCY.MISS_OPCODE\\,filter_opc\\=0x182@ /
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
>>>      UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
>>>      ) ) ) * duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "llc_data_read_demand_plus_prefetch_miss_latency",
>>>       > +        "ScaleUnit": "1ns"
>>>       > +    },
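The latency expression above is effectively Little's law applied to the
CBo TOR: occupancy per insert gives cycles per miss, then per-cbox
uncore ticks over the interval convert that to wall time. A toy
restatement with invented numbers (not real counter reads):

```python
# Toy restatement of the LLC miss-latency metric above: TOR occupancy /
# inserts gives uncore cycles per miss; dividing by per-cbox ticks over
# the interval and scaling by duration_time yields nanoseconds.
def llc_miss_latency_ns(occupancy, inserts, clockticks,
                        num_cboxes, num_packages, duration_s):
    cycles_per_miss = occupancy / inserts
    per_cbox_ticks = clockticks / (num_cboxes * num_packages)
    return 1e9 * cycles_per_miss / per_cbox_ticks * duration_s
```

For example, at a 2 GHz uncore over a 1 second interval, 100 cycles per
miss comes out to 50 ns, which is the kind of sanity check that makes
these generated expressions reviewable.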
>>>       > +    {
>>>       > +        "BriefDescription": "Average latency of a last level
>>>      cache (LLC) demand and prefetch data read miss (read memory access)
>>>      addressed to local memory in nano seconds",
>>>       > +        "MetricExpr": "( 1000000000 * (
>>>      cbox@UNC_C_TOR_OCCUPANCY.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ /
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
>>>      UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
>>>      ) ) ) * duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "llc_data_read_demand_plus_prefetch_miss_latency_for_local_requests",
>>>       > +        "ScaleUnit": "1ns"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Average latency of a last level
>>>      cache (LLC) demand and prefetch data read miss (read memory access)
>>>      addressed to remote memory in nano seconds",
>>>       > +        "MetricExpr": "( 1000000000 * (
>>>      cbox@UNC_C_TOR_OCCUPANCY.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ /
>>>      cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x182@ ) / (
>>>      UNC_C_CLOCKTICKS / ( source_count(UNC_C_CLOCKTICKS) * #num_packages
>>>      ) ) ) * duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests",
>>>       > +        "ScaleUnit": "1ns"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of completed page
>>>      walks (for all page sizes) caused by a code fetch to the total
>>>      number of completed instructions. This implies it missed in the ITLB
>>>      (Instruction TLB) and further levels of TLB.",
>>>       > +        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "itlb_mpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of completed page
>>>      walks (for 2 megabyte and 4 megabyte page sizes) caused by a code
>>>      fetch to the total number of completed instructions. This implies it
>>>      missed in the Instruction Translation Lookaside Buffer (ITLB) and
>>>      further levels of TLB.",
>>>       > +        "MetricExpr": "ITLB_MISSES.WALK_COMPLETED_2M_4M /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "itlb_large_page_mpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of completed page
>>>      walks (for all page sizes) caused by demand data loads to the total
>>>      number of completed instructions. This implies it missed in the DTLB
>>>      and further levels of TLB.",
>>>       > +        "MetricExpr": "DTLB_LOAD_MISSES.WALK_COMPLETED /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "dtlb_load_mpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Ratio of number of completed page
>>>      walks (for all page sizes) caused by demand data stores to the total
>>>      number of completed instructions. This implies it missed in the DTLB
>>>      and further levels of TLB.",
>>>       > +        "MetricExpr": "DTLB_STORE_MISSES.WALK_COMPLETED /
>>>      INST_RETIRED.ANY",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "dtlb_store_mpi",
>>>       > +        "ScaleUnit": "1per_instr"
>>>       > +    },
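
As a sanity check on the *_mpi metrics above: they are plain ratios of completed page walks to retired instructions, matching the "1per_instr" scale unit. A minimal sketch (the counter values are made up for illustration, not real measurements):

```python
def misses_per_instruction(walks_completed, instructions_retired):
    """MPI-style metric: completed page walks per retired instruction."""
    return walks_completed / instructions_retired

# Illustrative counter values only.
itlb_mpi = misses_per_instruction(walks_completed=1_200,
                                  instructions_retired=4_000_000)
```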
>>>       > +    {
>>>       > +        "BriefDescription": "Memory reads that miss the last
>>>      level cache (LLC) addressed to local DRAM as a percentage of total
>>>      memory read accesses; does not include LLC prefetches.",
>>>       > +        "MetricExpr": "100 *
>>>      cbox@UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ / (
>>>      cbox@UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ +
>>>      cbox@UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ )",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "numa_percent_reads_addressed_to_local_dram",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Memory reads that miss the last
>>>      level cache (LLC) addressed to remote DRAM as a percentage of total
>>>      memory read accesses; does not include LLC prefetches.",
>>>       > +        "MetricExpr": "100 *
>>>      cbox@UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ / (
>>>      cbox@UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE\\,filter_opc\\=0x182@ +
>>>      cbox@UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE\\,filter_opc\\=0x182@ )",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "numa_percent_reads_addressed_to_remote_dram",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Uncore operating frequency in GHz",
>>>       > +        "MetricExpr": "UNC_C_CLOCKTICKS / (
>>>      source_count(UNC_C_CLOCKTICKS) * #num_packages ) / 1000000000",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "uncore_frequency",
>>>       > +        "ScaleUnit": "1GHz"
>>>       > +    },
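
For reference, the uncore_frequency expression averages UNC_C_CLOCKTICKS over all cboxes in all packages before converting to GHz; a sketch of the arithmetic with synthetic values (the cbox count and tick total are invented, and I've folded in the elapsed time explicitly):

```python
def uncore_frequency_ghz(clockticks_total, num_cboxes, num_packages, seconds):
    """Average uncore frequency: clockticks are summed across every cbox in
    every package, so divide by the cbox count before converting to GHz."""
    per_cbox = clockticks_total / (num_cboxes * num_packages)
    return per_cbox / 1e9 / seconds

# 20 cboxes each ticking at 2.4 GHz for 1 second.
freq = uncore_frequency_ghz(clockticks_total=2.4e9 * 20,
                            num_cboxes=20, num_packages=1, seconds=1.0)
```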
>>>       > +    {
>>>       > +        "BriefDescription": "Intel(R) Quick Path Interconnect
>>>      (QPI) data transmit bandwidth (MB/sec)",
>>>       > +        "MetricExpr": "( UNC_Q_TxL_FLITS_G0.DATA * 8 / 1000000)
>>>      / duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "qpi_data_transmit_bw_only_data",
>>>       > +        "ScaleUnit": "1MB/s"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "DDR memory read bandwidth (MB/sec)",
>>>       > +        "MetricExpr": "( UNC_M_CAS_COUNT.RD * 64 / 1000000) /
>>>      duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "memory_bandwidth_read",
>>>       > +        "ScaleUnit": "1MB/s"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "DDR memory write bandwidth (MB/sec)",
>>>       > +        "MetricExpr": "( UNC_M_CAS_COUNT.WR * 64 / 1000000) /
>>>      duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "memory_bandwidth_write",
>>>       > +        "ScaleUnit": "1MB/s"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "DDR memory bandwidth (MB/sec)",
>>>       > +        "MetricExpr": "(( UNC_M_CAS_COUNT.RD +
>>>      UNC_M_CAS_COUNT.WR ) * 64 / 1000000) / duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "memory_bandwidth_total",
>>>       > +        "ScaleUnit": "1MB/s"
>>>       > +    },
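
The three DDR bandwidth metrics all follow the same shape: CAS commands times the 64-byte cache line size, scaled to MB over the measurement interval. A quick sketch with made-up counts:

```python
CACHE_LINE_BYTES = 64  # each CAS command transfers one cache line

def ddr_bandwidth_mb_s(cas_reads, cas_writes, seconds):
    """Total DDR bandwidth in MB/s from memory-controller CAS counts."""
    return (cas_reads + cas_writes) * CACHE_LINE_BYTES / 1e6 / seconds

# Illustrative values: 50M read and 25M write CAS commands in one second.
bw = ddr_bandwidth_mb_s(cas_reads=50_000_000, cas_writes=25_000_000,
                        seconds=1.0)
```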
>>>       > +    {
>>>       > +        "BriefDescription": "Bandwidth of IO reads that are
>>>      initiated by end device controllers that are requesting memory from
>>>      the CPU.",
>>>       > +        "MetricExpr": "(
>>>      cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x19e@ * 64 / 1000000)
>>>      / duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "io_bandwidth_read",
>>>       > +        "ScaleUnit": "1MB/s"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Bandwidth of IO writes that are
>>>      initiated by end device controllers that are writing memory to the
>>>      CPU.",
>>>       > +        "MetricExpr": "((
>>>      cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x1c8\\,filter_tid\\=0x3e@
>>>      +
>>>      cbox@UNC_C_TOR_INSERTS.OPCODE\\,filter_opc\\=0x180\\,filter_tid\\=0x3e@
>>>      ) * 64 / 1000000) / duration_time",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName": "io_bandwidth_write",
>>>       > +        "ScaleUnit": "1MB/s"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Uops delivered from decoded
>>>      instruction cache (decoded stream buffer or DSB) as a percent of
>>>      total uops delivered to Instruction Decode Queue",
>>>       > +        "MetricExpr": "100 * ( IDQ.DSB_UOPS / UOPS_ISSUED.ANY )",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "percent_uops_delivered_from_decoded_icache_dsb",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Uops delivered from legacy decode
>>>      pipeline (Micro-instruction Translation Engine or MITE) as a percent
>>>      of total uops delivered to Instruction Decode Queue",
>>>       > +        "MetricExpr": "100 * ( IDQ.MITE_UOPS / UOPS_ISSUED.ANY )",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "percent_uops_delivered_from_legacy_decode_pipeline_mite",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Uops delivered from microcode
>>>      sequencer (MS) as a percent of total uops delivered to Instruction
>>>      Decode Queue",
>>>       > +        "MetricExpr": "100 * ( IDQ.MS_UOPS / UOPS_ISSUED.ANY )",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "percent_uops_delivered_from_microcode_sequencer_ms",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "Uops delivered from loop stream
>>>      detector (LSD) as a percent of total uops delivered to Instruction
>>>      Decode Queue",
>>>       > +        "MetricExpr": "100 * ( LSD.UOPS / UOPS_ISSUED.ANY )",
>>>       > +        "MetricGroup": "",
>>>       > +        "MetricName":
>>>      "percent_uops_delivered_from_loop_stream_detector_lsd",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This category represents fraction
>>>      of slots where the processor's Frontend undersupplies its Backend.
>>>      Frontend denotes the first part of the processor core responsible
>>>      for fetching operations that are executed later on by the Backend part.
>>>      Within the Frontend; a branch predictor predicts the next address to
>>>      fetch; cache-lines are fetched from the memory subsystem; parsed
>>>      into instructions; and lastly decoded into micro-operations (uops).
>>>      Ideally the Frontend can issue Machine_Width uops every cycle to the
>>>      Backend. Frontend Bound denotes unutilized issue-slots when there is
>>>      no Backend stall; i.e. bubbles where Frontend delivered no uops
>>>      while Backend could have accepted them. For example; stalls due to
>>>      instruction-cache misses would be categorized under Frontend Bound.",
>>>       > +        "MetricExpr": "100 * ( IDQ_UOPS_NOT_DELIVERED.CORE / ( (
>>>      4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) )",
>>>       > +        "MetricGroup": "TmaL1;PGO",
>>>       > +        "MetricName": "tma_frontend_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots the CPU was stalled due to Frontend latency issues.  For
>>>      example; instruction-cache misses; iTLB misses or fetch stalls after
>>>      a branch misprediction are categorized under Frontend Latency. In
>>>      such cases; the Frontend eventually delivers no uops for some period.",
>>>       > +        "MetricExpr": "100 * ( ( 4 ) *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) )",
>>>       > +        "MetricGroup":
>>>      "Frontend;TmaL2;m_tma_frontend_bound_percent",
>>>       > +        "MetricName": "tma_fetch_latency_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      cycles the CPU was stalled due to instruction cache misses.",
>>>       > +        "MetricExpr": "100 * ( ICACHE.IFDATA_STALL / (
>>>      CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "BigFoot;FetchLat;IcMiss;TmaL3;m_tma_fetch_latency_percent",
>>>       > +        "MetricName": "tma_icache_misses_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      cycles the CPU was stalled due to Instruction TLB (ITLB) misses.",
>>>       > +        "MetricExpr": "100 * ( ( 14 * ITLB_MISSES.STLB_HIT +
>>>      cpu@ITLB_MISSES.WALK_DURATION\\,cmask\\=0x1@ + 7 *
>>>      ITLB_MISSES.WALK_COMPLETED ) / ( CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "BigFoot;FetchLat;MemoryTLB;TmaL3;m_tma_fetch_latency_percent",
>>>       > +        "MetricName": "tma_itlb_misses_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      cycles the CPU was stalled due to Branch Resteers. Branch Resteers
>>>      estimates the Frontend delay in fetching operations from corrected
>>>      path; following all sorts of miss-predicted branches. For example;
>>>      branchy code with lots of miss-predictions might get categorized
>>>      under Branch Resteers. Note the value of this node may overlap with
>>>      its siblings.",
>>>       > +        "MetricExpr": "100 * ( ( 12 ) * (
>>>      BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY )
>>>      / ( CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
>>>       > +        "MetricName": "tma_branch_resteers_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      cycles the CPU was stalled due to switches from DSB to MITE
>>>      pipelines. The DSB (decoded i-cache) is a Uop Cache where the
>>>      front-end directly delivers Uops (micro operations) avoiding heavy
>>>      x86 decoding. The DSB pipeline has shorter latency and delivers
>>>      higher bandwidth than the MITE (legacy instruction decode pipeline).
>>>      Switching between the two pipelines can cause penalties hence this
>>>      metric measures the exposed penalty.",
>>>       > +        "MetricExpr": "100 * ( DSB2MITE_SWITCHES.PENALTY_CYCLES
>>>      / ( CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "DSBmiss;FetchLat;TmaL3;m_tma_fetch_latency_percent",
>>>       > +        "MetricName": "tma_dsb_switches_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using
>>>      proper compiler flags or Intel Compiler by default will certainly
>>>      avoid this. #Link: Optimization Guide about LCP BKMs.",
>>>       > +        "MetricExpr": "100 * ( ILD_STALL.LCP / (
>>>      CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup": "FetchLat;TmaL3;m_tma_fetch_latency_percent",
>>>       > +        "MetricName": "tma_lcp_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates the fraction
>>>      of cycles when the CPU was stalled due to switches of uop delivery
>>>      to the Microcode Sequencer (MS). Commonly used instructions are
>>>      optimized for delivery by the DSB (decoded i-cache) or MITE (legacy
>>>      instruction decode) pipelines. Certain operations cannot be handled
>>>      natively by the execution pipeline; and must be performed by
>>>      microcode (small programs injected into the execution stream).
>>>      Switching to the MS too often can negatively impact performance. The
>>>      MS is designated to deliver long uop flows required by CISC
>>>      instructions like CPUID; or uncommon conditions like Floating Point
>>>      Assists when dealing with Denormals.",
>>>       > +        "MetricExpr": "100 * ( ( 2 ) * IDQ.MS_SWITCHES / (
>>>      CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "FetchLat;MicroSeq;TmaL3;m_tma_fetch_latency_percent",
>>>       > +        "MetricName": "tma_ms_switches_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots the CPU was stalled due to Frontend bandwidth issues.  For
>>>      example; inefficiencies at the instruction decoders; or restrictions
>>>      for caching in the DSB (decoded uops cache) are categorized under
>>>      Fetch Bandwidth. In such cases; the Frontend typically delivers
>>>      suboptimal amount of uops to the Backend.",
>>>       > +        "MetricExpr": "100 * ( ( IDQ_UOPS_NOT_DELIVERED.CORE / (
>>>      ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( 4 ) *
>>>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
>>>       > +        "MetricGroup":
>>>      "FetchBW;Frontend;TmaL2;m_tma_frontend_bound_percent",
>>>       > +        "MetricName": "tma_fetch_bandwidth_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents Core
>>>      fraction of cycles in which CPU was likely limited due to the MITE
>>>      pipeline (the legacy decode pipeline). This pipeline is used for
>>>      code that was not pre-cached in the DSB or LSD. For example;
>>>      inefficiencies due to asymmetric decoders; use of long immediate or
>>>      LCP can manifest as MITE fetch bandwidth bottleneck.",
>>>       > +        "MetricExpr": "100 * ( ( IDQ.ALL_MITE_CYCLES_ANY_UOPS -
>>>      IDQ.ALL_MITE_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 )
>>>      if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) / 2 )",
>>>       > +        "MetricGroup":
>>>      "DSBmiss;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
>>>       > +        "MetricName": "tma_mite_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents Core
>>>      fraction of cycles in which CPU was likely limited due to DSB
>>>      (decoded uop cache) fetch pipeline.  For example; inefficient
>>>      utilization of the DSB cache structure or bank conflict when reading
>>>      from it; are categorized here.",
>>>       > +        "MetricExpr": "100 * ( ( IDQ.ALL_DSB_CYCLES_ANY_UOPS -
>>>      IDQ.ALL_DSB_CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 )
>>>      if #SMT_on else ( CPU_CLK_UNHALTED.THREAD ) ) / 2 )",
>>>       > +        "MetricGroup":
>>>      "DSB;FetchBW;TmaL3;m_tma_fetch_bandwidth_percent",
>>>       > +        "MetricName": "tma_dsb_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This category represents fraction
>>>      of slots wasted due to incorrect speculations. This includes slots
>>>      used to issue uops that do not eventually get retired and slots for
>>>      which the issue-pipeline was blocked due to recovery from earlier
>>>      incorrect speculation. For example; wasted work due to
>>>      miss-predicted branches is categorized under Bad Speculation
>>>      category. Incorrect data speculation followed by Memory Ordering
>>>      Nukes is another example.",
>>>       > +        "MetricExpr": "100 * ( ( UOPS_ISSUED.ANY - (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) )",
>>>       > +        "MetricGroup": "TmaL1",
>>>       > +        "MetricName": "tma_bad_speculation_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots the CPU has wasted due to Branch Misprediction.  These slots
>>>      are either wasted by uops fetched from an incorrectly speculated
>>>      program path; or stalls when the out-of-order part of the machine
>>>      needs to recover its state from a speculative path.",
>>>       > +        "MetricExpr": "100 * ( ( BR_MISP_RETIRED.ALL_BRANCHES /
>>>      ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( (
>>>      UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
>>>       > +        "MetricGroup":
>>>      "BadSpec;BrMispredicts;TmaL2;m_tma_bad_speculation_percent",
>>>       > +        "MetricName": "tma_branch_mispredicts_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots the CPU has wasted due to Machine Clears.  These slots are
>>>      either wasted by uops fetched prior to the clear; or stalls the
>>>      out-of-order portion of the machine needs to recover its state after
>>>      the clear. For example; this can happen due to memory ordering Nukes
>>>      (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes.",
>>>       > +        "MetricExpr": "100 * ( ( ( UOPS_ISSUED.ANY - (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( BR_MISP_RETIRED.ALL_BRANCHES /
>>>      ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT ) ) * ( (
>>>      UOPS_ISSUED.ANY - ( UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
>>>       > +        "MetricGroup":
>>>      "BadSpec;MachineClears;TmaL2;m_tma_bad_speculation_percent",
>>>       > +        "MetricName": "tma_machine_clears_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This category represents fraction
>>>      of slots where no uops are being delivered due to a lack of required
>>>      resources for accepting new uops in the Backend. Backend is the
>>>      portion of the processor core where the out-of-order scheduler
>>>      dispatches ready uops into their respective execution units; and
>>>      once completed these uops get retired according to program order.
>>>      For example; stalls due to data-cache misses or stalls due to the
>>>      divider unit being overloaded are both categorized under Backend
>>>      Bound. Backend Bound is further divided into two main categories:
>>>      Memory Bound and Core Bound.",
>>>       > +        "MetricExpr": "100 * ( 1 - ( (
>>>      IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>>>      ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
>>>       > +        "MetricGroup": "TmaL1",
>>>       > +        "MetricName": "tma_backend_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
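
Note that tma_backend_bound_percent above is literally 1 minus the other three level-1 categories, so the four should sum to 100%. A small sketch of the slots accounting with synthetic counter values (pipeline width 4, as hard-coded in the expressions):

```python
WIDTH = 4  # issue width assumed by these metric expressions

def tma_level1(slots_not_delivered, uops_issued, uops_retired,
               recovery_cycles, clocks):
    """Top-down level-1 breakdown as fractions of total pipeline slots."""
    slots = WIDTH * clocks
    frontend = slots_not_delivered / slots
    bad_spec = (uops_issued - uops_retired + WIDTH * recovery_cycles) / slots
    retiring = uops_retired / slots
    backend = 1 - (frontend + bad_spec + retiring)  # remainder by definition
    return frontend, bad_spec, retiring, backend

# Synthetic counters, not real hardware data.
fe, bs, ret, be = tma_level1(slots_not_delivered=1000, uops_issued=2600,
                             uops_retired=2400, recovery_cycles=25,
                             clocks=1000)
```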
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots the Memory subsystem within the Backend was a bottleneck.
>>>      Memory Bound estimates fraction of slots where pipeline is likely
>>>      stalled due to demand load or store instructions. This accounts
>>>      mainly for (1) non-completed in-flight memory demand loads which
>>>      coincides with execution units starvation; in addition to (2) cases
>>>      where stores could impose backpressure on the pipeline when many of
>>>      them get buffered at the same time (less common out of the two).",
>>>       > +        "MetricExpr": "100 * ( ( ( CYCLE_ACTIVITY.STALLS_MEM_ANY
>>>      + RESOURCE_STALLS.SB ) / ( (
>>>      CYCLE_ACTIVITY.STALLS_TOTAL + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
>>>      UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
>>>      CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
>>>      UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
>>>      ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
>>>      ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
>>>      RESOURCE_STALLS.SB ) ) ) * ( 1 - ( (
>>>      IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>>>      ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) )",
>>>       > +        "MetricGroup": "Backend;TmaL2;m_tma_backend_bound_percent",
>>>       > +        "MetricName": "tma_memory_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates how often the
>>>      CPU was stalled without loads missing the L1 data cache.  The L1
>>>      data cache typically has the shortest latency.  However; in certain
>>>      cases like loads blocked on older stores; a load might suffer due to
>>>      high latency even though it is being satisfied by the L1. Another
>>>      example is loads that miss in the TLB. These cases are characterized
>>>      by execution unit stalls; while some non-completed demand load lives
>>>      in the machine without having that demand load missing the L1 cache.",
>>>       > +        "MetricExpr": "100 * ( max( (
>>>      CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / (
>>>      CPU_CLK_UNHALTED.THREAD ) , 0 ) )",
>>>       > +        "MetricGroup":
>>>      "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>>>       > +        "MetricName": "tma_l1_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates how often the
>>>      CPU was stalled due to L2 cache accesses by loads.  Avoiding cache
>>>      misses (i.e. L1 misses/L2 hits) can improve the latency and increase
>>>      performance.",
>>>       > +        "MetricExpr": "100 * ( ( CYCLE_ACTIVITY.STALLS_L1D_MISS
>>>      - CYCLE_ACTIVITY.STALLS_L2_MISS ) / ( CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>>>       > +        "MetricName": "tma_l2_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates how often the
>>>      CPU was stalled due to load accesses to L3 cache or contended with
>>>      a sibling Core.  Avoiding cache misses (i.e. L2 misses/L3 hits) can
>>>      improve the latency and increase performance.",
>>>       > +        "MetricExpr": "100 * ( ( MEM_LOAD_UOPS_RETIRED.L3_HIT /
>>>      ( MEM_LOAD_UOPS_RETIRED.L3_HIT + ( 7 ) *
>>>      MEM_LOAD_UOPS_RETIRED.L3_MISS ) ) * CYCLE_ACTIVITY.STALLS_L2_MISS /
>>>      ( CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "CacheMisses;MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>>>       > +        "MetricName": "tma_l3_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
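
The L3_HIT / ( L3_HIT + 7 * L3_MISS ) term here (and its complement in the DRAM-bound metric below) is a latency-weighted apportioning of L2-miss stall cycles, with the 7x weight approximating the extra cost of a miss. A sketch, with invented counts:

```python
def l3_hit_fraction(l3_hit, l3_miss, miss_weight=7):
    """Latency-weighted fraction of L2-miss demand loads served by the L3;
    misses are weighted ~7x to reflect their much higher latency."""
    return l3_hit / (l3_hit + miss_weight * l3_miss)

# Illustrative: 700 L3 hits vs 100 L3 misses -> half the weighted stalls.
frac = l3_hit_fraction(l3_hit=700, l3_miss=100)
```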
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates how often the
>>>      CPU was stalled on accesses to external memory (DRAM) by loads.
>>>      Better caching can improve the latency and increase performance.",
>>>       > +        "MetricExpr": "100 * ( min( ( ( 1 - (
>>>      MEM_LOAD_UOPS_RETIRED.L3_HIT / ( MEM_LOAD_UOPS_RETIRED.L3_HIT + ( 7
>>>      ) * MEM_LOAD_UOPS_RETIRED.L3_MISS ) ) ) *
>>>      CYCLE_ACTIVITY.STALLS_L2_MISS / ( CPU_CLK_UNHALTED.THREAD ) ) , ( 1
>>>      ) ) )",
>>>       > +        "MetricGroup":
>>>      "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>>>       > +        "MetricName": "tma_dram_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates how often CPU
>>>      was stalled due to RFO store memory accesses; RFO stores issue a
>>>      read-for-ownership request before the write. Even though store
>>>      accesses do not typically stall out-of-order CPUs; there are few
>>>      cases where stores can lead to actual stalls. This metric will be
>>>      flagged should RFO stores be a bottleneck.",
>>>       > +        "MetricExpr": "100 * ( RESOURCE_STALLS.SB / (
>>>      CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup":
>>>      "MemoryBound;TmaL3mem;TmaL3;m_tma_memory_bound_percent",
>>>       > +        "MetricName": "tma_store_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots where Core non-memory issues were the bottleneck.  Shortage
>>>      in hardware compute resources; or dependencies in software's
>>>      instructions are both categorized under Core Bound. Hence it may
>>>      indicate the machine ran out of an out-of-order resource; certain
>>>      execution units are overloaded or dependencies in program's data- or
>>>      instruction-flow are limiting the performance (e.g. FP-chained
>>>      long-latency arithmetic operations).",
>>>       > +        "MetricExpr": "100 * ( ( 1 - ( (
>>>      IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>>>      ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) - ( ( (
>>>      CYCLE_ACTIVITY.STALLS_MEM_ANY + RESOURCE_STALLS.SB ) / ( (
>>>      CYCLE_ACTIVITY.STALLS_TOTAL +
>>>      UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
>>>      UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
>>>      CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
>>>      UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
>>>      ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
>>>      ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
>>>      RESOURCE_STALLS.SB ) ) ) * ( 1 - ( (
>>>      IDQ_UOPS_NOT_DELIVERED.CORE / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_ISSUED.ANY - (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) + ( 4 ) * ( (
>>>      INT_MISC.RECOVERY_CYCLES_ANY / 2 ) if #SMT_on else
>>>      INT_MISC.RECOVERY_CYCLES ) ) / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) + ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>>>      ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) ) ) ) )",
>>>       > +        "MetricGroup":
>>>      "Backend;TmaL2;Compute;m_tma_backend_bound_percent",
>>>       > +        "MetricName": "tma_core_bound_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      cycles where the Divider unit was active. Divide and square root
>>>      instructions are performed by the Divider unit and can take
>>>      considerably longer latency than integer or Floating Point addition;
>>>      subtraction; or multiplication.",
>>>       > +        "MetricExpr": "100 * ( ARITH.FPU_DIV_ACTIVE / ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) )",
>>>       > +        "MetricGroup": "TmaL3;m_tma_core_bound_percent",
>>>       > +        "MetricName": "tma_divider_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric estimates fraction of
>>>      cycles the CPU performance was potentially limited due to Core
>>>      computation issues (non divider-related).  Two distinct categories
>>>      can be attributed into this metric: (1) heavy data-dependency among
>>>      contiguous instructions would manifest in this metric - such cases
>>>      are often referred to as low Instruction Level Parallelism (ILP).
>>>      (2) Contention on some hardware execution unit other than Divider.
>>>      For example; when there are too many multiply operations.",
>>>       > +        "MetricExpr": "100 * ( ( ( ( CYCLE_ACTIVITY.STALLS_TOTAL
>>>      + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - (
>>>      UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if ( ( INST_RETIRED.ANY / (
>>>      CPU_CLK_UNHALTED.THREAD ) ) > 1.8 ) else
>>>      UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC ) - ( RS_EVENTS.EMPTY_CYCLES if
>>>      ( ( ( 4 ) * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / ( ( 4
>>>      ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) > 0.1 ) else 0 ) +
>>>      RESOURCE_STALLS.SB ) ) -
>>>      RESOURCE_STALLS.SB -
>>>      CYCLE_ACTIVITY.STALLS_MEM_ANY ) / ( CPU_CLK_UNHALTED.THREAD ) )",
>>>       > +        "MetricGroup": "PortsUtil;TmaL3;m_tma_core_bound_percent",
>>>       > +        "MetricName": "tma_ports_utilization_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This category represents fraction
>>>      of slots utilized by useful work i.e. issued uops that eventually
>>>      get retired. Ideally; all pipeline slots would be attributed to the
>>>      Retiring category.  Retiring of 100% would indicate the maximum
>>>      Pipeline_Width throughput was achieved.  Maximizing Retiring
>>>      typically increases the Instructions-per-cycle (see IPC metric).
>>>      Note that a high Retiring value does not necessary mean there is no
>>>      room for more performance.  For example; Heavy-operations or
>>>      Microcode Assists are categorized under Retiring. They often
>>>      indicate suboptimal performance and can often be optimized or
>>>      avoided. ",
>>>       > +        "MetricExpr": "100 * ( ( UOPS_RETIRED.RETIRE_SLOTS ) / (
>>>      ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) )",
>>>       > +        "MetricGroup": "TmaL1",
>>>       > +        "MetricName": "tma_retiring_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots where the CPU was retiring light-weight operations --
>>>      instructions that require no more than one uop (micro-operation).
>>>      This correlates with total number of instructions used by the
>>>      program. A uops-per-instruction (see UPI metric) ratio of 1 or less
>>>      should be expected for decently optimized software running on Intel
>>>      Core/Xeon products. While this often indicates efficient X86
>>>      instructions were executed; high value does not necessarily mean
>>>      better performance cannot be achieved.",
>>>       > +        "MetricExpr": "100 * ( ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>>>      ( ( 4 ) * ( ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) - ( ( ( ( UOPS_RETIRED.RETIRE_SLOTS
>>>      ) / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) ) )",
>>>       > +        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
>>>       > +        "MetricName": "tma_light_operations_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents overall
>>>      arithmetic floating-point (FP) operations fraction the CPU has
>>>      executed (retired). Note this metric's value may exceed its parent
>>>      due to use of \"Uops\" CountDomain and FMA double-counting.",
>>>       > +        "MetricExpr": "100 * ( ( INST_RETIRED.X87 * ( (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) / INST_RETIRED.ANY ) / (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) ) + ( (
>>>      FP_ARITH_INST_RETIRED.SCALAR_SINGLE +
>>>      FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) / ( UOPS_RETIRED.RETIRE_SLOTS
>>>      ) ) + ( min( ( ( FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE +
>>>      FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE +
>>>      FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE ) / (
>>>      UOPS_RETIRED.RETIRE_SLOTS ) ) , ( 1 ) ) ) )",
>>>       > +        "MetricGroup": "HPC;TmaL3;m_tma_light_operations_percent",
>>>       > +        "MetricName": "tma_fp_arith_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots where the CPU was retiring heavy-weight operations --
>>>      instructions that require two or more uops or microcoded sequences.
>>>      This highly-correlates with the uop length of these
>>>      instructions/sequences.",
>>>       > +        "MetricExpr": "100 * ( ( ( ( UOPS_RETIRED.RETIRE_SLOTS )
>>>      / UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) ) )",
>>>       > +        "MetricGroup": "Retire;TmaL2;m_tma_retiring_percent",
>>>       > +        "MetricName": "tma_heavy_operations_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       > +    },
>>>       > +    {
>>>       > +        "BriefDescription": "This metric represents fraction of
>>>      slots the CPU was retiring uops fetched by the Microcode Sequencer
>>>      (MS) unit.  The MS is used for CISC instructions not supported by
>>>      the default decoders (like repeat move strings; or CPUID); or by
>>>      microcode assists used to address some operation modes (like in
>>>      Floating Point assists). These cases can often be avoided.",
>>>       > +        "MetricExpr": "100 * ( ( ( UOPS_RETIRED.RETIRE_SLOTS ) /
>>>      UOPS_ISSUED.ANY ) * IDQ.MS_UOPS / ( ( 4 ) * ( (
>>>      CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else (
>>>      CPU_CLK_UNHALTED.THREAD ) ) ) )",
>>>       > +        "MetricGroup":
>>>      "MicroSeq;TmaL3;m_tma_heavy_operations_percent",
>>>       > +        "MetricName": "tma_microcode_sequencer_percent",
>>>       > +        "ScaleUnit": "1%"
>>>       >       }
>>>       >   ]
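The MetricExpr strings quoted above use perf's metric expression syntax, where #SMT_on selects the SMT-aware cycles denominator (half of CPU_CLK_UNHALTED.THREAD_ANY) and the 4 is the pipeline width in issue slots. As a minimal Python sketch of how, for example, the tma_retiring_percent expression evaluates — the counter values below are illustrative, not real measurements:

```python
def slots(clk_thread, clk_thread_any, smt_on, pipeline_width=4):
    # Total issue slots: with SMT on, each thread gets half the core cycles.
    cycles = clk_thread_any / 2 if smt_on else clk_thread
    return pipeline_width * cycles

def tma_retiring_percent(retire_slots, clk_thread, clk_thread_any, smt_on):
    # Mirrors: 100 * (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles))
    return 100 * retire_slots / slots(clk_thread, clk_thread_any, smt_on)

# Illustrative values: 3M retire slots, 1M thread cycles, 2M core cycles.
print(tma_retiring_percent(3_000_000, 1_000_000, 2_000_000, smt_on=True))
# → 75.0
```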
>>>       > diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>>>      b/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>>>       > index 127abe08362f..2efc4c0ee740 100644
>>>       > --- a/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>>>       > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/cache.json
>>>       > @@ -814,9 +814,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_CODE_RD.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x04003C0244",
>>>       > +        "MSRValue": "0x4003C0244",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch code
>>>      reads hit in the L3 and the snoops to sibling cores hit in either
>>>      E/S state and the line is not forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -829,7 +828,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x10003C0091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads hit in the L3 and the snoop to one of the sibling cores hits
>>>      the line in M state and the line is forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -840,9 +838,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x04003C0091",
>>>       > +        "MSRValue": "0x4003C0091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads hit in the L3 and the snoops to sibling cores hit in either
>>>      E/S state and the line is not forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -855,7 +852,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x10003C07F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) hit in the L3 and the snoop to one of the
>>>      sibling cores hits the line in M state and the line is forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -866,9 +862,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_READS.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x04003C07F7",
>>>       > +        "MSRValue": "0x4003C07F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) hit in the L3 and the snoops to sibling cores
>>>      hit in either E/S state and the line is not forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -881,7 +876,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3F803C8FFF",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all requests hit in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -894,7 +888,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x10003C0122",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch RFOs
>>>      hit in the L3 and the snoop to one of the sibling cores hits the
>>>      line in M state and the line is forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -905,9 +898,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_RFO.LLC_HIT.HIT_OTHER_CORE_NO_FWD",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x04003C0122",
>>>       > +        "MSRValue": "0x4003C0122",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch RFOs
>>>      hit in the L3 and the snoops to sibling cores hit in either E/S
>>>      state and the line is not forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -920,7 +912,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3F803C0002",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand data writes
>>>      (RFOs) hit in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -933,7 +924,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x10003C0002",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand data writes
>>>      (RFOs) hit in the L3 and the snoop to one of the sibling cores hits
>>>      the line in M state and the line is forwarded",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -946,7 +936,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3F803C0200",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts prefetch (that bring data
>>>      to LLC only) code reads hit in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -959,7 +948,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3F803C0100",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all prefetch (that bring
>>>      data to LLC only) RFOs hit in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -973,4 +961,4 @@
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x10"
>>>       >       }
>>>       > -]
>>>       > \ No newline at end of file
>>>       > +]
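The MSRValue changes in the hunks above only strip a leading zero from the hex string; the value each event programs into MSRs 0x1a6/0x1a7 is numerically unchanged. A quick Python check over the pairs quoted above:

```python
# Old/new MSRValue strings from the cache.json hunks above; the edit is a
# purely cosmetic normalization of the hex encoding.
pairs = [
    ("0x04003C0244", "0x4003C0244"),
    ("0x04003C0091", "0x4003C0091"),
    ("0x04003C07F7", "0x4003C07F7"),
    ("0x04003C0122", "0x4003C0122"),
]
for old, new in pairs:
    assert int(old, 16) == int(new, 16)
print("all MSRValue pairs numerically equal")
```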
>>>       > diff --git
>>>      a/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>>>      b/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>>>       > index 9ad37dddb354..93bbc8600321 100644
>>>       > --- a/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>>>       > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/floating-point.json
>>>       > @@ -5,6 +5,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      128-bit packed double precision floating-point instructions retired;
>>>      some instructions will count twice as noted below.  Each count
>>>      represents 2 computation operations, one for each element.  Applies
>>>      to SSE* and AVX* packed double precision floating-point
>>>      instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP
>>>      FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they
>>>      perform 2 calculations per element. The DAZ and FTZ flags in the
>>>      MXCSR register need to be set when using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x4"
>>>       >       },
>>>       > @@ -14,6 +15,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      128-bit packed single precision floating-point instructions retired;
>>>      some instructions will count twice as noted below.  Each count
>>>      represents 4 computation operations, one for each element.  Applies
>>>      to SSE* and AVX* packed single precision floating-point
>>>      instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT
>>>      RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice
>>>      as they perform 2 calculations per element. The DAZ and FTZ flags in
>>>      the MXCSR register need to be set when using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x8"
>>>       >       },
>>>       > @@ -23,6 +25,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      256-bit packed double precision floating-point instructions retired;
>>>      some instructions will count twice as noted below.  Each count
>>>      represents 4 computation operations, one for each element.  Applies
>>>      to SSE* and AVX* packed double precision floating-point
>>>      instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT
>>>      FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform
>>>      2 calculations per element. The DAZ and FTZ flags in the MXCSR
>>>      register need to be set when using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x10"
>>>       >       },
>>>       > @@ -32,6 +35,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      256-bit packed single precision floating-point instructions retired;
>>>      some instructions will count twice as noted below.  Each count
>>>      represents 8 computation operations, one for each element.  Applies
>>>      to SSE* and AVX* packed single precision floating-point
>>>      instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT
>>>      RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice
>>>      as they perform 2 calculations per element. The DAZ and FTZ flags in
>>>      the MXCSR register need to be set when using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x20"
>>>       >       },
>>>       > @@ -59,6 +63,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      scalar single precision and double precision floating-point
>>>      instructions retired; some instructions will count twice as noted
>>>      below.  Each count represents 1 computational operation. Applies to
>>>      SSE* and AVX* scalar single precision floating-point instructions:
>>>      ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB
>>>      instructions count twice as they perform 2 calculations per element.
>>>      The DAZ and FTZ flags in the MXCSR register need to be set when
>>>      using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x3"
>>>       >       },
>>>       > @@ -68,6 +73,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      scalar double precision floating-point instructions retired; some
>>>      instructions will count twice as noted below.  Each count represents
>>>      1 computational operation. Applies to SSE* and AVX* scalar double
>>>      precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT
>>>      FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform
>>>      2 calculations per element. The DAZ and FTZ flags in the MXCSR
>>>      register need to be set when using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -77,6 +83,7 @@
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xc7",
>>>       >           "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
>>>       > +        "PublicDescription": "Number of SSE/AVX computational
>>>      scalar single precision floating-point instructions retired; some
>>>      instructions will count twice as noted below.  Each count represents
>>>      1 computational operation. Applies to SSE* and AVX* scalar single
>>>      precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT
>>>      RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as
>>>      they perform 2 calculations per element. The DAZ and FTZ flags in
>>>      the MXCSR register need to be set when using these events.",
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x2"
>>>       >       },
>>>       > @@ -190,4 +197,4 @@
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x3"
>>>       >       }
>>>       > -]
>>>       > \ No newline at end of file
>>>       > +]
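The PublicDescription texts above give each event's operations-per-count weight: 1 for the scalar events, 2 for 128-bit packed double, 4 for 128-bit packed single and 256-bit packed double, 8 for 256-bit packed single. A small Python sketch of combining these events into a retired FLOP total under that weighting — hypothetical helper, not code from the patch; note the descriptions' caveat that DPP and FM(N)ADD/SUB instructions count twice in the raw events:

```python
# Weighted FLOP total from the FP_ARITH_INST_RETIRED.* events, using the
# elements-per-count weights stated in the event descriptions above.
def fp_ops_retired(scalar_single, scalar_double,
                   pd128, ps128, pd256, ps256):
    return (scalar_single + scalar_double      # 1 op per scalar count
            + 2 * pd128                        # 2 doubles per 128-bit count
            + 4 * ps128                        # 4 singles per 128-bit count
            + 4 * pd256                        # 4 doubles per 256-bit count
            + 8 * ps256)                       # 8 singles per 256-bit count

# Illustrative counter values, not real measurements.
print(fp_ops_retired(10, 10, 5, 5, 5, 5))  # → 110
```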
>>>       > diff --git
>>>      a/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>>>      b/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>>>       > index f0bcb945ff76..37ce8034b2ed 100644
>>>       > --- a/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>>>       > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/frontend.json
>>>       > @@ -292,4 +292,4 @@
>>>       >           "SampleAfterValue": "2000003",
>>>       >           "UMask": "0x1"
>>>       >       }
>>>       > -]
>>>       > \ No newline at end of file
>>>       > +]
>>>       > diff --git
>>>      a/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>>>      b/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>>>       > index cce993b197e3..545f61f691b9 100644
>>>       > --- a/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>>>       > +++ b/tools/perf/pmu-events/arch/x86/broadwellx/memory.json
>>>       > @@ -247,7 +247,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3FBFC00244",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch code
>>>      reads miss in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -258,9 +257,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_CODE_RD.LLC_MISS.LOCAL_DRAM",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x0604000244",
>>>       > +        "MSRValue": "0x604000244",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch code
>>>      reads miss the L3 and the data is returned from local dram",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -273,7 +271,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3FBFC00091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads miss in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -284,9 +281,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.LOCAL_DRAM",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x0604000091",
>>>       > +        "MSRValue": "0x604000091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads miss the L3 and the data is returned from local dram",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -297,9 +293,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.REMOTE_DRAM",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x063BC00091",
>>>       > +        "MSRValue": "0x63BC00091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads miss the L3 and the data is returned from remote dram",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -312,7 +307,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x103FC00091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads miss the L3 and the modified data is transferred from remote
>>>      cache",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -323,9 +317,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_DATA_RD.LLC_MISS.REMOTE_HIT_FORWARD",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x087FC00091",
>>>       > +        "MSRValue": "0x87FC00091",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch data
>>>      reads miss the L3 and clean or shared data is transferred from
>>>      remote cache",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -338,20 +331,18 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3FBFC007F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) miss in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       >       {
>>>       > -        "BriefDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch)miss the L3 and the data is returned from local
>>>      dram",
>>>       > +        "BriefDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) miss the L3 and the data is returned from local
>>>      dram",
>>>       >           "Counter": "0,1,2,3",
>>>       >           "CounterHTOff": "0,1,2,3",
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.LOCAL_DRAM",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x06040007F7",
>>>       > +        "MSRValue": "0x6040007F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch)miss the L3 and the data is returned from local
>>>      dram",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -362,9 +353,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_DRAM",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x063BC007F7",
>>>       > +        "MSRValue": "0x63BC007F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) miss the L3 and the data is returned from remote
>>>      dram",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -377,7 +367,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x103FC007F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) miss the L3 and the modified data is transferred
>>>      from remote cache",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -388,9 +377,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_READS.LLC_MISS.REMOTE_HIT_FORWARD",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x087FC007F7",
>>>       > +        "MSRValue": "0x87FC007F7",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all data/code/rfo reads
>>>      (demand & prefetch) miss the L3 and clean or shared data is
>>>      transferred from remote cache",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -403,7 +391,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3FBFC08FFF",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all requests miss in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -416,7 +403,6 @@
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       >           "MSRValue": "0x3FBFC00122",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & prefetch RFOs
>>>      miss in the L3",
>>>       >           "SampleAfterValue": "100003",
>>>       >           "UMask": "0x1"
>>>       >       },
>>>       > @@ -427,9 +413,8 @@
>>>       >           "EventCode": "0xB7, 0xBB",
>>>       >           "EventName":
>>>      "OFFCORE_RESPONSE.ALL_RFO.LLC_MISS.LOCAL_DRAM",
>>>       >           "MSRIndex": "0x1a6,0x1a7",
>>>       > -        "MSRValue": "0x0604000122",
>>>       > +        "MSRValue": "0x604000122",
>>>       >           "Offcore": "1",
>>>       > -        "PublicDescription": "Counts all demand & pref
>>>
>>
>> --
>> Zhengjun Xing

-- 
Zhengjun Xing


* Re: [PATCH v1 00/31] Add generated latest Intel events and metrics
  2022-07-24 19:08   ` Ian Rogers
@ 2022-07-27  6:48     ` Sedat Dilek
  2022-07-27 22:30       ` Ian Rogers
  0 siblings, 1 reply; 25+ messages in thread
From: Sedat Dilek @ 2022-07-27  6:48 UTC (permalink / raw)
  To: Ian Rogers
  Cc: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Stephane Eranian

On Sun, Jul 24, 2022 at 9:08 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sat, Jul 23, 2022 at 10:52 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Sat, Jul 23, 2022 at 12:32 AM Ian Rogers <irogers@google.com> wrote:
> > >
> > > The goal of this patch series is to align the json events for Intel
> > > platforms with those generated by:
> > > https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> > > This script takes the latest event json and TMA metrics from:
> > > https://download.01.org/perfmon/ and adds to these metrics, in
> > > particular uncore ones, from: https://github.com/intel/perfmon-metrics
> > > The cpu_operating_frequency metric assumes the presence of the
> > > system_tsc_freq literal posted/reviewed in:
> > > https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
> > >

Hi Ian,

thanks for your feedback and sorry for the delay.

In the meantime, Arnaldo's Git tree/branches have changed:
the prerequisite patchset mentioned above was integrated,
along with some other perf-tools patches.

You sent a v2 - I am unsure whether it still applies on top of [1],
and what your plans are.

Regarding the Sandybridge updates - they seem to be marginal.

We will have - hopefully - Linux v5.19 FINAL this Sunday,
and the LLVM project has opened the release/15.x branch [2];
version 15.0.0-rc1 should be released this week.
My next testing will focus on these two releases.

Anyway, I am still interested in testing your work.

Please let me know.

Thanks.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/core
[2] https://github.com/llvm/llvm-project/commits/release/15.x

> > > Some fixes were needed to the script for generating the json and are
> > > contained in this pull request:
> > > https://github.com/intel/event-converter-for-linux-perf/pull/15
> > >
> > > The json files were first downloaded before being used to generate the
> > > perf json files. This fixes non-ascii characters for (R) and (TM) in
> > > the source json files. This can be reproduced with:
> > > $ download_and_gen.py --hermetic-download --outdir data
> > > $ download_and_gen.py --url=file://`pwd`/data/01 --metrics-url=file://`pwd`/data/github
> > >
> > > A minor correction is made in the generated json of:
> > > tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
> > > changing "\\Inbound\\" to just "Inbound" to avoid compilation errors
> > > caused by \I.
> > >
> > > The elkhartlake metrics file is basic and not generated by scripts. It
> > > is retained here although it causes a difference from the generated
> > > files.
> > >
> > > The mapfile.csv is the third and final difference from the generated
> > > version due to a bug in 01.org's models for icelake. The existing
> > > models are preferred and retained.
> > >
> > > As well as the #system_tsc_freq being necessary, a test change is
> > > added here fixing an issue with fake PMU testing exposed in the
> > > new/updated metrics.
> > >
> > > Compared to the previous json, additional changes are the inclusion of
> > > basic meteorlake events and the renaming of tremontx to
> > > snowridgex. The new metrics contribute to the size, but a large
> > > contribution is the inclusion of previously ungenerated and
> > > experimental uncore events.
> > >
> >
> > Hi Ian,
> >
> > Thanks for the patchset.
> >
> > I would like to test this.
> >
> > What is the base for your work?
> > Mainline Git? perf-tools Git [0]?
> > Do you have an own Git repository (look like this is [1]) with all the
> > required/prerequisites and your patchset-v1 for easier fetching?
>
> Hi Sedat,
>
> I have bits of trees all over the place but nowhere I push my kernel
> work at the moment. To test the patches try the following:
>
> 1) Get a copy of Arnaldo's perf/core branch where active perf tool work happens:
>
> $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> -b perf/core
> $ cd linux
>
> 2) Grab the #system_tsc_freq patches using b4:
>
> $ b4 am https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
> $ git am ./v4_20220718_irogers_add_arch_tsc_frequency_information.mbx
>
> 3) Grab the vendor update patches using b4:
>
> $ b4 am https://lore.kernel.org/lkml/20220722223240.1618013-1-irogers@google.com/
> $ git am ./20220722_irogers_add_generated_latest_intel_events_and_metrics.mbx
>
> Not sure why but this fails on a bunch of patches due to conflicts on
> mapfile.csv. This doesn't matter too much as long as we get the
> mapfile.csv to look like the following:
>
> Family-model,Version,Filename,EventType
> GenuineIntel-6-9[7A],v1.13,alderlake,core
> GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core
> GenuineIntel-6-(3D|47),v26,broadwell,core
> GenuineIntel-6-56,v23,broadwellde,core
> GenuineIntel-6-4F,v19,broadwellx,core
> GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
> GenuineIntel-6-96,v1.03,elkhartlake,core
> GenuineIntel-6-5[CF],v13,goldmont,core
> GenuineIntel-6-7A,v1.01,goldmontplus,core
> GenuineIntel-6-(3C|45|46),v31,haswell,core
> GenuineIntel-6-3F,v25,haswellx,core
> GenuineIntel-6-(7D|7E|A7),v1.14,icelake,core
> GenuineIntel-6-6[AC],v1.15,icelakex,core
> GenuineIntel-6-3A,v22,ivybridge,core
> GenuineIntel-6-3E,v21,ivytown,core
> GenuineIntel-6-2D,v21,jaketown,core
> GenuineIntel-6-(57|85),v9,knightslanding,core
> GenuineIntel-6-AA,v1.00,meteorlake,core
> GenuineIntel-6-1[AEF],v3,nehalemep,core
> GenuineIntel-6-2E,v3,nehalemex,core
> GenuineIntel-6-2A,v17,sandybridge,core
> GenuineIntel-6-8F,v1.04,sapphirerapids,core
> GenuineIntel-6-(37|4C|4D),v14,silvermont,core
> GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
> GenuineIntel-6-55-[01234],v1.28,skylakex,core
> GenuineIntel-6-86,v1.20,snowridgex,core
> GenuineIntel-6-8[CD],v1.07,tigerlake,core
> GenuineIntel-6-2C,v2,westmereep-dp,core
> GenuineIntel-6-25,v3,westmereep-sp,core
> GenuineIntel-6-2F,v3,westmereex,core
> AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
> AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
> AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
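For context, the first column of mapfile.csv is an extended regular
expression that perf matches against the CPU identifier string
(vendor-family-model), selecting the event directory in the third
column. The sketch below illustrates that lookup with a few rows from
the list above; it is only a demonstration (perf's real matching lives
in the pmu-events code, and whether it anchors the match exactly like
`grep -x` is an implementation detail).

```shell
# Illustrative lookup: match a CPU identifier against a few mapfile.csv
# rows (regex in column 1, event directory in column 3).
lookup() {
  cpuid=$1
  while IFS=, read -r regex version table type; do
    # grep -Ex requires the whole identifier to match the row's regex.
    if printf '%s\n' "$cpuid" | grep -Eqx "$regex"; then
      printf '%s\n' "$table"
      return 0
    fi
  done <<'EOF'
GenuineIntel-6-9[7A],v1.13,alderlake,core
GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
GenuineIntel-6-55-[01234],v1.28,skylakex,core
AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
EOF
  return 1
}

lookup GenuineIntel-6-97     # alderlake
lookup GenuineIntel-6-55-7   # cascadelakex
```

Note how the two GenuineIntel-6-55 rows only differ in the stepping
suffix, which is why Cascade Lake X and Skylake X need separate rows.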
>
> When you see git report something like:
> error: patch failed: tools/perf/pmu-events/arch/x86/mapfile.csv:27
> error: tools/perf/pmu-events/arch/x86/mapfile.csv: patch does not apply
> edit tools/perf/pmu-events/arch/x86/mapfile.csv and apply the patch's
> one-line change by hand; you can see it in the diff with:
> $ git am --show-current-patch=diff
> then:
> $ git add tools/perf/pmu-events/arch/x86/mapfile.csv
> $ git am --continue
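That resolve-then-continue loop can be rehearsed end to end in a
throwaway repository before touching the real tree. Everything below is
made-up scratch data (file names, commit messages) purely for the demo;
it is not taken from the patchset.

```shell
# Rehearse the git-am recovery loop in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
# Base history with a small mapfile stand-in.
printf 'row-a\nrow-b\n' > mapfile.csv
git add mapfile.csv && git commit -qm base
# A commit we will export as a patch...
printf 'row-a\nrow-b\nrow-c\n' > mapfile.csv
git commit -qam add-row-c
git format-patch -q -1 -o patches
# ...then rewind and diverge so the patch no longer applies.
git reset -q --hard HEAD~1
printf 'row-x\nrow-b\n' > mapfile.csv
git commit -qam diverge
git am patches/*.patch || echo "git am stopped, as in the mail"
# Resolve by hand: make the file carry the patch's one-line addition.
printf 'row-x\nrow-b\nrow-c\n' > mapfile.csv
git add mapfile.csv
git am --continue
git log --oneline -1
```

The key point is that `git am --continue` commits whatever you staged
under the failed patch's original author and message, so the history
looks as if the patch had applied cleanly.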
>
> I also found that the rename of
> tools/perf/pmu-events/arch/x86/tremontx to
> tools/perf/pmu-events/arch/x86/snowridgex didn't happen (you can mv
> the directory manually), and that the meteorlake files didn't get
> added, so you can just remove the meteorlake line from mapfile.csv.
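Those two manual fix-ups amount to one `mv` and one `sed`. The sketch
below demonstrates them on a scratch copy of the directory layout (with
a made-up, abbreviated mapfile) so it is safe to run anywhere; in the
real tree you would run the same `mv` and `sed` from the kernel source
root.

```shell
# Sketch of the two manual fix-ups on a scratch directory tree.
set -e
root=$(mktemp -d)
x86="$root/tools/perf/pmu-events/arch/x86"
mkdir -p "$x86/tremontx"
printf '%s\n' \
  'Family-model,Version,Filename,EventType' \
  'GenuineIntel-6-AA,v1.00,meteorlake,core' \
  'GenuineIntel-6-86,v1.20,snowridgex,core' > "$x86/mapfile.csv"
# 1) Redo the rename that didn't happen.
mv "$x86/tremontx" "$x86/snowridgex"
# 2) Drop the meteorlake line, since its event files were not added.
sed -i '/meteorlake/d' "$x86/mapfile.csv"
cat "$x86/mapfile.csv"
```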
>
> 4) build and test the perf command
>
> $ mkdir /tmp/perf
> $ make -C tools/perf O=/tmp/perf
> $ /tmp/perf/perf test
>
> Not sure why b4 isn't behaving well in step 3 but this should give you
> something to test with.
>
> Thanks,
> Ian
>
>
>
> > Regards,
> > -Sedat-
> >
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/
> > [1] https://github.com/captain5050
> >
> > > Ian Rogers (31):
> > >   perf test: Avoid sysfs state affecting fake events
> > >   perf vendor events: Update Intel broadwellx
> > >   perf vendor events: Update Intel broadwell
> > >   perf vendor events: Update Intel broadwellde
> > >   perf vendor events: Update Intel alderlake
> > >   perf vendor events: Update bonnell mapfile.csv
> > >   perf vendor events: Update Intel cascadelakex
> > >   perf vendor events: Update Intel elkhartlake
> > >   perf vendor events: Update goldmont mapfile.csv
> > >   perf vendor events: Update goldmontplus mapfile.csv
> > >   perf vendor events: Update Intel haswell
> > >   perf vendor events: Update Intel haswellx
> > >   perf vendor events: Update Intel icelake
> > >   perf vendor events: Update Intel icelakex
> > >   perf vendor events: Update Intel ivybridge
> > >   perf vendor events: Update Intel ivytown
> > >   perf vendor events: Update Intel jaketown
> > >   perf vendor events: Update Intel knightslanding
> > >   perf vendor events: Add Intel meteorlake
> > >   perf vendor events: Update Intel nehalemep
> > >   perf vendor events: Update Intel nehalemex
> > >   perf vendor events: Update Intel sandybridge
> > >   perf vendor events: Update Intel sapphirerapids
> > >   perf vendor events: Update Intel silvermont
> > >   perf vendor events: Update Intel skylake
> > >   perf vendor events: Update Intel skylakex
> > >   perf vendor events: Update Intel snowridgex
> > >   perf vendor events: Update Intel tigerlake
> > >   perf vendor events: Update Intel westmereep-dp
> > >   perf vendor events: Update Intel westmereep-sp
> > >   perf vendor events: Update Intel westmereex
> > >
> > >  .../arch/x86/alderlake/adl-metrics.json       |     4 +-
> > >  .../pmu-events/arch/x86/alderlake/cache.json  |   178 +-
> > >  .../arch/x86/alderlake/floating-point.json    |    19 +-
> > >  .../arch/x86/alderlake/frontend.json          |    38 +-
> > >  .../pmu-events/arch/x86/alderlake/memory.json |    40 +-
> > >  .../pmu-events/arch/x86/alderlake/other.json  |    97 +-
> > >  .../arch/x86/alderlake/pipeline.json          |   507 +-
> > >  .../arch/x86/alderlake/uncore-other.json      |     2 +-
> > >  .../arch/x86/alderlake/virtual-memory.json    |    63 +-
> > >  .../pmu-events/arch/x86/bonnell/cache.json    |     2 +-
> > >  .../arch/x86/bonnell/floating-point.json      |     2 +-
> > >  .../pmu-events/arch/x86/bonnell/frontend.json |     2 +-
> > >  .../pmu-events/arch/x86/bonnell/memory.json   |     2 +-
> > >  .../pmu-events/arch/x86/bonnell/other.json    |     2 +-
> > >  .../pmu-events/arch/x86/bonnell/pipeline.json |     2 +-
> > >  .../arch/x86/bonnell/virtual-memory.json      |     2 +-
> > >  .../arch/x86/broadwell/bdw-metrics.json       |   130 +-
> > >  .../pmu-events/arch/x86/broadwell/cache.json  |     2 +-
> > >  .../arch/x86/broadwell/floating-point.json    |     2 +-
> > >  .../arch/x86/broadwell/frontend.json          |     2 +-
> > >  .../pmu-events/arch/x86/broadwell/memory.json |     2 +-
> > >  .../pmu-events/arch/x86/broadwell/other.json  |     2 +-
> > >  .../arch/x86/broadwell/pipeline.json          |     2 +-
> > >  .../arch/x86/broadwell/uncore-cache.json      |   152 +
> > >  .../arch/x86/broadwell/uncore-other.json      |    82 +
> > >  .../pmu-events/arch/x86/broadwell/uncore.json |   278 -
> > >  .../arch/x86/broadwell/virtual-memory.json    |     2 +-
> > >  .../arch/x86/broadwellde/bdwde-metrics.json   |   136 +-
> > >  .../arch/x86/broadwellde/cache.json           |     2 +-
> > >  .../arch/x86/broadwellde/floating-point.json  |     2 +-
> > >  .../arch/x86/broadwellde/frontend.json        |     2 +-
> > >  .../arch/x86/broadwellde/memory.json          |     2 +-
> > >  .../arch/x86/broadwellde/other.json           |     2 +-
> > >  .../arch/x86/broadwellde/pipeline.json        |     2 +-
> > >  .../arch/x86/broadwellde/uncore-cache.json    |  3818 ++-
> > >  .../arch/x86/broadwellde/uncore-memory.json   |  2867 +-
> > >  .../arch/x86/broadwellde/uncore-other.json    |  1246 +
> > >  .../arch/x86/broadwellde/uncore-power.json    |   492 +-
> > >  .../arch/x86/broadwellde/virtual-memory.json  |     2 +-
> > >  .../arch/x86/broadwellx/bdx-metrics.json      |   570 +-
> > >  .../pmu-events/arch/x86/broadwellx/cache.json |    22 +-
> > >  .../arch/x86/broadwellx/floating-point.json   |     9 +-
> > >  .../arch/x86/broadwellx/frontend.json         |     2 +-
> > >  .../arch/x86/broadwellx/memory.json           |    39 +-
> > >  .../pmu-events/arch/x86/broadwellx/other.json |     2 +-
> > >  .../arch/x86/broadwellx/pipeline.json         |     4 +-
> > >  .../arch/x86/broadwellx/uncore-cache.json     |  3788 ++-
> > >  .../x86/broadwellx/uncore-interconnect.json   |  1438 +-
> > >  .../arch/x86/broadwellx/uncore-memory.json    |  2849 +-
> > >  .../arch/x86/broadwellx/uncore-other.json     |  3252 ++
> > >  .../arch/x86/broadwellx/uncore-power.json     |   437 +-
> > >  .../arch/x86/broadwellx/virtual-memory.json   |     2 +-
> > >  .../arch/x86/cascadelakex/cache.json          |     8 +-
> > >  .../arch/x86/cascadelakex/clx-metrics.json    |   724 +-
> > >  .../arch/x86/cascadelakex/floating-point.json |     2 +-
> > >  .../arch/x86/cascadelakex/frontend.json       |     2 +-
> > >  .../arch/x86/cascadelakex/other.json          |    63 +
> > >  .../arch/x86/cascadelakex/pipeline.json       |    11 +
> > >  .../arch/x86/cascadelakex/uncore-memory.json  |     9 +
> > >  .../arch/x86/cascadelakex/uncore-other.json   |   697 +-
> > >  .../arch/x86/cascadelakex/virtual-memory.json |     2 +-
> > >  .../arch/x86/elkhartlake/cache.json           |   956 +-
> > >  .../arch/x86/elkhartlake/floating-point.json  |    19 +-
> > >  .../arch/x86/elkhartlake/frontend.json        |    34 +-
> > >  .../arch/x86/elkhartlake/memory.json          |   388 +-
> > >  .../arch/x86/elkhartlake/other.json           |   527 +-
> > >  .../arch/x86/elkhartlake/pipeline.json        |   203 +-
> > >  .../arch/x86/elkhartlake/virtual-memory.json  |   151 +-
> > >  .../pmu-events/arch/x86/goldmont/cache.json   |     2 +-
> > >  .../arch/x86/goldmont/floating-point.json     |     2 +-
> > >  .../arch/x86/goldmont/frontend.json           |     2 +-
> > >  .../pmu-events/arch/x86/goldmont/memory.json  |     2 +-
> > >  .../arch/x86/goldmont/pipeline.json           |     2 +-
> > >  .../arch/x86/goldmont/virtual-memory.json     |     2 +-
> > >  .../arch/x86/goldmontplus/cache.json          |     2 +-
> > >  .../arch/x86/goldmontplus/floating-point.json |     2 +-
> > >  .../arch/x86/goldmontplus/frontend.json       |     2 +-
> > >  .../arch/x86/goldmontplus/memory.json         |     2 +-
> > >  .../arch/x86/goldmontplus/pipeline.json       |     2 +-
> > >  .../arch/x86/goldmontplus/virtual-memory.json |     2 +-
> > >  .../pmu-events/arch/x86/haswell/cache.json    |    78 +-
> > >  .../arch/x86/haswell/floating-point.json      |     2 +-
> > >  .../pmu-events/arch/x86/haswell/frontend.json |     2 +-
> > >  .../arch/x86/haswell/hsw-metrics.json         |    85 +-
> > >  .../pmu-events/arch/x86/haswell/memory.json   |    75 +-
> > >  .../pmu-events/arch/x86/haswell/other.json    |     2 +-
> > >  .../pmu-events/arch/x86/haswell/pipeline.json |     9 +-
> > >  .../arch/x86/haswell/uncore-other.json        |     7 +-
> > >  .../arch/x86/haswell/virtual-memory.json      |     2 +-
> > >  .../pmu-events/arch/x86/haswellx/cache.json   |    44 +-
> > >  .../arch/x86/haswellx/floating-point.json     |     2 +-
> > >  .../arch/x86/haswellx/frontend.json           |     2 +-
> > >  .../arch/x86/haswellx/hsx-metrics.json        |    85 +-
> > >  .../pmu-events/arch/x86/haswellx/memory.json  |    52 +-
> > >  .../pmu-events/arch/x86/haswellx/other.json   |     2 +-
> > >  .../arch/x86/haswellx/pipeline.json           |     9 +-
> > >  .../arch/x86/haswellx/uncore-cache.json       |  3779 ++-
> > >  .../x86/haswellx/uncore-interconnect.json     |  1430 +-
> > >  .../arch/x86/haswellx/uncore-memory.json      |  2839 +-
> > >  .../arch/x86/haswellx/uncore-other.json       |  3170 ++
> > >  .../arch/x86/haswellx/uncore-power.json       |   477 +-
> > >  .../arch/x86/haswellx/virtual-memory.json     |     2 +-
> > >  .../pmu-events/arch/x86/icelake/cache.json    |     8 +-
> > >  .../arch/x86/icelake/floating-point.json      |     2 +-
> > >  .../pmu-events/arch/x86/icelake/frontend.json |     2 +-
> > >  .../arch/x86/icelake/icl-metrics.json         |   126 +-
> > >  .../arch/x86/icelake/uncore-other.json        |    31 +
> > >  .../arch/x86/icelake/virtual-memory.json      |     2 +-
> > >  .../pmu-events/arch/x86/icelakex/cache.json   |    28 +-
> > >  .../arch/x86/icelakex/floating-point.json     |     2 +-
> > >  .../arch/x86/icelakex/frontend.json           |     2 +-
> > >  .../arch/x86/icelakex/icx-metrics.json        |   691 +-
> > >  .../pmu-events/arch/x86/icelakex/memory.json  |     6 +-
> > >  .../pmu-events/arch/x86/icelakex/other.json   |    51 +-
> > >  .../arch/x86/icelakex/pipeline.json           |    12 +
> > >  .../arch/x86/icelakex/virtual-memory.json     |     2 +-
> > >  .../pmu-events/arch/x86/ivybridge/cache.json  |     2 +-
> > >  .../arch/x86/ivybridge/floating-point.json    |     2 +-
> > >  .../arch/x86/ivybridge/frontend.json          |     2 +-
> > >  .../arch/x86/ivybridge/ivb-metrics.json       |    94 +-
> > >  .../pmu-events/arch/x86/ivybridge/memory.json |     2 +-
> > >  .../pmu-events/arch/x86/ivybridge/other.json  |     2 +-
> > >  .../arch/x86/ivybridge/pipeline.json          |     4 +-
> > >  .../arch/x86/ivybridge/uncore-other.json      |     2 +-
> > >  .../arch/x86/ivybridge/virtual-memory.json    |     2 +-
> > >  .../pmu-events/arch/x86/ivytown/cache.json    |     2 +-
> > >  .../arch/x86/ivytown/floating-point.json      |     2 +-
> > >  .../pmu-events/arch/x86/ivytown/frontend.json |     2 +-
> > >  .../arch/x86/ivytown/ivt-metrics.json         |    94 +-
> > >  .../pmu-events/arch/x86/ivytown/memory.json   |     2 +-
> > >  .../pmu-events/arch/x86/ivytown/other.json    |     2 +-
> > >  .../arch/x86/ivytown/uncore-cache.json        |  3495 ++-
> > >  .../arch/x86/ivytown/uncore-interconnect.json |  1750 +-
> > >  .../arch/x86/ivytown/uncore-memory.json       |  1775 +-
> > >  .../arch/x86/ivytown/uncore-other.json        |  2411 ++
> > >  .../arch/x86/ivytown/uncore-power.json        |   696 +-
> > >  .../arch/x86/ivytown/virtual-memory.json      |     2 +-
> > >  .../pmu-events/arch/x86/jaketown/cache.json   |     2 +-
> > >  .../arch/x86/jaketown/floating-point.json     |     2 +-
> > >  .../arch/x86/jaketown/frontend.json           |     2 +-
> > >  .../arch/x86/jaketown/jkt-metrics.json        |    11 +-
> > >  .../pmu-events/arch/x86/jaketown/memory.json  |     2 +-
> > >  .../pmu-events/arch/x86/jaketown/other.json   |     2 +-
> > >  .../arch/x86/jaketown/pipeline.json           |    16 +-
> > >  .../arch/x86/jaketown/uncore-cache.json       |  1960 +-
> > >  .../x86/jaketown/uncore-interconnect.json     |   824 +-
> > >  .../arch/x86/jaketown/uncore-memory.json      |   445 +-
> > >  .../arch/x86/jaketown/uncore-other.json       |  1551 +
> > >  .../arch/x86/jaketown/uncore-power.json       |   362 +-
> > >  .../arch/x86/jaketown/virtual-memory.json     |     2 +-
> > >  .../arch/x86/knightslanding/cache.json        |     2 +-
> > >  .../x86/knightslanding/floating-point.json    |     2 +-
> > >  .../arch/x86/knightslanding/frontend.json     |     2 +-
> > >  .../arch/x86/knightslanding/memory.json       |     2 +-
> > >  .../arch/x86/knightslanding/pipeline.json     |     2 +-
> > >  .../x86/knightslanding/uncore-memory.json     |    42 -
> > >  .../arch/x86/knightslanding/uncore-other.json |  3890 +++
> > >  .../x86/knightslanding/virtual-memory.json    |     2 +-
> > >  tools/perf/pmu-events/arch/x86/mapfile.csv    |    74 +-
> > >  .../pmu-events/arch/x86/meteorlake/cache.json |   262 +
> > >  .../arch/x86/meteorlake/frontend.json         |    24 +
> > >  .../arch/x86/meteorlake/memory.json           |   185 +
> > >  .../pmu-events/arch/x86/meteorlake/other.json |    46 +
> > >  .../arch/x86/meteorlake/pipeline.json         |   254 +
> > >  .../arch/x86/meteorlake/virtual-memory.json   |    46 +
> > >  .../pmu-events/arch/x86/nehalemep/cache.json  |    14 +-
> > >  .../arch/x86/nehalemep/floating-point.json    |     2 +-
> > >  .../arch/x86/nehalemep/frontend.json          |     2 +-
> > >  .../pmu-events/arch/x86/nehalemep/memory.json |     6 +-
> > >  .../arch/x86/nehalemep/virtual-memory.json    |     2 +-
> > >  .../pmu-events/arch/x86/nehalemex/cache.json  |  2974 +-
> > >  .../arch/x86/nehalemex/floating-point.json    |   182 +-
> > >  .../arch/x86/nehalemex/frontend.json          |    20 +-
> > >  .../pmu-events/arch/x86/nehalemex/memory.json |   672 +-
> > >  .../pmu-events/arch/x86/nehalemex/other.json  |   170 +-
> > >  .../arch/x86/nehalemex/pipeline.json          |   830 +-
> > >  .../arch/x86/nehalemex/virtual-memory.json    |    92 +-
> > >  .../arch/x86/sandybridge/cache.json           |     2 +-
> > >  .../arch/x86/sandybridge/floating-point.json  |     2 +-
> > >  .../arch/x86/sandybridge/frontend.json        |     4 +-
> > >  .../arch/x86/sandybridge/memory.json          |     2 +-
> > >  .../arch/x86/sandybridge/other.json           |     2 +-
> > >  .../arch/x86/sandybridge/pipeline.json        |    10 +-
> > >  .../arch/x86/sandybridge/snb-metrics.json     |    11 +-
> > >  .../arch/x86/sandybridge/uncore-other.json    |     2 +-
> > >  .../arch/x86/sandybridge/virtual-memory.json  |     2 +-
> > >  .../arch/x86/sapphirerapids/cache.json        |   135 +-
> > >  .../x86/sapphirerapids/floating-point.json    |     6 +
> > >  .../arch/x86/sapphirerapids/frontend.json     |    16 +
> > >  .../arch/x86/sapphirerapids/memory.json       |    23 +-
> > >  .../arch/x86/sapphirerapids/other.json        |    68 +-
> > >  .../arch/x86/sapphirerapids/pipeline.json     |    99 +-
> > >  .../arch/x86/sapphirerapids/spr-metrics.json  |   566 +-
> > >  .../arch/x86/sapphirerapids/uncore-other.json |     9 -
> > >  .../x86/sapphirerapids/virtual-memory.json    |    20 +
> > >  .../pmu-events/arch/x86/silvermont/cache.json |     2 +-
> > >  .../arch/x86/silvermont/floating-point.json   |     2 +-
> > >  .../arch/x86/silvermont/frontend.json         |     2 +-
> > >  .../arch/x86/silvermont/memory.json           |     2 +-
> > >  .../pmu-events/arch/x86/silvermont/other.json |     2 +-
> > >  .../arch/x86/silvermont/pipeline.json         |     2 +-
> > >  .../arch/x86/silvermont/virtual-memory.json   |     2 +-
> > >  .../arch/x86/skylake/floating-point.json      |     2 +-
> > >  .../pmu-events/arch/x86/skylake/frontend.json |     2 +-
> > >  .../pmu-events/arch/x86/skylake/other.json    |     2 +-
> > >  .../arch/x86/skylake/skl-metrics.json         |   178 +-
> > >  .../arch/x86/skylake/uncore-cache.json        |   142 +
> > >  .../arch/x86/skylake/uncore-other.json        |    79 +
> > >  .../pmu-events/arch/x86/skylake/uncore.json   |   254 -
> > >  .../arch/x86/skylake/virtual-memory.json      |     2 +-
> > >  .../arch/x86/skylakex/floating-point.json     |     2 +-
> > >  .../arch/x86/skylakex/frontend.json           |     2 +-
> > >  .../pmu-events/arch/x86/skylakex/other.json   |    66 +-
> > >  .../arch/x86/skylakex/pipeline.json           |    11 +
> > >  .../arch/x86/skylakex/skx-metrics.json        |   667 +-
> > >  .../arch/x86/skylakex/uncore-memory.json      |     9 +
> > >  .../arch/x86/skylakex/uncore-other.json       |   730 +-
> > >  .../arch/x86/skylakex/virtual-memory.json     |     2 +-
> > >  .../x86/{tremontx => snowridgex}/cache.json   |    60 +-
> > >  .../floating-point.json                       |     9 +-
> > >  .../{tremontx => snowridgex}/frontend.json    |    20 +-
> > >  .../x86/{tremontx => snowridgex}/memory.json  |     4 +-
> > >  .../x86/{tremontx => snowridgex}/other.json   |    18 +-
> > >  .../{tremontx => snowridgex}/pipeline.json    |    98 +-
> > >  .../arch/x86/snowridgex/uncore-memory.json    |   619 +
> > >  .../arch/x86/snowridgex/uncore-other.json     | 25249 ++++++++++++++++
> > >  .../arch/x86/snowridgex/uncore-power.json     |   235 +
> > >  .../virtual-memory.json                       |    69 +-
> > >  .../pmu-events/arch/x86/tigerlake/cache.json  |    48 +-
> > >  .../arch/x86/tigerlake/floating-point.json    |     2 +-
> > >  .../arch/x86/tigerlake/frontend.json          |     2 +-
> > >  .../pmu-events/arch/x86/tigerlake/memory.json |     2 +-
> > >  .../pmu-events/arch/x86/tigerlake/other.json  |     1 -
> > >  .../arch/x86/tigerlake/pipeline.json          |     4 +-
> > >  .../arch/x86/tigerlake/tgl-metrics.json       |   378 +-
> > >  .../arch/x86/tigerlake/uncore-other.json      |    65 +
> > >  .../arch/x86/tigerlake/virtual-memory.json    |     2 +-
> > >  .../arch/x86/tremontx/uncore-memory.json      |   245 -
> > >  .../arch/x86/tremontx/uncore-other.json       |  2395 --
> > >  .../arch/x86/tremontx/uncore-power.json       |    11 -
> > >  .../arch/x86/westmereep-dp/cache.json         |     2 +-
> > >  .../x86/westmereep-dp/floating-point.json     |     2 +-
> > >  .../arch/x86/westmereep-dp/frontend.json      |     2 +-
> > >  .../arch/x86/westmereep-dp/memory.json        |     2 +-
> > >  .../x86/westmereep-dp/virtual-memory.json     |     2 +-
> > >  .../x86/westmereep-sp/floating-point.json     |     2 +-
> > >  .../arch/x86/westmereep-sp/frontend.json      |     2 +-
> > >  .../x86/westmereep-sp/virtual-memory.json     |     2 +-
> > >  .../arch/x86/westmereex/floating-point.json   |     2 +-
> > >  .../arch/x86/westmereex/frontend.json         |     2 +-
> > >  .../arch/x86/westmereex/virtual-memory.json   |     2 +-
> > >  tools/perf/tests/pmu-events.c                 |     9 +
> > >  252 files changed, 89144 insertions(+), 8438 deletions(-)
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-cache.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-other.json
> > >  delete mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-other.json
> > >  delete mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/cache.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/memory.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
> > >  delete mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore.json
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/cache.json (95%)
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/floating-point.json (84%)
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/frontend.json (94%)
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/memory.json (99%)
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/other.json (98%)
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/pipeline.json (89%)
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-other.json
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/virtual-memory.json (91%)
> > >  create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
> > >  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json
> > >  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json
> > >  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json
> > >
> > > --
> > > 2.37.1.359.gd136c6c3e2-goog
> > >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 00/31] Add generated latest Intel events and metrics
  2022-07-27  6:48     ` Sedat Dilek
@ 2022-07-27 22:30       ` Ian Rogers
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Rogers @ 2022-07-27 22:30 UTC (permalink / raw)
  To: sedat.dilek
  Cc: perry.taylor, caleb.biggers, kshipra.bopardikar, Kan Liang,
	Zhengjun Xing, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Maxime Coquelin, Alexandre Torgue,
	Andi Kleen, James Clark, John Garry, linux-kernel,
	linux-perf-users, Stephane Eranian

On Tue, Jul 26, 2022 at 11:49 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sun, Jul 24, 2022 at 9:08 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Sat, Jul 23, 2022 at 10:52 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Sat, Jul 23, 2022 at 12:32 AM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > The goal of this patch series is to align the json events for Intel
> > > > platforms with those generated by:
> > > > https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> > > > This script takes the latest event json and TMA metrics from:
> > > > https://download.01.org/perfmon/ and adds to these metrics, in
> > > > particular uncore ones, from: https://github.com/intel/perfmon-metrics
> > > > The cpu_operating_frequency metric assumes the presence of the
> > > > system_tsc_freq literal posted/reviewed in:
> > > > https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
> > > >
>
> Hi Ian,
>
> thanks for your feedback and sorry for the delay.
>
> In the meantime, Arnaldo's Git tree/branches have changed: the required
> patchset referenced above was integrated, and some other perf-tools
> patches were included.
>
> You sent a v2 - I am unsure whether it fits in with [1] and what your plans are.
>
> Regarding the Sandybridge updates - they seem to be marginal.
>
> We will have - hopefully - Linux v5.19 FINAL this Sunday.
> And the LLVM project has opened its release/15.x branch [2]; version
> 15.0.0-rc1 will hopefully be released this week.
> My next testing will focus on these two releases.
>
> Anyway, I am still interested in testing your work.
>
> Please let me know.
>
> Thanks.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/core
> [2] https://github.com/llvm/llvm-project/commits/release/15.x

Thanks Sedat,

There is a v3 now:
https://lore.kernel.org/lkml/20220727220832.2865794-1-irogers@google.com/
that rebases on top of the update from Zhengjun Xing. Thanks for
testing 5.19, hopefully we'll get all of the latest events and metrics
landed for 5.20. I agree the updates to Sandybridge are small compared
to, say, SkylakeX.

Thanks,
Ian

> > > > Some fixes were needed to the script for generating the json and are
> > > > contained in this pull request:
> > > > https://github.com/intel/event-converter-for-linux-perf/pull/15
> > > >
> > > > The json files were first downloaded before being used to generate the
> > > > perf json files. This fixes non-ascii characters for (R) and (TM) in
> > > > the source json files. This can be reproduced with:
> > > > $ download_and_gen.py --hermetic-download --outdir data
> > > > $ download_and_gen.py --url=file://`pwd`/data/01 --metrics-url=file://`pwd`/data/github
> > > >
> > > > A minor correction is made in the generated json of:
> > > > tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
> > > > changing "\\Inbound\\" to just "Inbound" to avoid compilation errors
> > > > caused by \I.
> > > >
> > > > The elkhartlake metrics file is basic and not generated by scripts. It
> > > > is retained here although it causes a difference from the generated
> > > > files.
> > > >
> > > > The mapfile.csv is the third and final difference from the generated
> > > > version due to a bug in 01.org's models for icelake. The existing
> > > > models are preferred and retained.
> > > >
> > > > As well as the #system_tsc_freq being necessary, a test change is
> > > > added here fixing an issue with fake PMU testing exposed in the
> > > > new/updated metrics.
> > > >
> > > > Compared to the previous json, additional changes are the inclusion of
> > > > basic meteorlake events and the renaming of tremontx to
> > > > snowridgex. The new metrics contribute to the size, but a large
> > > > contribution is the inclusion of previously ungenerated and
> > > > experimental uncore events.
> > > >
> > >
> > > Hi Ian,
> > >
> > > Thanks for the patchset.
> > >
> > > I would like to test this.
> > >
> > > What is the base for your work?
> > > Mainline Git? perf-tools Git [0]?
> > > Do you have an own Git repository (look like this is [1]) with all the
> > > required/prerequisites and your patchset-v1 for easier fetching?
> >
> > Hi Sedat,
> >
> > I have bits of trees all over the place but nowhere I push my kernel
> > work at the moment. To test the patches try the following:
> >
> > 1) Get a copy of Arnaldo's perf/core branch where active perf tool work happens:
> >
> > $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> > -b perf/core
> > $ cd linux
> >
> > 2) Grab the #system_tsc_freq patches using b4:
> >
> > $ b4 am https://lore.kernel.org/lkml/20220718164312.3994191-1-irogers@google.com/
> > $ git am ./v4_20220718_irogers_add_arch_tsc_frequency_information.mbx
> >
> > 3) Grab the vendor update patches using b4:
> >
> > $ b4 am https://lore.kernel.org/lkml/20220722223240.1618013-1-irogers@google.com/
> > $ git am ./20220722_irogers_add_generated_latest_intel_events_and_metrics.mbx
> >
> > Not sure why but this fails on a bunch of patches due to conflicts on
> > mapfile.csv. This doesn't matter too much as long as we get the
> > mapfile.csv to look like the following:
> >
> > Family-model,Version,Filename,EventType
> > GenuineIntel-6-9[7A],v1.13,alderlake,core
> > GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core
> > GenuineIntel-6-(3D|47),v26,broadwell,core
> > GenuineIntel-6-56,v23,broadwellde,core
> > GenuineIntel-6-4F,v19,broadwellx,core
> > GenuineIntel-6-55-[56789ABCDEF],v1.16,cascadelakex,core
> > GenuineIntel-6-96,v1.03,elkhartlake,core
> > GenuineIntel-6-5[CF],v13,goldmont,core
> > GenuineIntel-6-7A,v1.01,goldmontplus,core
> > GenuineIntel-6-(3C|45|46),v31,haswell,core
> > GenuineIntel-6-3F,v25,haswellx,core
> > GenuineIntel-6-(7D|7E|A7),v1.14,icelake,core
> > GenuineIntel-6-6[AC],v1.15,icelakex,core
> > GenuineIntel-6-3A,v22,ivybridge,core
> > GenuineIntel-6-3E,v21,ivytown,core
> > GenuineIntel-6-2D,v21,jaketown,core
> > GenuineIntel-6-(57|85),v9,knightslanding,core
> > GenuineIntel-6-AA,v1.00,meteorlake,core
> > GenuineIntel-6-1[AEF],v3,nehalemep,core
> > GenuineIntel-6-2E,v3,nehalemex,core
> > GenuineIntel-6-2A,v17,sandybridge,core
> > GenuineIntel-6-8F,v1.04,sapphirerapids,core
> > GenuineIntel-6-(37|4C|4D),v14,silvermont,core
> > GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v53,skylake,core
> > GenuineIntel-6-55-[01234],v1.28,skylakex,core
> > GenuineIntel-6-86,v1.20,snowridgex,core
> > GenuineIntel-6-8[CD],v1.07,tigerlake,core
> > GenuineIntel-6-2C,v2,westmereep-dp,core
> > GenuineIntel-6-25,v3,westmereep-sp,core
> > GenuineIntel-6-2F,v3,westmereex,core
> > AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
> > AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
> > AuthenticAMD-25-[[:xdigit:]]+,v1,amdzen3,core
> >
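For reference, perf resolves the running CPU to one of these rows by matching its CPUID string against the extended regular expression in the Family-model column. A minimal sketch of that match, using grep -E as a stand-in for perf's internal regex matching (the CPUID value here is illustrative):

```shell
# Sketch: match an example family-model-stepping string against the
# cascadelakex row's regex from the mapfile above.
cpuid="GenuineIntel-6-55-7"   # illustrative CPUID value, not from a real run
if echo "$cpuid" | grep -Eq '^GenuineIntel-6-55-[56789ABCDEF]'; then
  echo "cascadelakex"
fi
```

Note how the stepping field disambiguates skylakex (steppings 0-4) from cascadelakex (steppings 5 and up) for the same family-model 6-55.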
> > When git reports something like:
> > error: patch failed: tools/perf/pmu-events/arch/x86/mapfile.csv:27
> > error: tools/perf/pmu-events/arch/x86/mapfile.csv: patch does not apply
> > edit tools/perf/pmu-events/arch/x86/mapfile.csv and apply the one-line
> > change from the patch by hand; you can see it in the diff with:
> > $ git am --show-current-patch=diff
> > then continue with:
> > $ git add tools/perf/pmu-events/arch/x86/mapfile.csv
> > $ git am --continue
> >
> > I also found that the rename of
> > tools/perf/pmu-events/arch/x86/tremontx to
> > tools/perf/pmu-events/arch/x86/snowridgex didn't happen (you can mv the
> > directory manually), and that the meteorlake files didn't get added, so
> > just remove the meteorlake line from mapfile.csv.
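The two manual fix-ups above can be scripted. This is only a sketch, run from the top of the kernel tree; the guards make it a no-op once the tree is already in the desired state:

```shell
# Sketch of the manual fix-ups: redo the tremontx -> snowridgex rename and
# drop the meteorlake row if its json files were never created.
x86=tools/perf/pmu-events/arch/x86
if [ -d "$x86/tremontx" ]; then
  git mv "$x86/tremontx" "$x86/snowridgex"
fi
if [ ! -d "$x86/meteorlake" ] && [ -f "$x86/mapfile.csv" ]; then
  sed -i '/meteorlake/d' "$x86/mapfile.csv"
fi
```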
> >
> > 4) Build and test the perf command:
> >
> > $ mkdir /tmp/perf
> > $ make -C tools/perf O=/tmp/perf
> > $ /tmp/perf/perf test
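Before running the full test suite, a quick sanity pass over the generated json can catch syntax errors early. This sketch uses python3 purely as a standalone validator; it is not part of the perf build:

```shell
# Sketch: flag any generated event json that fails to parse.
for f in tools/perf/pmu-events/arch/x86/*/*.json; do
  [ -e "$f" ] || continue
  python3 -m json.tool "$f" > /dev/null 2>&1 || echo "bad json: $f"
done
```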
> >
> > I'm not sure why b4 isn't behaving well in step 3, but this should give
> > you something to test with.
> >
> > Thanks,
> > Ian
> >
> >
> >
> > > Regards,
> > > -Sedat-
> > >
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/
> > > [1] https://github.com/captain5050
> > >
> > > > Ian Rogers (31):
> > > >   perf test: Avoid sysfs state affecting fake events
> > > >   perf vendor events: Update Intel broadwellx
> > > >   perf vendor events: Update Intel broadwell
> > > >   perf vendor events: Update Intel broadwellde
> > > >   perf vendor events: Update Intel alderlake
> > > >   perf vendor events: Update bonnell mapfile.csv
> > > >   perf vendor events: Update Intel cascadelakex
> > > >   perf vendor events: Update Intel elkhartlake
> > > >   perf vendor events: Update goldmont mapfile.csv
> > > >   perf vendor events: Update goldmontplus mapfile.csv
> > > >   perf vendor events: Update Intel haswell
> > > >   perf vendor events: Update Intel haswellx
> > > >   perf vendor events: Update Intel icelake
> > > >   perf vendor events: Update Intel icelakex
> > > >   perf vendor events: Update Intel ivybridge
> > > >   perf vendor events: Update Intel ivytown
> > > >   perf vendor events: Update Intel jaketown
> > > >   perf vendor events: Update Intel knightslanding
> > > >   perf vendor events: Add Intel meteorlake
> > > >   perf vendor events: Update Intel nehalemep
> > > >   perf vendor events: Update Intel nehalemex
> > > >   perf vendor events: Update Intel sandybridge
> > > >   perf vendor events: Update Intel sapphirerapids
> > > >   perf vendor events: Update Intel silvermont
> > > >   perf vendor events: Update Intel skylake
> > > >   perf vendor events: Update Intel skylakex
> > > >   perf vendor events: Update Intel snowridgex
> > > >   perf vendor events: Update Intel tigerlake
> > > >   perf vendor events: Update Intel westmereep-dp
> > > >   perf vendor events: Update Intel westmereep-sp
> > > >   perf vendor events: Update Intel westmereex
> > > >
> > > >  .../arch/x86/alderlake/adl-metrics.json       |     4 +-
> > > >  .../pmu-events/arch/x86/alderlake/cache.json  |   178 +-
> > > >  .../arch/x86/alderlake/floating-point.json    |    19 +-
> > > >  .../arch/x86/alderlake/frontend.json          |    38 +-
> > > >  .../pmu-events/arch/x86/alderlake/memory.json |    40 +-
> > > >  .../pmu-events/arch/x86/alderlake/other.json  |    97 +-
> > > >  .../arch/x86/alderlake/pipeline.json          |   507 +-
> > > >  .../arch/x86/alderlake/uncore-other.json      |     2 +-
> > > >  .../arch/x86/alderlake/virtual-memory.json    |    63 +-
> > > >  .../pmu-events/arch/x86/bonnell/cache.json    |     2 +-
> > > >  .../arch/x86/bonnell/floating-point.json      |     2 +-
> > > >  .../pmu-events/arch/x86/bonnell/frontend.json |     2 +-
> > > >  .../pmu-events/arch/x86/bonnell/memory.json   |     2 +-
> > > >  .../pmu-events/arch/x86/bonnell/other.json    |     2 +-
> > > >  .../pmu-events/arch/x86/bonnell/pipeline.json |     2 +-
> > > >  .../arch/x86/bonnell/virtual-memory.json      |     2 +-
> > > >  .../arch/x86/broadwell/bdw-metrics.json       |   130 +-
> > > >  .../pmu-events/arch/x86/broadwell/cache.json  |     2 +-
> > > >  .../arch/x86/broadwell/floating-point.json    |     2 +-
> > > >  .../arch/x86/broadwell/frontend.json          |     2 +-
> > > >  .../pmu-events/arch/x86/broadwell/memory.json |     2 +-
> > > >  .../pmu-events/arch/x86/broadwell/other.json  |     2 +-
> > > >  .../arch/x86/broadwell/pipeline.json          |     2 +-
> > > >  .../arch/x86/broadwell/uncore-cache.json      |   152 +
> > > >  .../arch/x86/broadwell/uncore-other.json      |    82 +
> > > >  .../pmu-events/arch/x86/broadwell/uncore.json |   278 -
> > > >  .../arch/x86/broadwell/virtual-memory.json    |     2 +-
> > > >  .../arch/x86/broadwellde/bdwde-metrics.json   |   136 +-
> > > >  .../arch/x86/broadwellde/cache.json           |     2 +-
> > > >  .../arch/x86/broadwellde/floating-point.json  |     2 +-
> > > >  .../arch/x86/broadwellde/frontend.json        |     2 +-
> > > >  .../arch/x86/broadwellde/memory.json          |     2 +-
> > > >  .../arch/x86/broadwellde/other.json           |     2 +-
> > > >  .../arch/x86/broadwellde/pipeline.json        |     2 +-
> > > >  .../arch/x86/broadwellde/uncore-cache.json    |  3818 ++-
> > > >  .../arch/x86/broadwellde/uncore-memory.json   |  2867 +-
> > > >  .../arch/x86/broadwellde/uncore-other.json    |  1246 +
> > > >  .../arch/x86/broadwellde/uncore-power.json    |   492 +-
> > > >  .../arch/x86/broadwellde/virtual-memory.json  |     2 +-
> > > >  .../arch/x86/broadwellx/bdx-metrics.json      |   570 +-
> > > >  .../pmu-events/arch/x86/broadwellx/cache.json |    22 +-
> > > >  .../arch/x86/broadwellx/floating-point.json   |     9 +-
> > > >  .../arch/x86/broadwellx/frontend.json         |     2 +-
> > > >  .../arch/x86/broadwellx/memory.json           |    39 +-
> > > >  .../pmu-events/arch/x86/broadwellx/other.json |     2 +-
> > > >  .../arch/x86/broadwellx/pipeline.json         |     4 +-
> > > >  .../arch/x86/broadwellx/uncore-cache.json     |  3788 ++-
> > > >  .../x86/broadwellx/uncore-interconnect.json   |  1438 +-
> > > >  .../arch/x86/broadwellx/uncore-memory.json    |  2849 +-
> > > >  .../arch/x86/broadwellx/uncore-other.json     |  3252 ++
> > > >  .../arch/x86/broadwellx/uncore-power.json     |   437 +-
> > > >  .../arch/x86/broadwellx/virtual-memory.json   |     2 +-
> > > >  .../arch/x86/cascadelakex/cache.json          |     8 +-
> > > >  .../arch/x86/cascadelakex/clx-metrics.json    |   724 +-
> > > >  .../arch/x86/cascadelakex/floating-point.json |     2 +-
> > > >  .../arch/x86/cascadelakex/frontend.json       |     2 +-
> > > >  .../arch/x86/cascadelakex/other.json          |    63 +
> > > >  .../arch/x86/cascadelakex/pipeline.json       |    11 +
> > > >  .../arch/x86/cascadelakex/uncore-memory.json  |     9 +
> > > >  .../arch/x86/cascadelakex/uncore-other.json   |   697 +-
> > > >  .../arch/x86/cascadelakex/virtual-memory.json |     2 +-
> > > >  .../arch/x86/elkhartlake/cache.json           |   956 +-
> > > >  .../arch/x86/elkhartlake/floating-point.json  |    19 +-
> > > >  .../arch/x86/elkhartlake/frontend.json        |    34 +-
> > > >  .../arch/x86/elkhartlake/memory.json          |   388 +-
> > > >  .../arch/x86/elkhartlake/other.json           |   527 +-
> > > >  .../arch/x86/elkhartlake/pipeline.json        |   203 +-
> > > >  .../arch/x86/elkhartlake/virtual-memory.json  |   151 +-
> > > >  .../pmu-events/arch/x86/goldmont/cache.json   |     2 +-
> > > >  .../arch/x86/goldmont/floating-point.json     |     2 +-
> > > >  .../arch/x86/goldmont/frontend.json           |     2 +-
> > > >  .../pmu-events/arch/x86/goldmont/memory.json  |     2 +-
> > > >  .../arch/x86/goldmont/pipeline.json           |     2 +-
> > > >  .../arch/x86/goldmont/virtual-memory.json     |     2 +-
> > > >  .../arch/x86/goldmontplus/cache.json          |     2 +-
> > > >  .../arch/x86/goldmontplus/floating-point.json |     2 +-
> > > >  .../arch/x86/goldmontplus/frontend.json       |     2 +-
> > > >  .../arch/x86/goldmontplus/memory.json         |     2 +-
> > > >  .../arch/x86/goldmontplus/pipeline.json       |     2 +-
> > > >  .../arch/x86/goldmontplus/virtual-memory.json |     2 +-
> > > >  .../pmu-events/arch/x86/haswell/cache.json    |    78 +-
> > > >  .../arch/x86/haswell/floating-point.json      |     2 +-
> > > >  .../pmu-events/arch/x86/haswell/frontend.json |     2 +-
> > > >  .../arch/x86/haswell/hsw-metrics.json         |    85 +-
> > > >  .../pmu-events/arch/x86/haswell/memory.json   |    75 +-
> > > >  .../pmu-events/arch/x86/haswell/other.json    |     2 +-
> > > >  .../pmu-events/arch/x86/haswell/pipeline.json |     9 +-
> > > >  .../arch/x86/haswell/uncore-other.json        |     7 +-
> > > >  .../arch/x86/haswell/virtual-memory.json      |     2 +-
> > > >  .../pmu-events/arch/x86/haswellx/cache.json   |    44 +-
> > > >  .../arch/x86/haswellx/floating-point.json     |     2 +-
> > > >  .../arch/x86/haswellx/frontend.json           |     2 +-
> > > >  .../arch/x86/haswellx/hsx-metrics.json        |    85 +-
> > > >  .../pmu-events/arch/x86/haswellx/memory.json  |    52 +-
> > > >  .../pmu-events/arch/x86/haswellx/other.json   |     2 +-
> > > >  .../arch/x86/haswellx/pipeline.json           |     9 +-
> > > >  .../arch/x86/haswellx/uncore-cache.json       |  3779 ++-
> > > >  .../x86/haswellx/uncore-interconnect.json     |  1430 +-
> > > >  .../arch/x86/haswellx/uncore-memory.json      |  2839 +-
> > > >  .../arch/x86/haswellx/uncore-other.json       |  3170 ++
> > > >  .../arch/x86/haswellx/uncore-power.json       |   477 +-
> > > >  .../arch/x86/haswellx/virtual-memory.json     |     2 +-
> > > >  .../pmu-events/arch/x86/icelake/cache.json    |     8 +-
> > > >  .../arch/x86/icelake/floating-point.json      |     2 +-
> > > >  .../pmu-events/arch/x86/icelake/frontend.json |     2 +-
> > > >  .../arch/x86/icelake/icl-metrics.json         |   126 +-
> > > >  .../arch/x86/icelake/uncore-other.json        |    31 +
> > > >  .../arch/x86/icelake/virtual-memory.json      |     2 +-
> > > >  .../pmu-events/arch/x86/icelakex/cache.json   |    28 +-
> > > >  .../arch/x86/icelakex/floating-point.json     |     2 +-
> > > >  .../arch/x86/icelakex/frontend.json           |     2 +-
> > > >  .../arch/x86/icelakex/icx-metrics.json        |   691 +-
> > > >  .../pmu-events/arch/x86/icelakex/memory.json  |     6 +-
> > > >  .../pmu-events/arch/x86/icelakex/other.json   |    51 +-
> > > >  .../arch/x86/icelakex/pipeline.json           |    12 +
> > > >  .../arch/x86/icelakex/virtual-memory.json     |     2 +-
> > > >  .../pmu-events/arch/x86/ivybridge/cache.json  |     2 +-
> > > >  .../arch/x86/ivybridge/floating-point.json    |     2 +-
> > > >  .../arch/x86/ivybridge/frontend.json          |     2 +-
> > > >  .../arch/x86/ivybridge/ivb-metrics.json       |    94 +-
> > > >  .../pmu-events/arch/x86/ivybridge/memory.json |     2 +-
> > > >  .../pmu-events/arch/x86/ivybridge/other.json  |     2 +-
> > > >  .../arch/x86/ivybridge/pipeline.json          |     4 +-
> > > >  .../arch/x86/ivybridge/uncore-other.json      |     2 +-
> > > >  .../arch/x86/ivybridge/virtual-memory.json    |     2 +-
> > > >  .../pmu-events/arch/x86/ivytown/cache.json    |     2 +-
> > > >  .../arch/x86/ivytown/floating-point.json      |     2 +-
> > > >  .../pmu-events/arch/x86/ivytown/frontend.json |     2 +-
> > > >  .../arch/x86/ivytown/ivt-metrics.json         |    94 +-
> > > >  .../pmu-events/arch/x86/ivytown/memory.json   |     2 +-
> > > >  .../pmu-events/arch/x86/ivytown/other.json    |     2 +-
> > > >  .../arch/x86/ivytown/uncore-cache.json        |  3495 ++-
> > > >  .../arch/x86/ivytown/uncore-interconnect.json |  1750 +-
> > > >  .../arch/x86/ivytown/uncore-memory.json       |  1775 +-
> > > >  .../arch/x86/ivytown/uncore-other.json        |  2411 ++
> > > >  .../arch/x86/ivytown/uncore-power.json        |   696 +-
> > > >  .../arch/x86/ivytown/virtual-memory.json      |     2 +-
> > > >  .../pmu-events/arch/x86/jaketown/cache.json   |     2 +-
> > > >  .../arch/x86/jaketown/floating-point.json     |     2 +-
> > > >  .../arch/x86/jaketown/frontend.json           |     2 +-
> > > >  .../arch/x86/jaketown/jkt-metrics.json        |    11 +-
> > > >  .../pmu-events/arch/x86/jaketown/memory.json  |     2 +-
> > > >  .../pmu-events/arch/x86/jaketown/other.json   |     2 +-
> > > >  .../arch/x86/jaketown/pipeline.json           |    16 +-
> > > >  .../arch/x86/jaketown/uncore-cache.json       |  1960 +-
> > > >  .../x86/jaketown/uncore-interconnect.json     |   824 +-
> > > >  .../arch/x86/jaketown/uncore-memory.json      |   445 +-
> > > >  .../arch/x86/jaketown/uncore-other.json       |  1551 +
> > > >  .../arch/x86/jaketown/uncore-power.json       |   362 +-
> > > >  .../arch/x86/jaketown/virtual-memory.json     |     2 +-
> > > >  .../arch/x86/knightslanding/cache.json        |     2 +-
> > > >  .../x86/knightslanding/floating-point.json    |     2 +-
> > > >  .../arch/x86/knightslanding/frontend.json     |     2 +-
> > > >  .../arch/x86/knightslanding/memory.json       |     2 +-
> > > >  .../arch/x86/knightslanding/pipeline.json     |     2 +-
> > > >  .../x86/knightslanding/uncore-memory.json     |    42 -
> > > >  .../arch/x86/knightslanding/uncore-other.json |  3890 +++
> > > >  .../x86/knightslanding/virtual-memory.json    |     2 +-
> > > >  tools/perf/pmu-events/arch/x86/mapfile.csv    |    74 +-
> > > >  .../pmu-events/arch/x86/meteorlake/cache.json |   262 +
> > > >  .../arch/x86/meteorlake/frontend.json         |    24 +
> > > >  .../arch/x86/meteorlake/memory.json           |   185 +
> > > >  .../pmu-events/arch/x86/meteorlake/other.json |    46 +
> > > >  .../arch/x86/meteorlake/pipeline.json         |   254 +
> > > >  .../arch/x86/meteorlake/virtual-memory.json   |    46 +
> > > >  .../pmu-events/arch/x86/nehalemep/cache.json  |    14 +-
> > > >  .../arch/x86/nehalemep/floating-point.json    |     2 +-
> > > >  .../arch/x86/nehalemep/frontend.json          |     2 +-
> > > >  .../pmu-events/arch/x86/nehalemep/memory.json |     6 +-
> > > >  .../arch/x86/nehalemep/virtual-memory.json    |     2 +-
> > > >  .../pmu-events/arch/x86/nehalemex/cache.json  |  2974 +-
> > > >  .../arch/x86/nehalemex/floating-point.json    |   182 +-
> > > >  .../arch/x86/nehalemex/frontend.json          |    20 +-
> > > >  .../pmu-events/arch/x86/nehalemex/memory.json |   672 +-
> > > >  .../pmu-events/arch/x86/nehalemex/other.json  |   170 +-
> > > >  .../arch/x86/nehalemex/pipeline.json          |   830 +-
> > > >  .../arch/x86/nehalemex/virtual-memory.json    |    92 +-
> > > >  .../arch/x86/sandybridge/cache.json           |     2 +-
> > > >  .../arch/x86/sandybridge/floating-point.json  |     2 +-
> > > >  .../arch/x86/sandybridge/frontend.json        |     4 +-
> > > >  .../arch/x86/sandybridge/memory.json          |     2 +-
> > > >  .../arch/x86/sandybridge/other.json           |     2 +-
> > > >  .../arch/x86/sandybridge/pipeline.json        |    10 +-
> > > >  .../arch/x86/sandybridge/snb-metrics.json     |    11 +-
> > > >  .../arch/x86/sandybridge/uncore-other.json    |     2 +-
> > > >  .../arch/x86/sandybridge/virtual-memory.json  |     2 +-
> > > >  .../arch/x86/sapphirerapids/cache.json        |   135 +-
> > > >  .../x86/sapphirerapids/floating-point.json    |     6 +
> > > >  .../arch/x86/sapphirerapids/frontend.json     |    16 +
> > > >  .../arch/x86/sapphirerapids/memory.json       |    23 +-
> > > >  .../arch/x86/sapphirerapids/other.json        |    68 +-
> > > >  .../arch/x86/sapphirerapids/pipeline.json     |    99 +-
> > > >  .../arch/x86/sapphirerapids/spr-metrics.json  |   566 +-
> > > >  .../arch/x86/sapphirerapids/uncore-other.json |     9 -
> > > >  .../x86/sapphirerapids/virtual-memory.json    |    20 +
> > > >  .../pmu-events/arch/x86/silvermont/cache.json |     2 +-
> > > >  .../arch/x86/silvermont/floating-point.json   |     2 +-
> > > >  .../arch/x86/silvermont/frontend.json         |     2 +-
> > > >  .../arch/x86/silvermont/memory.json           |     2 +-
> > > >  .../pmu-events/arch/x86/silvermont/other.json |     2 +-
> > > >  .../arch/x86/silvermont/pipeline.json         |     2 +-
> > > >  .../arch/x86/silvermont/virtual-memory.json   |     2 +-
> > > >  .../arch/x86/skylake/floating-point.json      |     2 +-
> > > >  .../pmu-events/arch/x86/skylake/frontend.json |     2 +-
> > > >  .../pmu-events/arch/x86/skylake/other.json    |     2 +-
> > > >  .../arch/x86/skylake/skl-metrics.json         |   178 +-
> > > >  .../arch/x86/skylake/uncore-cache.json        |   142 +
> > > >  .../arch/x86/skylake/uncore-other.json        |    79 +
> > > >  .../pmu-events/arch/x86/skylake/uncore.json   |   254 -
> > > >  .../arch/x86/skylake/virtual-memory.json      |     2 +-
> > > >  .../arch/x86/skylakex/floating-point.json     |     2 +-
> > > >  .../arch/x86/skylakex/frontend.json           |     2 +-
> > > >  .../pmu-events/arch/x86/skylakex/other.json   |    66 +-
> > > >  .../arch/x86/skylakex/pipeline.json           |    11 +
> > > >  .../arch/x86/skylakex/skx-metrics.json        |   667 +-
> > > >  .../arch/x86/skylakex/uncore-memory.json      |     9 +
> > > >  .../arch/x86/skylakex/uncore-other.json       |   730 +-
> > > >  .../arch/x86/skylakex/virtual-memory.json     |     2 +-
> > > >  .../x86/{tremontx => snowridgex}/cache.json   |    60 +-
> > > >  .../floating-point.json                       |     9 +-
> > > >  .../{tremontx => snowridgex}/frontend.json    |    20 +-
> > > >  .../x86/{tremontx => snowridgex}/memory.json  |     4 +-
> > > >  .../x86/{tremontx => snowridgex}/other.json   |    18 +-
> > > >  .../{tremontx => snowridgex}/pipeline.json    |    98 +-
> > > >  .../arch/x86/snowridgex/uncore-memory.json    |   619 +
> > > >  .../arch/x86/snowridgex/uncore-other.json     | 25249 ++++++++++++++++
> > > >  .../arch/x86/snowridgex/uncore-power.json     |   235 +
> > > >  .../virtual-memory.json                       |    69 +-
> > > >  .../pmu-events/arch/x86/tigerlake/cache.json  |    48 +-
> > > >  .../arch/x86/tigerlake/floating-point.json    |     2 +-
> > > >  .../arch/x86/tigerlake/frontend.json          |     2 +-
> > > >  .../pmu-events/arch/x86/tigerlake/memory.json |     2 +-
> > > >  .../pmu-events/arch/x86/tigerlake/other.json  |     1 -
> > > >  .../arch/x86/tigerlake/pipeline.json          |     4 +-
> > > >  .../arch/x86/tigerlake/tgl-metrics.json       |   378 +-
> > > >  .../arch/x86/tigerlake/uncore-other.json      |    65 +
> > > >  .../arch/x86/tigerlake/virtual-memory.json    |     2 +-
> > > >  .../arch/x86/tremontx/uncore-memory.json      |   245 -
> > > >  .../arch/x86/tremontx/uncore-other.json       |  2395 --
> > > >  .../arch/x86/tremontx/uncore-power.json       |    11 -
> > > >  .../arch/x86/westmereep-dp/cache.json         |     2 +-
> > > >  .../x86/westmereep-dp/floating-point.json     |     2 +-
> > > >  .../arch/x86/westmereep-dp/frontend.json      |     2 +-
> > > >  .../arch/x86/westmereep-dp/memory.json        |     2 +-
> > > >  .../x86/westmereep-dp/virtual-memory.json     |     2 +-
> > > >  .../x86/westmereep-sp/floating-point.json     |     2 +-
> > > >  .../arch/x86/westmereep-sp/frontend.json      |     2 +-
> > > >  .../x86/westmereep-sp/virtual-memory.json     |     2 +-
> > > >  .../arch/x86/westmereex/floating-point.json   |     2 +-
> > > >  .../arch/x86/westmereex/frontend.json         |     2 +-
> > > >  .../arch/x86/westmereex/virtual-memory.json   |     2 +-
> > > >  tools/perf/tests/pmu-events.c                 |     9 +
> > > >  252 files changed, 89144 insertions(+), 8438 deletions(-)
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-cache.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-other.json
> > > >  delete mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/icelake/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-other.json
> > > >  delete mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/cache.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/memory.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-cache.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-other.json
> > > >  delete mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore.json
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/cache.json (95%)
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/floating-point.json (84%)
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/frontend.json (94%)
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/memory.json (99%)
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/other.json (98%)
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/pipeline.json (89%)
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-other.json
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
> > > >  rename tools/perf/pmu-events/arch/x86/{tremontx => snowridgex}/virtual-memory.json (91%)
> > > >  create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
> > > >  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json
> > > >  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json
> > > >  delete mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json
> > > >
> > > > --
> > > > 2.37.1.359.gd136c6c3e2-goog
> > > >



Thread overview: 25+ messages
2022-07-22 22:32 [PATCH v1 00/31] Add generated latest Intel events and metrics Ian Rogers
2022-07-22 22:32 ` [PATCH v1 01/31] perf test: Avoid sysfs state affecting fake events Ian Rogers
2022-07-22 22:32 ` [PATCH v1 06/31] perf vendor events: Update bonnell mapfile.csv Ian Rogers
2022-07-22 22:32 ` [PATCH v1 09/31] perf vendor events: Update goldmont mapfile.csv Ian Rogers
2022-07-22 22:32 ` [PATCH v1 10/31] perf vendor events: Update goldmontplus mapfile.csv Ian Rogers
2022-07-22 22:32 ` [PATCH v1 11/31] perf vendor events: Update Intel haswell Ian Rogers
2022-07-22 22:32 ` [PATCH v1 13/31] perf vendor events: Update Intel icelake Ian Rogers
2022-07-22 22:32 ` [PATCH v1 14/31] perf vendor events: Update Intel icelakex Ian Rogers
2022-07-22 22:32 ` [PATCH v1 15/31] perf vendor events: Update Intel ivybridge Ian Rogers
2022-07-22 22:32 ` [PATCH v1 19/31] perf vendor events: Add Intel meteorlake Ian Rogers
2022-07-22 22:32 ` [PATCH v1 20/31] perf vendor events: Update Intel nehalemep Ian Rogers
2022-07-22 22:32 ` [PATCH v1 22/31] perf vendor events: Update Intel sandybridge Ian Rogers
2022-07-22 22:32 ` [PATCH v1 24/31] perf vendor events: Update Intel silvermont Ian Rogers
2022-07-22 22:32 ` [PATCH v1 25/31] perf vendor events: Update Intel skylake Ian Rogers
2022-07-22 22:32 ` [PATCH v1 28/31] perf vendor events: Update Intel tigerlake Ian Rogers
2022-07-22 22:32 ` [PATCH v1 29/31] perf vendor events: Update Intel westmereep-dp Ian Rogers
2022-07-22 22:32 ` [PATCH v1 30/31] perf vendor events: Update Intel westmereep-sp Ian Rogers
2022-07-22 22:32 ` [PATCH v1 31/31] perf vendor events: Update Intel westmereex Ian Rogers
2022-07-24  5:51 ` [PATCH v1 00/31] Add generated latest Intel events and metrics Sedat Dilek
2022-07-24 19:08   ` Ian Rogers
2022-07-27  6:48     ` Sedat Dilek
2022-07-27 22:30       ` Ian Rogers
     [not found] ` <20220722223240.1618013-3-irogers@google.com>
     [not found]   ` <2c29ab7e-5fc5-5458-926c-11430e7c3c3b@linux.intel.com>
     [not found]     ` <CAP-5=fV65fiadnaAmebYS1CjxwuFy4oKxV88v6oHdVPCc=n+Ow@mail.gmail.com>
2022-07-26  1:25       ` [PATCH v1 02/31] perf vendor events: Update Intel broadwellx Xing Zhengjun
2022-07-26  4:49         ` Ian Rogers
2022-07-26  5:19           ` Xing Zhengjun
