linux-perf-users.vger.kernel.org archive mirror
* [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo
@ 2023-07-10 14:18 James Clark
  2023-07-10 14:18 ` [PATCH v2 1/5] perf: cs-etm: Don't duplicate FIELD_GET() James Clark
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: James Clark @ 2023-07-10 14:18 UTC (permalink / raw)
  To: linux-perf-users, irogers, renyu.zj, john.g.garry
  Cc: namhyung, acme, James Clark

Changes since v1:
  * Split last change into two so it doesn't hit the mailing list size
    limit

James Clark (5):
  perf: cs-etm: Don't duplicate FIELD_GET()
  perf jevents: Match on highest version of Arm json file available
  perf vendor events arm64: Update scale units and descriptions of
    common topdown metrics
  perf vendor events arm64: Update N2-r0p3 and V2 metrics and events
    using Arm telemetry repo
  perf vendor events arm64: Update N2-r0p0 metrics and events using Arm
    telemetry repo

 tools/perf/arch/arm64/util/header.c           |  61 +++-
 .../arch/arm64/arm/neoverse-n2-v2/branch.json |   8 -
 .../arch/arm64/arm/neoverse-n2-v2/bus.json    |  20 --
 .../arch/arm64/arm/neoverse-n2-v2/cache.json  | 155 --------
 .../arm64/arm/neoverse-n2-v2/exception.json   |  47 ---
 .../arm64/arm/neoverse-n2-v2/instruction.json | 143 --------
 .../arch/arm64/arm/neoverse-n2-v2/memory.json |  41 ---
 .../arm64/arm/neoverse-n2-v2/metrics.json     | 273 --------------
 .../arm64/arm/neoverse-n2-v2/pipeline.json    |  23 --
 .../arch/arm64/arm/neoverse-n2-v2/spe.json    |  14 -
 .../arch/arm64/arm/neoverse-n2-v2/trace.json  |  29 --
 .../arch/arm64/arm/neoverse-n2r0p0/bus.json   |  18 +
 .../arm64/arm/neoverse-n2r0p0/exception.json  |  62 ++++
 .../arm/neoverse-n2r0p0/fp_operation.json     |  22 ++
 .../arm64/arm/neoverse-n2r0p0/general.json    |  10 +
 .../arm64/arm/neoverse-n2r0p0/l1d_cache.json  |  54 +++
 .../arm64/arm/neoverse-n2r0p0/l1i_cache.json  |  14 +
 .../arm64/arm/neoverse-n2r0p0/l2_cache.json   |  50 +++
 .../arm64/arm/neoverse-n2r0p0/l3_cache.json   |  22 ++
 .../arm64/arm/neoverse-n2r0p0/ll_cache.json   |  10 +
 .../arm64/arm/neoverse-n2r0p0/memory.json     |  46 +++
 .../arm64/arm/neoverse-n2r0p0/metrics.json    | 332 ++++++++++++++++++
 .../arm64/arm/neoverse-n2r0p0/retired.json    |  30 ++
 .../arch/arm64/arm/neoverse-n2r0p0/spe.json   |  18 +
 .../arm/neoverse-n2r0p0/spec_operation.json   | 110 ++++++
 .../arch/arm64/arm/neoverse-n2r0p0/stall.json |  30 ++
 .../arch/arm64/arm/neoverse-n2r0p0/sve.json   |  50 +++
 .../arch/arm64/arm/neoverse-n2r0p0/tlb.json   |  66 ++++
 .../arch/arm64/arm/neoverse-n2r0p0/trace.json |  38 ++
 .../arm64/arm/neoverse-n2r0p3-v2/bus.json     |  18 +
 .../arm/neoverse-n2r0p3-v2/exception.json     |  62 ++++
 .../arm/neoverse-n2r0p3-v2/fp_operation.json  |  22 ++
 .../arm64/arm/neoverse-n2r0p3-v2/general.json |  10 +
 .../arm/neoverse-n2r0p3-v2/l1d_cache.json     |  54 +++
 .../arm/neoverse-n2r0p3-v2/l1i_cache.json     |  14 +
 .../arm/neoverse-n2r0p3-v2/l2_cache.json      |  50 +++
 .../arm/neoverse-n2r0p3-v2/l3_cache.json      |  22 ++
 .../arm/neoverse-n2r0p3-v2/ll_cache.json      |  10 +
 .../arm64/arm/neoverse-n2r0p3-v2/memory.json  |  46 +++
 .../arm64/arm/neoverse-n2r0p3-v2/metrics.json | 331 +++++++++++++++++
 .../arm64/arm/neoverse-n2r0p3-v2/retired.json |  30 ++
 .../arm64/arm/neoverse-n2r0p3-v2/spe.json     |  18 +
 .../neoverse-n2r0p3-v2/spec_operation.json    | 110 ++++++
 .../arm64/arm/neoverse-n2r0p3-v2/stall.json   |  30 ++
 .../arm64/arm/neoverse-n2r0p3-v2/sve.json     |  50 +++
 .../arm64/arm/neoverse-n2r0p3-v2/tlb.json     |  66 ++++
 .../arm64/arm/neoverse-n2r0p3-v2/trace.json   |  38 ++
 tools/perf/pmu-events/arch/arm64/mapfile.csv  |   5 +-
 tools/perf/pmu-events/arch/arm64/sbsa.json    |  24 +-
 tools/perf/pmu-events/jevents.py              |  49 +--
 tools/perf/tests/pmu-events.c                 |  34 ++
 tools/perf/util/cs-etm.c                      |  14 +-
 52 files changed, 2088 insertions(+), 815 deletions(-)
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/bus.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/exception.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/memory.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spe.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/trace.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/fp_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/general.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1i_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l2_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l3_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/ll_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/retired.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spec_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/stall.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/sve.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/tlb.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/trace.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/fp_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/general.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1i_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l2_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l3_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/ll_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/retired.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spec_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/stall.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/sve.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/tlb.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/trace.json

-- 
2.34.1



* [PATCH v2 1/5] perf: cs-etm: Don't duplicate FIELD_GET()
  2023-07-10 14:18 [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo James Clark
@ 2023-07-10 14:18 ` James Clark
  2023-07-10 14:19 ` [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available James Clark
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: James Clark @ 2023-07-10 14:18 UTC (permalink / raw)
  To: linux-perf-users, irogers, renyu.zj, john.g.garry
  Cc: namhyung, acme, James Clark

linux/bitfield.h can be included as long as linux/kernel.h is included
first, so change the order of the includes and drop the duplicate macro.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 1419b40dfbe8..9729d006550d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -6,10 +6,11 @@
  * Author: Mathieu Poirier <mathieu.poirier@linaro.org>
  */
 
+#include <linux/kernel.h>
+#include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/coresight-pmu.h>
 #include <linux/err.h>
-#include <linux/kernel.h>
 #include <linux/log2.h>
 #include <linux/types.h>
 #include <linux/zalloc.h>
@@ -281,17 +282,6 @@ static int cs_etm__metadata_set_trace_id(u8 trace_chan_id, u64 *cpu_metadata)
 	return 0;
 }
 
-/*
- * FIELD_GET (linux/bitfield.h) not available outside kernel code,
- * and the header contains too many dependencies to just copy over,
- * so roll our own based on the original
- */
-#define __bf_shf(x) (__builtin_ffsll(x) - 1)
-#define FIELD_GET(_mask, _reg)						\
-	({								\
-		(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask)); \
-	})
-
 /*
  * Get a metadata for a specific cpu from an array.
  *
-- 
2.34.1



* [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-10 14:18 [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo James Clark
  2023-07-10 14:18 ` [PATCH v2 1/5] perf: cs-etm: Don't duplicate FIELD_GET() James Clark
@ 2023-07-10 14:19 ` James Clark
  2023-07-10 16:56   ` John Garry
  2023-07-10 14:19 ` [PATCH v2 3/5] perf vendor events arm64: Update scale units and descriptions of common topdown metrics James Clark
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-10 14:19 UTC (permalink / raw)
  To: linux-perf-users, irogers, renyu.zj, john.g.garry
  Cc: namhyung, acme, James Clark

Currently the variant and revision fields are masked out of the MIDR, so
there can only be one set of JSON files per CPU. In a later commit,
multiple revisions of the Neoverse N2 JSON files will be provided.

The highest valid version of the JSON files should be used, but to make
this work the mapfile has to be reverse sorted on the CPUID field so that
the highest is found first. It's possible, but error-prone, to do this
manually, so instead add an explicit sort to jevents.py. If the CPUID
is not a number, the rows are sorted as strings rather than numerically.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/arch/arm64/util/header.c | 61 ++++++++++++++++++++++-------
 tools/perf/pmu-events/jevents.py    | 49 ++++++++++++-----------
 tools/perf/tests/pmu-events.c       | 34 ++++++++++++++++
 3 files changed, 108 insertions(+), 36 deletions(-)

diff --git a/tools/perf/arch/arm64/util/header.c b/tools/perf/arch/arm64/util/header.c
index 80b9f6287fe2..637ad21721c2 100644
--- a/tools/perf/arch/arm64/util/header.c
+++ b/tools/perf/arch/arm64/util/header.c
@@ -1,3 +1,6 @@
+#include <linux/kernel.h>
+#include <linux/bits.h>
+#include <linux/bitfield.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <perf/cpumap.h>
@@ -10,14 +13,12 @@
 
 #define MIDR "/regs/identification/midr_el1"
 #define MIDR_SIZE 19
-#define MIDR_REVISION_MASK      0xf
-#define MIDR_VARIANT_SHIFT      20
-#define MIDR_VARIANT_MASK       (0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_REVISION_MASK      GENMASK(3, 0)
+#define MIDR_VARIANT_MASK	GENMASK(23, 20)
 
 static int _get_cpuid(char *buf, size_t sz, struct perf_cpu_map *cpus)
 {
 	const char *sysfs = sysfs__mountpoint();
-	u64 midr = 0;
 	int cpu;
 
 	if (!sysfs || sz < MIDR_SIZE)
@@ -44,21 +45,11 @@ static int _get_cpuid(char *buf, size_t sz, struct perf_cpu_map *cpus)
 		}
 		fclose(file);
 
-		/* Ignore/clear Variant[23:20] and
-		 * Revision[3:0] of MIDR
-		 */
-		midr = strtoul(buf, NULL, 16);
-		midr &= (~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK));
-		scnprintf(buf, MIDR_SIZE, "0x%016lx", midr);
 		/* got midr break loop */
 		break;
 	}
 
 	perf_cpu_map__put(cpus);
-
-	if (!midr)
-		return EINVAL;
-
 	return 0;
 }
 
@@ -99,3 +90,45 @@ char *get_cpuid_str(struct perf_pmu *pmu)
 
 	return buf;
 }
+
+
+int strcmp_cpuid_str(const char *mapcpuid, const char *idstr)
+{
+	u64 map_id = strtoull(mapcpuid, NULL, 16);
+	char map_id_variant = FIELD_GET(MIDR_VARIANT_MASK, map_id);
+	char map_id_revision = FIELD_GET(MIDR_REVISION_MASK, map_id);
+	u64 id = strtoull(idstr, NULL, 16);
+	char id_variant = FIELD_GET(MIDR_VARIANT_MASK, id);
+	char id_revision = FIELD_GET(MIDR_REVISION_MASK, id);
+	u64 id_fields = ~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK);
+
+	/* Compare without version first */
+	if ((map_id & id_fields) != (id & id_fields))
+		return 1;
+
+	/*
+	 * ID matches, now compare version.
+	 *
+	 * Arm revisions (like r0p0) are compared here like two digit semver
+	 * values eg. 1.3 < 2.0 < 2.1 < 2.2. The events json file with the
+	 * highest matching version is used.
+	 *
+	 *  r = high value = 'Variant' field in MIDR
+	 *  p = low value  = 'Revision' field in MIDR
+	 *
+	 * Because the Variant field is further to the left, iterating through a
+	 * reverse sorted mapfile.csv gives the correct comparison behavior.
+	 * This relies on jevents.py sorting the list in print_mapping_table().
+	 */
+	if (id_variant > map_id_variant)
+		return 0;
+
+	if (id_variant == map_id_variant && id_revision >= map_id_revision)
+		return 0;
+
+	/*
+	 * variant is less than mapfile variant or variants are the same but
+	 * the revision doesn't match. Return no match.
+	 */
+	return 1;
+}
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 12e80bb7939b..c6a848f8d93a 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -620,28 +620,34 @@ const struct pmu_events_map pmu_events_map[] = {
 },
 """)
     else:
+      def int_or_string_key(row):
+        try:
+          return int(row[0], 0)
+        except ValueError:
+          return row[0]
       with open(f'{_args.starting_dir}/{arch}/mapfile.csv') as csvfile:
-        table = csv.reader(csvfile)
-        first = True
-        for row in table:
-          # Skip the first row or any row beginning with #.
-          if not first and len(row) > 0 and not row[0].startswith('#'):
-            event_tblname = file_name_to_table_name('pmu_events_', [], row[2].replace('/', '_'))
-            if event_tblname in _event_tables:
-              event_size = f'ARRAY_SIZE({event_tblname})'
-            else:
-              event_tblname = 'NULL'
-              event_size = '0'
-            metric_tblname = file_name_to_table_name('pmu_metrics_', [], row[2].replace('/', '_'))
-            if metric_tblname in _metric_tables:
-              metric_size = f'ARRAY_SIZE({metric_tblname})'
-            else:
-              metric_tblname = 'NULL'
-              metric_size = '0'
-            if event_size == '0' and metric_size == '0':
-              continue
-            cpuid = row[0].replace('\\', '\\\\')
-            _args.output_file.write(f"""{{
+        table = [row for row in csv.reader(csvfile)]
+      # Strip the first row or any row beginning with #.
+      table = [row for row in table[1:] if len(row) > 0 and not row[0].startswith('#')]
+      # Sort on CPUID field for predictable >= version comparisons later on
+      table = sorted(table, key=int_or_string_key, reverse=True)
+      for row in table:
+        event_tblname = file_name_to_table_name('pmu_events_', [], row[2].replace('/', '_'))
+        if event_tblname in _event_tables:
+          event_size = f'ARRAY_SIZE({event_tblname})'
+        else:
+          event_tblname = 'NULL'
+          event_size = '0'
+        metric_tblname = file_name_to_table_name('pmu_metrics_', [], row[2].replace('/', '_'))
+        if metric_tblname in _metric_tables:
+          metric_size = f'ARRAY_SIZE({metric_tblname})'
+        else:
+          metric_tblname = 'NULL'
+          metric_size = '0'
+        if event_size == '0' and metric_size == '0':
+          continue
+        cpuid = row[0].replace('\\', '\\\\')
+        _args.output_file.write(f"""{{
 \t.arch = "{arch}",
 \t.cpuid = "{cpuid}",
 \t.event_table = {{
@@ -654,7 +660,6 @@ const struct pmu_events_map pmu_events_map[] = {
 \t}}
 }},
 """)
-          first = False
 
   _args.output_file.write("""{
 \t.arch = 0,
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 64383fc34ef1..e730d4792bbe 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -14,6 +14,7 @@
 #include "util/evlist.h"
 #include "util/expr.h"
 #include "util/hashmap.h"
+#include "util/header.h"
 #include "util/parse-events.h"
 #include "metricgroup.h"
 #include "stat.h"
@@ -1027,6 +1028,38 @@ static int test__parsing_threshold(struct test_suite *test __maybe_unused,
 	return pmu_for_each_sys_metric(test__parsing_threshold_callback, NULL);
 }
 
+static int test__cpuid_match(struct test_suite *test __maybe_unused,
+				       int subtest __maybe_unused)
+{
+#ifdef __aarch64__
+	/* midr with no leading zeros matches */
+	if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410fd0c0"))
+		return -1;
+	/* Upper case matches */
+	if (strcmp_cpuid_str("0x410fd0c0", "0x00000000410FD0C0"))
+		return -1;
+	/* r0p0 = r0p0 matches */
+	if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd480"))
+		return -1;
+	/* r0p1 > r0p0 matches */
+	if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000410fd481"))
+		return -1;
+	/* r1p0 > r0p0 matches */
+	if (strcmp_cpuid_str("0x00000000410fd480", "0x00000000411fd480"))
+		return -1;
+	/* r0p0 < r0p1 doesn't match */
+	if (!strcmp_cpuid_str("0x00000000410fd481", "0x00000000410fd480"))
+		return -1;
+	/* r0p0 < r1p0 doesn't match */
+	if (!strcmp_cpuid_str("0x00000000411fd480", "0x00000000410fd480"))
+		return -1;
+	/* Different CPU doesn't match */
+	if (!strcmp_cpuid_str("0x00000000410fd4c0", "0x00000000430f0af0"))
+		return -1;
+#endif
+	return 0;
+}
+
 static struct test_case pmu_events_tests[] = {
 	TEST_CASE("PMU event table sanity", pmu_event_table),
 	TEST_CASE("PMU event map aliases", aliases),
@@ -1034,6 +1067,7 @@ static struct test_case pmu_events_tests[] = {
 			 "some metrics failed"),
 	TEST_CASE("Parsing of PMU event table metrics with fake PMUs", parsing_fake),
 	TEST_CASE("Parsing of metric thresholds with fake PMUs", parsing_threshold),
+	TEST_CASE("CPUID matching", cpuid_match),
 	{ .name = NULL, }
 };
 
-- 
2.34.1



* [PATCH v2 3/5] perf vendor events arm64: Update scale units and descriptions of common topdown metrics
  2023-07-10 14:18 [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo James Clark
  2023-07-10 14:18 ` [PATCH v2 1/5] perf: cs-etm: Don't duplicate FIELD_GET() James Clark
  2023-07-10 14:19 ` [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available James Clark
@ 2023-07-10 14:19 ` James Clark
  2023-07-10 14:19 ` [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo James Clark
  2023-07-10 14:19 ` [PATCH v2 5/5] perf vendor events arm64: Update N2-r0p0 " James Clark
  4 siblings, 0 replies; 17+ messages in thread
From: James Clark @ 2023-07-10 14:19 UTC (permalink / raw)
  To: linux-perf-users, irogers, renyu.zj, john.g.garry
  Cc: namhyung, acme, James Clark

Metrics will be published here [1] going forward, but they use
slightly different scale units. To allow autogenerated metrics to be
added more easily, update the scale units to match.

The more detailed descriptions have also been taken and added to the
common file.

[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/tree/main/data/pmu/cpu/
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/pmu-events/arch/arm64/sbsa.json | 24 +++++++++++-----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
index f90b338261ac..4eed79a28f6e 100644
--- a/tools/perf/pmu-events/arch/arm64/sbsa.json
+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -1,34 +1,34 @@
 [
     {
-        "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
-        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+        "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
         "DefaultMetricgroupName": "TopdownL1",
         "MetricGroup": "Default;TopdownL1",
         "MetricName": "frontend_bound",
-        "ScaleUnit": "100%"
+        "ScaleUnit": "1percent of slots"
     },
     {
-        "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
-        "BriefDescription": "Bad speculation L1 topdown metric",
+        "MetricExpr": "100 * ((1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles)))",
+        "BriefDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
         "DefaultMetricgroupName": "TopdownL1",
         "MetricGroup": "Default;TopdownL1",
         "MetricName": "bad_speculation",
-        "ScaleUnit": "100%"
+        "ScaleUnit": "1percent of slots"
     },
     {
-        "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
-        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricExpr": "100 * ((op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles)))",
+        "BriefDescription": "This metric is the percentage of total slots that retired operations, which indicates cycles that were utilized efficiently.",
         "DefaultMetricgroupName": "TopdownL1",
         "MetricGroup": "Default;TopdownL1",
         "MetricName": "retiring",
-        "ScaleUnit": "100%"
+        "ScaleUnit": "1percent of slots"
     },
     {
-        "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
-        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricExpr": "100 * (stall_slot_backend / (#slots * cpu_cycles))",
+        "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the backend of the processor.",
         "DefaultMetricgroupName": "TopdownL1",
         "MetricGroup": "Default;TopdownL1",
         "MetricName": "backend_bound",
-        "ScaleUnit": "100%"
+        "ScaleUnit": "1percent of slots"
     }
 ]
-- 
2.34.1



* [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo
  2023-07-10 14:18 [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo James Clark
                   ` (2 preceding siblings ...)
  2023-07-10 14:19 ` [PATCH v2 3/5] perf vendor events arm64: Update scale units and descriptions of common topdown metrics James Clark
@ 2023-07-10 14:19 ` James Clark
  2023-07-10 14:28   ` James Clark
  2023-07-10 14:19 ` [PATCH v2 5/5] perf vendor events arm64: Update N2-r0p0 " James Clark
  4 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-10 14:19 UTC (permalink / raw)
  To: linux-perf-users, irogers, renyu.zj, john.g.garry
  Cc: namhyung, acme, James Clark

The new metrics contain a fix for N2 r0p3 where CPU_CYCLES should not be
subtracted from stalls for topdown metrics anymore. The current metrics
assume that the fix should be applied anywhere where slots != 5, but
this is only the case for V2 and not N2 r0p3.

Split the metrics into a new version for N2-r0p3 and V2 which still
share the same metrics. Apart from some slight naming and grouping
differences the new metrics are functionally the same as the existing
ones. Any missing metrics were manually appended to the end of the auto
generated file.

For the events, the new data includes descriptions that may have product
specific details and new groupings that will be consistent with other
products.

After generating the metrics from the telemetry repo [1], the following
manual steps were performed:

 * Change the hard coded slots in neoverse-n2r0p3-v2 to #slots so that
   it will work on both N2 and V2.

 * Append some metrics from the old N2/V2 data that aren't present in
   the telemetry data. These will possibly be added to the
   telemetry-solution repo at a later time:

    l3d_cache_mpki, l3d_cache_miss_rate, branch_pki, ipc_rate, spec_ipc,
    retired_rate, wasted_rate, load_spec_rate, store_spec_rate,
    advanced_simd_spec_rate, float_point_spec_rate,
    branch_immed_spec_rate, branch_return_spec_rate,
    branch_indirect_spec_rate

[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2.json

Signed-off-by: James Clark <james.clark@arm.com>
---
 .../arm64/arm/neoverse-n2r0p3-v2/bus.json     |  18 +
 .../arm/neoverse-n2r0p3-v2/exception.json     |  62 ++++
 .../arm/neoverse-n2r0p3-v2/fp_operation.json  |  22 ++
 .../arm64/arm/neoverse-n2r0p3-v2/general.json |  10 +
 .../arm/neoverse-n2r0p3-v2/l1d_cache.json     |  54 +++
 .../arm/neoverse-n2r0p3-v2/l1i_cache.json     |  14 +
 .../arm/neoverse-n2r0p3-v2/l2_cache.json      |  50 +++
 .../arm/neoverse-n2r0p3-v2/l3_cache.json      |  22 ++
 .../arm/neoverse-n2r0p3-v2/ll_cache.json      |  10 +
 .../arm64/arm/neoverse-n2r0p3-v2/memory.json  |  46 +++
 .../arm64/arm/neoverse-n2r0p3-v2/metrics.json | 331 ++++++++++++++++++
 .../arm64/arm/neoverse-n2r0p3-v2/retired.json |  30 ++
 .../arm64/arm/neoverse-n2r0p3-v2/spe.json     |  18 +
 .../neoverse-n2r0p3-v2/spec_operation.json    | 110 ++++++
 .../arm64/arm/neoverse-n2r0p3-v2/stall.json   |  30 ++
 .../arm64/arm/neoverse-n2r0p3-v2/sve.json     |  50 +++
 .../arm64/arm/neoverse-n2r0p3-v2/tlb.json     |  66 ++++
 .../arm64/arm/neoverse-n2r0p3-v2/trace.json   |  38 ++
 tools/perf/pmu-events/arch/arm64/mapfile.csv  |   3 +-
 19 files changed, 983 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/fp_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/general.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1i_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l2_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l3_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/ll_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/retired.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spec_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/stall.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/sve.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/tlb.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/trace.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/bus.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/bus.json
new file mode 100644
index 000000000000..2e11a8c4a484
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/bus.json
@@ -0,0 +1,18 @@
+[
+    {
+        "ArchStdEvent": "BUS_ACCESS",
+        "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_CYCLES",
+        "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_RD",
+        "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_WR",
+        "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/exception.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/exception.json
new file mode 100644
index 000000000000..4404b8e91690
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/exception.json
@@ -0,0 +1,62 @@
+[
+    {
+        "ArchStdEvent": "EXC_TAKEN",
+        "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_RETURN",
+        "PublicDescription": "Counts any architecturally executed exception return instructions. Eg: AArch64: ERET"
+    },
+    {
+        "ArchStdEvent": "EXC_UNDEF",
+        "PublicDescription": "Counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED. Attempting to execute instruction bit patterns that have not been allocated. Attempting to execute instructions when they are disabled. Attempting to execute instructions at an inappropriate Exception level. Attempting to execute an instruction when the value of PSTATE.IL is 1."
+    },
+    {
+        "ArchStdEvent": "EXC_SVC",
+        "PublicDescription": "Counts SVC exceptions taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_PABORT",
+        "PublicDescription": "Counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
+    },
+    {
+        "ArchStdEvent": "EXC_DABORT",
+        "PublicDescription": "Counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_IRQ",
+        "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_FIQ",
+        "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_SMC",
+        "PublicDescription": "Counts SMC exceptions taken to EL3."
+    },
+    {
+        "ArchStdEvent": "EXC_HVC",
+        "PublicDescription": "Counts HVC exceptions taken to EL2."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_PABORT",
+        "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_DABORT",
+        "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Data Aborts or SError interrupts. Conditions that could cause those exceptions are:\n\n1. Attempting to read or write memory where the MMU generates a fault,\n2. Attempting to read or write memory with a misaligned address,\n3. Interrupts from the SEI input,\n4. Internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_OTHER",
+        "PublicDescription": "Counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, data aborts, Instruction Aborts, or interrupts."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_IRQ",
+        "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are not taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_FIQ",
+        "PublicDescription": "Counts FIQs which are not taken locally but taken from EL0, EL1, or EL2 to EL3 (which would be the normal behavior for FIQs when not executing in EL3)."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/fp_operation.json
new file mode 100644
index 000000000000..cec3435ac766
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/fp_operation.json
@@ -0,0 +1,22 @@
+[
+    {
+        "ArchStdEvent": "FP_HP_SPEC",
+        "PublicDescription": "Counts speculatively executed half precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SPEC",
+        "PublicDescription": "Counts speculatively executed single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SPEC",
+        "PublicDescription": "Counts speculatively executed double precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
+        "PublicDescription": "Counts speculatively executed scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
+        "PublicDescription": "Counts speculatively executed non-scalable single precision floating point operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/general.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/general.json
new file mode 100644
index 000000000000..428810f855b8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/general.json
@@ -0,0 +1,10 @@
+[
+    {
+        "ArchStdEvent": "CPU_CYCLES",
+        "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
+    },
+    {
+        "ArchStdEvent": "CNT_CYCLES",
+        "PublicDescription": "Counts constant frequency cycles."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1d_cache.json
new file mode 100644
index 000000000000..ed83e1c5affe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1d_cache.json
@@ -0,0 +1,54 @@
+[
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line. This event does not count cache line allocations from preload instructions or from hardware cache prefetching."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE",
+        "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and a read access. Each access to a cache line is counted, including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB",
+        "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations that incurred additional latency."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RD",
+        "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPU's caches count as both a write access and a read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WR",
+        "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as both a write access and a read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_RD",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load instructions where the memory read operation misses in the level 1 data cache. This event only counts one event per cache line."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_WR",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed store instructions where the memory write operation misses in the level 1 data cache. This event only counts one event per cache line."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+        "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+        "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
+        "PublicDescription": "Counts dirty cache line evictions from the level 1 data cache caused by a new cache line allocation. This event does not count evictions caused by cache maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
+        "PublicDescription": "Counts write-backs from the level 1 data cache that are a result of a coherency operation made by another CPU. Event count includes cache maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_INVAL",
+        "PublicDescription": "Counts each explicit invalidation of a cache line in the level 1 data cache caused by:\n\n- Cache Maintenance Operations (CMO) that operate by a virtual address.\n- Broadcast cache coherency operations from another CPU in the system.\n\nThis event does not count for the following conditions:\n\n1. A cache refill invalidates a cache line.\n2. A CMO which is executed on that CPU and invalidates a cache line specified by set/way.\n\nNote that CMOs that operate by set/way cannot be broadcast from one CPU to another."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1i_cache.json
new file mode 100644
index 000000000000..633f1030359d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l1i_cache.json
@@ -0,0 +1,14 @@
+[
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE",
+        "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_LMISS",
+        "PublicDescription": "Counts cache line refills into the level 1 instruction cache that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l2_cache.json
new file mode 100644
index 000000000000..0e31d0daf88b
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l2_cache.json
@@ -0,0 +1,50 @@
+[
+    {
+        "ArchStdEvent": "L2D_CACHE",
+        "PublicDescription": "Counts level 2 cache accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level caches or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB",
+        "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_ALLOCATE",
+        "PublicDescription": "TBD"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD",
+        "PublicDescription": "Counts level 2 cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR",
+        "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+        "PublicDescription": "Counts refills for memory accesses due to memory read operations counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+        "PublicDescription": "Counts refills for memory accesses due to memory write operations counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
+        "PublicDescription": "Counts evictions from the level 2 cache because of a line being allocated into the L2 cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
+        "PublicDescription": "Counts write-backs from the level 2 cache that are a result of either:\n\n1. Cache maintenance operations,\n\n2. Snoop responses or,\n\n3. Direct cache transfers to another CPU due to a forwarding snoop request."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_INVAL",
+        "PublicDescription": "Counts each explicit invalidation of a cache line in the level 2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n\n1. A cache refill invalidates a cache line or,\n2. A Cache Maintenance Operation (CMO), which invalidates a cache line specified by set/way, is executed on that CPU.\n\nCMOs that operate by set/way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l3_cache.json
new file mode 100644
index 000000000000..45bfba532df7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/l3_cache.json
@@ -0,0 +1,22 @@
+[
+    {
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE",
+        "PublicDescription": "Counts level 3 cache line allocates that do not fetch data from outside the level 3 data or unified cache. For example, allocates due to streaming stores."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL",
+        "PublicDescription": "Counts level 3 accesses that receive data from outside the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE",
+        "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RD",
+        "PublicDescription": "TBD"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/ll_cache.json
new file mode 100644
index 000000000000..bb712d57d58a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/ll_cache.json
@@ -0,0 +1,10 @@
+[
+    {
+        "ArchStdEvent": "LL_CACHE_RD",
+        "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts when the system register CPUECTLR.EXTLLC bit is set. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_MISS_RD",
+        "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts when the system register CPUECTLR.EXTLLC bit is set. This event counts read transactions returned from outside the core if those transactions are missed in the system level cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/memory.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/memory.json
new file mode 100644
index 000000000000..106a97f8b2e7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/memory.json
@@ -0,0 +1,46 @@
+[
+    {
+        "ArchStdEvent": "MEM_ACCESS",
+        "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
+    },
+    {
+        "ArchStdEvent": "MEMORY_ERROR",
+        "PublicDescription": "Counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs. On the core, this event counts errors in the caches (including data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted. Note that errors are only detected when the actual protected memory is accessed by an operation."
+    },
+    {
+        "ArchStdEvent": "REMOTE_ACCESS",
+        "PublicDescription": "Counts accesses to another chip, which is implemented as a different CMN mesh in the system. If the CHI bus response back to the core indicates that the data source is from another chip (mesh), then the counter is updated. If no data is returned, even if the system snoops another chip/mesh, then the counter is not updated."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD",
+        "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_WR",
+        "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "LDST_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in the access crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "LD_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed, which results in the load crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "ST_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED",
+        "PublicDescription": "Counts the number of memory read and write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
+        "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
+        "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
new file mode 100644
index 000000000000..b01cc2120175
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
@@ -0,0 +1,331 @@
+[
+    {
+        "ArchStdEvent": "backend_bound",
+        "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
+    },
+    {
+        "MetricName": "backend_stalled_cycles",
+        "MetricExpr": "((STALL_BACKEND / CPU_CYCLES) * 100)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
+        "MetricGroup": "Cycle_Accounting",
+        "ScaleUnit": "1percent of cycles"
+    },
+    {
+        "ArchStdEvent": "bad_speculation",
+        "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 - (STALL_SLOT / (CPU_CYCLES * #slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
+    },
+    {
+        "MetricName": "branch_misprediction_ratio",
+        "MetricExpr": "(BR_MIS_PRED_RETIRED / BR_RETIRED)",
+        "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
+        "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
+        "ScaleUnit": "1per branch"
+    },
+    {
+        "MetricName": "branch_mpki",
+        "MetricExpr": "((BR_MIS_PRED_RETIRED / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
+        "MetricGroup": "MPKI;Branch_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "branch_percentage",
+        "MetricExpr": "(((BR_IMMED_SPEC + BR_INDIRECT_SPEC) / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "crypto_percentage",
+        "MetricExpr": "((CRYPTO_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "dtlb_mpki",
+        "MetricExpr": "((DTLB_WALK / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.",
+        "MetricGroup": "MPKI;DTLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "dtlb_walk_ratio",
+        "MetricExpr": "(DTLB_WALK / L1D_TLB)",
+        "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "ArchStdEvent": "frontend_bound",
+        "MetricExpr": "(100 * ((STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))"
+    },
+    {
+        "MetricName": "frontend_stalled_cycles",
+        "MetricExpr": "((STALL_FRONTEND / CPU_CYCLES) * 100)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
+        "MetricGroup": "Cycle_Accounting",
+        "ScaleUnit": "1percent of cycles"
+    },
+    {
+        "MetricName": "integer_dp_percentage",
+        "MetricExpr": "((DP_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "ipc",
+        "MetricExpr": "(INST_RETIRED / CPU_CYCLES)",
+        "BriefDescription": "This metric measures the number of instructions retired per cycle.",
+        "MetricGroup": "General",
+        "ScaleUnit": "1per cycle"
+    },
+    {
+        "MetricName": "itlb_mpki",
+        "MetricExpr": "((ITLB_WALK / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
+        "MetricGroup": "MPKI;ITLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "itlb_walk_ratio",
+        "MetricExpr": "(ITLB_WALK / L1I_TLB)",
+        "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l1d_cache_miss_ratio",
+        "MetricExpr": "(L1D_CACHE_REFILL / L1D_CACHE)",
+        "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.",
+        "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "l1d_cache_mpki",
+        "MetricExpr": "((L1D_CACHE_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;L1D_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l1d_tlb_miss_ratio",
+        "MetricExpr": "(L1D_TLB_REFILL / L1D_TLB)",
+        "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l1d_tlb_mpki",
+        "MetricExpr": "((L1D_TLB_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;DTLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l1i_cache_miss_ratio",
+        "MetricExpr": "(L1I_CACHE_REFILL / L1I_CACHE)",
+        "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.",
+        "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "l1i_cache_mpki",
+        "MetricExpr": "((L1I_CACHE_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;L1I_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l1i_tlb_miss_ratio",
+        "MetricExpr": "(L1I_TLB_REFILL / L1I_TLB)",
+        "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l1i_tlb_mpki",
+        "MetricExpr": "((L1I_TLB_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;ITLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l2_cache_miss_ratio",
+        "MetricExpr": "(L2D_CACHE_REFILL / L2D_CACHE)",
+        "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instructions. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+        "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "l2_cache_mpki",
+        "MetricExpr": "((L2D_CACHE_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+        "MetricGroup": "MPKI;L2_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l2_tlb_miss_ratio",
+        "MetricExpr": "(L2D_TLB_REFILL / L2D_TLB)",
+        "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l2_tlb_mpki",
+        "MetricExpr": "((L2D_TLB_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "ll_cache_read_hit_ratio",
+        "MetricExpr": "((LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD)",
+        "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+        "MetricGroup": "LL_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "ll_cache_read_miss_ratio",
+        "MetricExpr": "(LL_CACHE_MISS_RD / LL_CACHE_RD)",
+        "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+        "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "ll_cache_read_mpki",
+        "MetricExpr": "((LL_CACHE_MISS_RD / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;LL_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "load_percentage",
+        "MetricExpr": "((LD_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "ArchStdEvent": "retiring"
+    },
+    {
+        "MetricName": "scalar_fp_percentage",
+        "MetricExpr": "((VFP_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "simd_percentage",
+        "MetricExpr": "((ASE_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "store_percentage",
+        "MetricExpr": "((ST_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "MPKI;L3_Cache_Effectiveness",
+        "MetricName": "l3d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Miss_Ratio;L3_Cache_Effectiveness",
+        "MetricName": "l3d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "MPKI;Branch_Effectiveness",
+        "MetricName": "branch_pki",
+        "ScaleUnit": "1PKI"
+    },
+    {
+        "MetricExpr": "ipc / #slots",
+        "BriefDescription": "IPC percentage of peak. The peak of IPC is the number of slots.",
+        "MetricGroup": "General",
+        "MetricName": "ipc_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "INST_SPEC / CPU_CYCLES",
+        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "General",
+        "MetricName": "spec_ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are retired (committed)",
+        "MetricGroup": "General",
+        "MetricName": "retired_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are not retired (committed)",
+        "MetricGroup": "General",
+        "MetricName": "wasted_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "load_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "store_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "advanced_simd_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "float_point_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "branch_immed_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "branch_return_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "branch_indirect_spec_rate",
+        "ScaleUnit": "100%"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/retired.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/retired.json
new file mode 100644
index 000000000000..f297b049b62f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/retired.json
@@ -0,0 +1,30 @@
+[
+    {
+        "ArchStdEvent": "SW_INCR",
+        "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register."
+    },
+    {
+        "ArchStdEvent": "INST_RETIRED",
+        "PublicDescription": "Counts instructions that have been architecturally executed."
+    },
+    {
+        "ArchStdEvent": "CID_WRITE_RETIRED",
+        "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR register, which usually contains the kernel PID and can be output with hardware trace."
+    },
+    {
+        "ArchStdEvent": "TTBR_WRITE_RETIRED",
+        "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
+    },
+    {
+        "ArchStdEvent": "BR_RETIRED",
+        "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted."
+    },
+    {
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+        "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "OP_RETIRED",
+        "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of the number of micro-operations retired from the commit queue in a single cycle."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spe.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spe.json
new file mode 100644
index 000000000000..5de8b0f3a440
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spe.json
@@ -0,0 +1,18 @@
+[
+    {
+        "ArchStdEvent": "SAMPLE_POP",
+        "PublicDescription": "Counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED",
+        "PublicDescription": "Counts statistical profiling samples taken for sampling."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FILTRATE",
+        "PublicDescription": "Counts statistical profiling samples taken which are not removed by filtering."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_COLLISION",
+        "PublicDescription": "Counts statistical profiling samples that have collided with a previous sample and are therefore not taken."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spec_operation.json
new file mode 100644
index 000000000000..1af961f8a6c8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/spec_operation.json
@@ -0,0 +1,110 @@
+[
+    {
+        "ArchStdEvent": "BR_MIS_PRED",
+        "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
+    },
+    {
+        "ArchStdEvent": "BR_PRED",
+        "PublicDescription": "Counts branches which are speculatively executed and correctly predicted."
+    },
+    {
+        "ArchStdEvent": "INST_SPEC",
+        "PublicDescription": "Counts operations that have been speculatively executed."
+    },
+    {
+        "ArchStdEvent": "OP_SPEC",
+        "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LD_SPEC",
+        "PublicDescription": "Counts unaligned memory read operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses. The event does not count preload operations (PLD, PLI)."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_ST_SPEC",
+        "PublicDescription": "Counts unaligned memory write operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LDST_SPEC",
+        "PublicDescription": "Counts unaligned memory operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses."
+    },
+    {
+        "ArchStdEvent": "LDREX_SPEC",
+        "PublicDescription": "Counts Load-Exclusive operations that have been speculatively executed. Eg: LDREX, LDX"
+    },
+    {
+        "ArchStdEvent": "STREX_PASS_SPEC",
+        "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have successfully completed the store operation."
+    },
+    {
+        "ArchStdEvent": "STREX_FAIL_SPEC",
+        "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation."
+    },
+    {
+        "ArchStdEvent": "STREX_SPEC",
+        "PublicDescription": "Counts store-exclusive operations that have been speculatively executed."
+    },
+    {
+        "ArchStdEvent": "LD_SPEC",
+        "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations."
+    },
+    {
+        "ArchStdEvent": "ST_SPEC",
+        "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations."
+    },
+    {
+        "ArchStdEvent": "DP_SPEC",
+        "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
+    },
+    {
+        "ArchStdEvent": "ASE_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers."
+    },
+    {
+        "ArchStdEvent": "VFP_SPEC",
+        "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
+    },
+    {
+        "ArchStdEvent": "PC_WRITE_SPEC",
+        "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
+    },
+    {
+        "ArchStdEvent": "CRYPTO_SPEC",
+        "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_SPEC",
+        "PublicDescription": "Counts immediate branch operations which are speculatively executed."
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_SPEC",
+        "PublicDescription": "Counts procedure return operations (RET) which are speculatively executed."
+    },
+    {
+        "ArchStdEvent": "BR_INDIRECT_SPEC",
+        "PublicDescription": "Counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations. Eg: BR Xn, RET"
+    },
+    {
+        "ArchStdEvent": "ISB_SPEC",
+        "PublicDescription": "Counts ISB operations that are executed."
+    },
+    {
+        "ArchStdEvent": "DSB_SPEC",
+        "PublicDescription": "Counts DSB operations that are speculatively issued to the Load/Store unit in the CPU."
+    },
+    {
+        "ArchStdEvent": "DMB_SPEC",
+        "PublicDescription": "Counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from load acquire/store release operations."
+    },
+    {
+        "ArchStdEvent": "RC_LD_SPEC",
+        "PublicDescription": "Counts any load acquire operations that are speculatively executed. Eg: LDAR, LDARH, LDARB"
+    },
+    {
+        "ArchStdEvent": "RC_ST_SPEC",
+        "PublicDescription": "Counts any store release operations that are speculatively executed. Eg: STLR, STLRH, STLRB"
+    },
+    {
+        "ArchStdEvent": "ASE_INST_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/stall.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/stall.json
new file mode 100644
index 000000000000..bbbebc805034
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/stall.json
@@ -0,0 +1,30 @@
+[
+    {
+        "ArchStdEvent": "STALL_FRONTEND",
+        "PublicDescription": "Counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. All the frontend slots were empty during the cycle when this event counts."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND",
+        "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
+    },
+    {
+        "ArchStdEvent": "STALL",
+        "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall)."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_BACKEND",
+        "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_FRONTEND",
+        "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT",
+        "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall)."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_MEM",
+        "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/sve.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/sve.json
new file mode 100644
index 000000000000..51dab48cb2ba
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/sve.json
@@ -0,0 +1,50 @@
+[
+    {
+        "ArchStdEvent": "SVE_INST_SPEC",
+        "PublicDescription": "Counts speculatively executed operations that are SVE operations."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with no active predicate elements."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_FULL_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with all predicate elements active."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one but not all active predicate elements."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one non-active predicate element."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_SPEC",
+        "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
+        "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations that clear at least one bit in the FFR."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT8_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT16_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT32_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT64_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/tlb.json
new file mode 100644
index 000000000000..b550af1831f5
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/tlb.json
@@ -0,0 +1,66 @@
+[
+    {
+        "ArchStdEvent": "L1I_TLB_REFILL",
+        "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL",
+        "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an Address Translation (AT) instruction."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB",
+        "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1I_TLB",
+        "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL",
+        "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB",
+        "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK",
+        "PublicDescription": "Counts data memory translation table walks caused by a miss in the L2 TLB driven by a memory access. Note that partial translations that also cause a table walk are counted. This event does not count table walks caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK",
+        "PublicDescription": "Counts instruction memory translation table walks caused by a miss in the L2 TLB driven by a memory access. Partial translations that also cause a table walk are counted. This event does not count table walks caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_RD",
+        "PublicDescription": "Counts level 1 data TLB refills caused by memory read operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an Address Translation (AT) instruction."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_WR",
+        "PublicDescription": "Counts level 1 data TLB refills caused by data side memory write operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count with an access from an Address Translation (AT) instruction."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_RD",
+        "PublicDescription": "Counts level 1 data TLB accesses caused by memory read operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_WR",
+        "PublicDescription": "Counts any L1 data side TLB accesses caused by memory write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_RD",
+        "PublicDescription": "Counts level 2 TLB refills caused by memory read operations from both data and instruction fetch except for those caused by TLB maintenance operations or hardware prefetches."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_WR",
+        "PublicDescription": "Counts level 2 TLB refills caused by memory write operations from both data and instruction fetch except for those caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_RD",
+        "PublicDescription": "Counts level 2 TLB accesses caused by memory read operations from both data and instruction fetch except for those caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_WR",
+        "PublicDescription": "Counts level 2 TLB accesses caused by memory write operations from both data and instruction fetch except for those caused by TLB maintenance operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/trace.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/trace.json
new file mode 100644
index 000000000000..98f6fabfebc7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/trace.json
@@ -0,0 +1,38 @@
+[
+    {
+        "ArchStdEvent": "TRB_WRAP",
+        "PublicDescription": "This event is generated each time the current write pointer is wrapped to the base pointer."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT0",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 0."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT1",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 1."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT2",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 2."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT3",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 3."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT4",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 4."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT5",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 5."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT6",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 6."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT7",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 7."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
index 32674ddd2b63..13d027656c26 100644
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@@ -35,7 +35,8 @@
 0x00000000410fd470,v1,arm/cortex-a710,core
 0x00000000410fd480,v1,arm/cortex-x2,core
 0x00000000410fd490,v1,arm/neoverse-n2-v2,core
-0x00000000410fd4f0,v1,arm/neoverse-n2-v2,core
+0x00000000410fd493,v1,arm/neoverse-n2r0p3-v2,core
+0x00000000410fd4f0,v1,arm/neoverse-n2r0p3-v2,core
 0x00000000420f5160,v1,cavium/thunderx2,core
 0x00000000430f0af0,v1,cavium/thunderx2,core
 0x00000000460f0010,v1,fujitsu/a64fx,core
-- 
2.34.1
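
For readers following the mapfile.csv hunk above: the first column is a
MIDR_EL1 value, and the low four bits distinguish 0x...d490 from
0x...d493. The sketch below only decodes the architectural MIDR_EL1
fields to show why the new entry corresponds to "r0p3". It is an
illustration, not the matching code used by perf; the helper names
decode_midr() and revision_name() are hypothetical.

```python
# Illustrative only: decode the architectural fields of MIDR_EL1.
# decode_midr() and revision_name() are hypothetical helper names,
# not functions from the perf source tree.

def decode_midr(midr: int) -> dict:
    """Split a MIDR_EL1 value into its architecturally defined fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24]
        "variant": (midr >> 20) & 0xF,        # bits [23:20], the "rN" part
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "partnum": (midr >> 4) & 0xFFF,       # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0], the "pN" part
    }


def revision_name(midr: int) -> str:
    """Format variant and revision in Arm's rXpY notation."""
    fields = decode_midr(midr)
    return f"r{fields['variant']}p{fields['revision']}"


# MIDR values taken from the mapfile.csv hunk in this patch.
for midr in (0x00000000410fd490, 0x00000000410fd493, 0x00000000410fd4f0):
    fields = decode_midr(midr)
    print(f"{midr:#018x} partnum={fields['partnum']:#x} {revision_name(midr)}")
```

The first two values share part number 0xd49 and differ only in the
revision field, which is what lets separate JSON directories be chosen
per revision.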


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/5] perf vendor events arm64: Update N2-r0p0 metrics and events using Arm telemetry repo
  2023-07-10 14:18 [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo James Clark
                   ` (3 preceding siblings ...)
  2023-07-10 14:19 ` [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo James Clark
@ 2023-07-10 14:19 ` James Clark
  4 siblings, 0 replies; 17+ messages in thread
From: James Clark @ 2023-07-10 14:19 UTC (permalink / raw)
  To: linux-perf-users, irogers, renyu.zj, john.g.garry
  Cc: namhyung, acme, James Clark

Apart from some slight naming and grouping differences, the new metrics
are functionally the same as the existing ones. Any missing metrics were
manually appended to the end of the auto-generated file.

For the events, the new data includes descriptions that may have
product-specific details and new groupings that will be consistent with
other products.

After generating the metrics from the telemetry repo [1], the following
manual steps were performed:

 * Append some metrics from the old N2/V2 data that aren't present in
   the telemetry data. These will possibly be added to the
   telemetry-solution repo at a later time:

    l3d_cache_mpki, l3d_cache_miss_rate, branch_pki, ipc_rate, spec_ipc,
    retired_rate, wasted_rate, load_spec_rate, store_spec_rate,
    advanced_simd_spec_rate, float_point_spec_rate,
    branch_immed_spec_rate, branch_return_spec_rate,
    branch_indirect_spec_rate

[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2.json
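
The append step listed above was done by hand. As a rough sketch of the
same idea, assuming both metric files have been loaded as lists of JSON
objects, old metrics can be appended when their "MetricName" is not
already present in the generated data. The function name
append_missing_metrics() and the sample entries are hypothetical and
only mirror the shape of metrics.json:

```python
# Sketch of the manual "append missing metrics" step, not a tool that
# was used for this patch. append_missing_metrics() is a hypothetical name.

def append_missing_metrics(generated: list, legacy: list) -> list:
    """Return generated entries followed by legacy metrics not yet present."""
    # Entries such as {"ArchStdEvent": "retiring"} have no MetricName
    # and are simply carried over from the generated list.
    known = {m["MetricName"] for m in generated if "MetricName" in m}
    merged = list(generated)
    for metric in legacy:
        name = metric.get("MetricName")
        # Legacy entries without a MetricName are skipped in this sketch.
        if name is not None and name not in known:
            merged.append(metric)
            known.add(name)
    return merged


# Small sample data shaped like the entries in metrics.json.
generated = [
    {"MetricName": "l2_tlb_mpki",
     "MetricExpr": "((L2D_TLB_REFILL / INST_RETIRED) * 1000)"},
    {"ArchStdEvent": "retiring"},
]
legacy = [
    {"MetricName": "l2_tlb_mpki",
     "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000"},
    {"MetricName": "branch_pki",
     "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000"},
]

merged = append_missing_metrics(generated, legacy)
print([m.get("MetricName", m.get("ArchStdEvent")) for m in merged])
```

Keeping the generated entries first and appending the rest matches the
layout of the resulting file, where the old metrics sit at the end.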

Signed-off-by: James Clark <james.clark@arm.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/branch.json |   8 -
 .../arch/arm64/arm/neoverse-n2-v2/bus.json    |  20 --
 .../arch/arm64/arm/neoverse-n2-v2/cache.json  | 155 --------
 .../arm64/arm/neoverse-n2-v2/exception.json   |  47 ---
 .../arm64/arm/neoverse-n2-v2/instruction.json | 143 --------
 .../arch/arm64/arm/neoverse-n2-v2/memory.json |  41 ---
 .../arm64/arm/neoverse-n2-v2/metrics.json     | 273 --------------
 .../arm64/arm/neoverse-n2-v2/pipeline.json    |  23 --
 .../arch/arm64/arm/neoverse-n2-v2/spe.json    |  14 -
 .../arch/arm64/arm/neoverse-n2-v2/trace.json  |  29 --
 .../arch/arm64/arm/neoverse-n2r0p0/bus.json   |  18 +
 .../arm64/arm/neoverse-n2r0p0/exception.json  |  62 ++++
 .../arm/neoverse-n2r0p0/fp_operation.json     |  22 ++
 .../arm64/arm/neoverse-n2r0p0/general.json    |  10 +
 .../arm64/arm/neoverse-n2r0p0/l1d_cache.json  |  54 +++
 .../arm64/arm/neoverse-n2r0p0/l1i_cache.json  |  14 +
 .../arm64/arm/neoverse-n2r0p0/l2_cache.json   |  50 +++
 .../arm64/arm/neoverse-n2r0p0/l3_cache.json   |  22 ++
 .../arm64/arm/neoverse-n2r0p0/ll_cache.json   |  10 +
 .../arm64/arm/neoverse-n2r0p0/memory.json     |  46 +++
 .../arm64/arm/neoverse-n2r0p0/metrics.json    | 332 ++++++++++++++++++
 .../arm64/arm/neoverse-n2r0p0/retired.json    |  30 ++
 .../arch/arm64/arm/neoverse-n2r0p0/spe.json   |  18 +
 .../arm/neoverse-n2r0p0/spec_operation.json   | 110 ++++++
 .../arch/arm64/arm/neoverse-n2r0p0/stall.json |  30 ++
 .../arch/arm64/arm/neoverse-n2r0p0/sve.json   |  50 +++
 .../arch/arm64/arm/neoverse-n2r0p0/tlb.json   |  66 ++++
 .../arch/arm64/arm/neoverse-n2r0p0/trace.json |  38 ++
 tools/perf/pmu-events/arch/arm64/mapfile.csv  |   2 +-
 29 files changed, 983 insertions(+), 754 deletions(-)
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/bus.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/exception.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/memory.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spe.json
 delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/trace.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/fp_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/general.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1i_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l2_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l3_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/ll_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/retired.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spec_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/stall.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/sve.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/tlb.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/trace.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json
deleted file mode 100644
index 79f2016c53b0..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json
+++ /dev/null
@@ -1,8 +0,0 @@
-[
-    {
-        "ArchStdEvent": "BR_MIS_PRED"
-    },
-    {
-        "ArchStdEvent": "BR_PRED"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/bus.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/bus.json
deleted file mode 100644
index 579c1c993d17..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/bus.json
+++ /dev/null
@@ -1,20 +0,0 @@
-[
-    {
-        "ArchStdEvent": "CPU_CYCLES"
-    },
-    {
-        "ArchStdEvent": "BUS_ACCESS"
-    },
-    {
-        "ArchStdEvent": "BUS_CYCLES"
-    },
-    {
-        "ArchStdEvent": "BUS_ACCESS_RD"
-    },
-    {
-        "ArchStdEvent": "BUS_ACCESS_WR"
-    },
-    {
-        "ArchStdEvent": "CNT_CYCLES"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json
deleted file mode 100644
index 0141f749bff3..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json
+++ /dev/null
@@ -1,155 +0,0 @@
-[
-    {
-        "ArchStdEvent": "L1I_CACHE_REFILL"
-    },
-    {
-        "ArchStdEvent": "L1I_TLB_REFILL"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_REFILL"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE"
-    },
-    {
-        "ArchStdEvent": "L1D_TLB_REFILL"
-    },
-    {
-        "ArchStdEvent": "L1I_CACHE"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_WB"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_REFILL"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_WB"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_ALLOCATE"
-    },
-    {
-        "ArchStdEvent": "L1D_TLB"
-    },
-    {
-        "ArchStdEvent": "L1I_TLB"
-    },
-    {
-        "ArchStdEvent": "L3D_CACHE_ALLOCATE"
-    },
-    {
-        "ArchStdEvent": "L3D_CACHE_REFILL"
-    },
-    {
-        "ArchStdEvent": "L3D_CACHE"
-    },
-    {
-        "ArchStdEvent": "L2D_TLB_REFILL"
-    },
-    {
-        "ArchStdEvent": "L2D_TLB"
-    },
-    {
-        "ArchStdEvent": "DTLB_WALK"
-    },
-    {
-        "ArchStdEvent": "ITLB_WALK"
-    },
-    {
-        "ArchStdEvent": "LL_CACHE_RD"
-    },
-    {
-        "ArchStdEvent": "LL_CACHE_MISS_RD"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_LMISS_RD"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_RD"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_WR"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_REFILL_RD"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_REFILL_WR"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_REFILL_INNER"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_WB_VICTIM"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_WB_CLEAN"
-    },
-    {
-        "ArchStdEvent": "L1D_CACHE_INVAL"
-    },
-    {
-        "ArchStdEvent": "L1D_TLB_REFILL_RD"
-    },
-    {
-        "ArchStdEvent": "L1D_TLB_REFILL_WR"
-    },
-    {
-        "ArchStdEvent": "L1D_TLB_RD"
-    },
-    {
-        "ArchStdEvent": "L1D_TLB_WR"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_RD"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_WR"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_REFILL_RD"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_REFILL_WR"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_WB_VICTIM"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_WB_CLEAN"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_INVAL"
-    },
-    {
-        "ArchStdEvent": "L2D_TLB_REFILL_RD"
-    },
-    {
-        "ArchStdEvent": "L2D_TLB_REFILL_WR"
-    },
-    {
-        "ArchStdEvent": "L2D_TLB_RD"
-    },
-    {
-        "ArchStdEvent": "L2D_TLB_WR"
-    },
-    {
-        "ArchStdEvent": "L3D_CACHE_RD"
-    },
-    {
-        "ArchStdEvent": "L1I_CACHE_LMISS"
-    },
-    {
-        "ArchStdEvent": "L2D_CACHE_LMISS_RD"
-    },
-    {
-        "ArchStdEvent": "L3D_CACHE_LMISS_RD"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/exception.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/exception.json
deleted file mode 100644
index 344a2d552ad5..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/exception.json
+++ /dev/null
@@ -1,47 +0,0 @@
-[
-    {
-        "ArchStdEvent": "EXC_TAKEN"
-    },
-    {
-        "ArchStdEvent": "MEMORY_ERROR"
-    },
-    {
-        "ArchStdEvent": "EXC_UNDEF"
-    },
-    {
-        "ArchStdEvent": "EXC_SVC"
-    },
-    {
-        "ArchStdEvent": "EXC_PABORT"
-    },
-    {
-        "ArchStdEvent": "EXC_DABORT"
-    },
-    {
-        "ArchStdEvent": "EXC_IRQ"
-    },
-    {
-        "ArchStdEvent": "EXC_FIQ"
-    },
-    {
-        "ArchStdEvent": "EXC_SMC"
-    },
-    {
-        "ArchStdEvent": "EXC_HVC"
-    },
-    {
-        "ArchStdEvent": "EXC_TRAP_PABORT"
-    },
-    {
-        "ArchStdEvent": "EXC_TRAP_DABORT"
-    },
-    {
-        "ArchStdEvent": "EXC_TRAP_OTHER"
-    },
-    {
-        "ArchStdEvent": "EXC_TRAP_IRQ"
-    },
-    {
-        "ArchStdEvent": "EXC_TRAP_FIQ"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json
deleted file mode 100644
index e57cd55937c6..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json
+++ /dev/null
@@ -1,143 +0,0 @@
-[
-    {
-        "ArchStdEvent": "SW_INCR"
-    },
-    {
-        "ArchStdEvent": "INST_RETIRED"
-    },
-    {
-        "ArchStdEvent": "EXC_RETURN"
-    },
-    {
-        "ArchStdEvent": "CID_WRITE_RETIRED"
-    },
-    {
-        "ArchStdEvent": "INST_SPEC"
-    },
-    {
-        "ArchStdEvent": "TTBR_WRITE_RETIRED"
-    },
-    {
-        "ArchStdEvent": "BR_RETIRED"
-    },
-    {
-        "ArchStdEvent": "BR_MIS_PRED_RETIRED"
-    },
-    {
-        "ArchStdEvent": "OP_RETIRED"
-    },
-    {
-        "ArchStdEvent": "OP_SPEC"
-    },
-    {
-        "ArchStdEvent": "LDREX_SPEC"
-    },
-    {
-        "ArchStdEvent": "STREX_PASS_SPEC"
-    },
-    {
-        "ArchStdEvent": "STREX_FAIL_SPEC"
-    },
-    {
-        "ArchStdEvent": "STREX_SPEC"
-    },
-    {
-        "ArchStdEvent": "LD_SPEC"
-    },
-    {
-        "ArchStdEvent": "ST_SPEC"
-    },
-    {
-        "ArchStdEvent": "DP_SPEC"
-    },
-    {
-        "ArchStdEvent": "ASE_SPEC"
-    },
-    {
-        "ArchStdEvent": "VFP_SPEC"
-    },
-    {
-        "ArchStdEvent": "PC_WRITE_SPEC"
-    },
-    {
-        "ArchStdEvent": "CRYPTO_SPEC"
-    },
-    {
-        "ArchStdEvent": "BR_IMMED_SPEC"
-    },
-    {
-        "ArchStdEvent": "BR_RETURN_SPEC"
-    },
-    {
-        "ArchStdEvent": "BR_INDIRECT_SPEC"
-    },
-    {
-        "ArchStdEvent": "ISB_SPEC"
-    },
-    {
-        "ArchStdEvent": "DSB_SPEC"
-    },
-    {
-        "ArchStdEvent": "DMB_SPEC"
-    },
-    {
-        "ArchStdEvent": "RC_LD_SPEC"
-    },
-    {
-        "ArchStdEvent": "RC_ST_SPEC"
-    },
-    {
-        "ArchStdEvent": "ASE_INST_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_INST_SPEC"
-    },
-    {
-        "ArchStdEvent": "FP_HP_SPEC"
-    },
-    {
-        "ArchStdEvent": "FP_SP_SPEC"
-    },
-    {
-        "ArchStdEvent": "FP_DP_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_PRED_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_PRED_EMPTY_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_PRED_FULL_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_LDFF_SPEC"
-    },
-    {
-        "ArchStdEvent": "SVE_LDFF_FAULT_SPEC"
-    },
-    {
-        "ArchStdEvent": "FP_SCALE_OPS_SPEC"
-    },
-    {
-        "ArchStdEvent": "FP_FIXED_OPS_SPEC"
-    },
-    {
-        "ArchStdEvent": "ASE_SVE_INT8_SPEC"
-    },
-    {
-        "ArchStdEvent": "ASE_SVE_INT16_SPEC"
-    },
-    {
-        "ArchStdEvent": "ASE_SVE_INT32_SPEC"
-    },
-    {
-        "ArchStdEvent": "ASE_SVE_INT64_SPEC"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/memory.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/memory.json
deleted file mode 100644
index 7b2b21ac150f..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/memory.json
+++ /dev/null
@@ -1,41 +0,0 @@
-[
-    {
-        "ArchStdEvent": "MEM_ACCESS"
-    },
-    {
-        "ArchStdEvent": "REMOTE_ACCESS"
-    },
-    {
-        "ArchStdEvent": "MEM_ACCESS_RD"
-    },
-    {
-        "ArchStdEvent": "MEM_ACCESS_WR"
-    },
-    {
-        "ArchStdEvent": "UNALIGNED_LD_SPEC"
-    },
-    {
-        "ArchStdEvent": "UNALIGNED_ST_SPEC"
-    },
-    {
-        "ArchStdEvent": "UNALIGNED_LDST_SPEC"
-    },
-    {
-        "ArchStdEvent": "LDST_ALIGN_LAT"
-    },
-    {
-        "ArchStdEvent": "LD_ALIGN_LAT"
-    },
-    {
-        "ArchStdEvent": "ST_ALIGN_LAT"
-    },
-    {
-        "ArchStdEvent": "MEM_ACCESS_CHECKED"
-    },
-    {
-        "ArchStdEvent": "MEM_ACCESS_CHECKED_RD"
-    },
-    {
-        "ArchStdEvent": "MEM_ACCESS_CHECKED_WR"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
deleted file mode 100644
index 8ad15b726dca..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ /dev/null
@@ -1,273 +0,0 @@
-[
-    {
-        "ArchStdEvent": "FRONTEND_BOUND",
-        "MetricExpr": "((stall_slot_frontend) if (#slots - 5) else (stall_slot_frontend - cpu_cycles)) / (#slots * cpu_cycles)"
-    },
-    {
-        "ArchStdEvent": "BAD_SPECULATION",
-        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
-    },
-    {
-        "ArchStdEvent": "RETIRING",
-        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
-    },
-    {
-        "ArchStdEvent": "BACKEND_BOUND"
-    },
-    {
-        "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
-        "BriefDescription": "The rate of L1D TLB refill to the overall L1D TLB lookups",
-        "MetricGroup": "TLB",
-        "MetricName": "l1d_tlb_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
-        "BriefDescription": "The rate of L1I TLB refill to the overall L1I TLB lookups",
-        "MetricGroup": "TLB",
-        "MetricName": "l1i_tlb_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
-        "BriefDescription": "The rate of L2D TLB refill to the overall L2D TLB lookups",
-        "MetricGroup": "TLB",
-        "MetricName": "l2_tlb_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
-        "MetricGroup": "TLB",
-        "MetricName": "dtlb_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "DTLB_WALK / L1D_TLB",
-        "BriefDescription": "The rate of DTLB Walks to the overall L1D TLB lookups",
-        "MetricGroup": "TLB",
-        "MetricName": "dtlb_walk_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
-        "MetricGroup": "TLB",
-        "MetricName": "itlb_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "ITLB_WALK / L1I_TLB",
-        "BriefDescription": "The rate of ITLB Walks to the overall L1I TLB lookups",
-        "MetricGroup": "TLB",
-        "MetricName": "itlb_walk_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
-        "MetricGroup": "Cache",
-        "MetricName": "l1i_cache_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
-        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
-        "MetricGroup": "Cache",
-        "MetricName": "l1i_cache_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
-        "MetricGroup": "Cache",
-        "MetricName": "l1d_cache_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
-        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
-        "MetricGroup": "Cache",
-        "MetricName": "l1d_cache_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
-        "MetricGroup": "Cache",
-        "MetricName": "l2d_cache_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
-        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
-        "MetricGroup": "Cache",
-        "MetricName": "l2d_cache_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
-        "MetricGroup": "Cache",
-        "MetricName": "l3d_cache_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
-        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
-        "MetricGroup": "Cache",
-        "MetricName": "l3d_cache_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
-        "MetricGroup": "Cache",
-        "MetricName": "ll_cache_read_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
-        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
-        "MetricGroup": "Cache",
-        "MetricName": "ll_cache_read_miss_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
-        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
-        "MetricGroup": "Cache",
-        "MetricName": "ll_cache_read_hit_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
-        "MetricGroup": "Branch",
-        "MetricName": "branch_mpki",
-        "ScaleUnit": "1MPKI"
-    },
-    {
-        "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
-        "BriefDescription": "The rate of branches retired per kilo instructions",
-        "MetricGroup": "Branch",
-        "MetricName": "branch_pki",
-        "ScaleUnit": "1PKI"
-    },
-    {
-        "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
-        "BriefDescription": "The rate of branches mis-predited to the overall branches",
-        "MetricGroup": "Branch",
-        "MetricName": "branch_miss_pred_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "instructions / CPU_CYCLES",
-        "BriefDescription": "The average number of instructions executed for each cycle.",
-        "MetricGroup": "PEutilization",
-        "MetricName": "ipc"
-    },
-    {
-        "MetricExpr": "ipc / 5",
-        "BriefDescription": "IPC percentage of peak. The peak of IPC is 5.",
-        "MetricGroup": "PEutilization",
-        "MetricName": "ipc_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
-        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
-        "MetricGroup": "PEutilization",
-        "MetricName": "retired_ipc"
-    },
-    {
-        "MetricExpr": "INST_SPEC / CPU_CYCLES",
-        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
-        "MetricGroup": "PEutilization",
-        "MetricName": "spec_ipc"
-    },
-    {
-        "MetricExpr": "OP_RETIRED / OP_SPEC",
-        "BriefDescription": "Of all the micro-operations issued, what percentage are retired(committed)",
-        "MetricGroup": "PEutilization",
-        "MetricName": "retired_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
-        "BriefDescription": "Of all the micro-operations issued, what percentage are not retired(committed)",
-        "MetricGroup": "PEutilization",
-        "MetricName": "wasted_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT if (#slots - 5) else (STALL_SLOT - CPU_CYCLES)) / (#slots * CPU_CYCLES))",
-        "BriefDescription": "The truly effective ratio of micro-operations executed by the CPU, which means that misprediction and stall are not included",
-        "MetricGroup": "PEutilization",
-        "MetricName": "cpu_utilization",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "LD_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "load_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "ST_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "store_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "DP_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "data_process_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "ASE_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "advanced_simd_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "VFP_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "float_point_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "crypto_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "branch_immed_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "branch_return_spec_rate",
-        "ScaleUnit": "100%"
-    },
-    {
-        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
-        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speclatively executed",
-        "MetricGroup": "InstructionMix",
-        "MetricName": "branch_indirect_spec_rate",
-        "ScaleUnit": "100%"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json
deleted file mode 100644
index f9fae15f7555..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json
+++ /dev/null
@@ -1,23 +0,0 @@
-[
-    {
-        "ArchStdEvent": "STALL_FRONTEND"
-    },
-    {
-        "ArchStdEvent": "STALL_BACKEND"
-    },
-    {
-        "ArchStdEvent": "STALL"
-    },
-    {
-        "ArchStdEvent": "STALL_SLOT_BACKEND"
-    },
-    {
-        "ArchStdEvent": "STALL_SLOT_FRONTEND"
-    },
-    {
-        "ArchStdEvent": "STALL_SLOT"
-    },
-    {
-        "ArchStdEvent": "STALL_BACKEND_MEM"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spe.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spe.json
deleted file mode 100644
index 20f2165c85fe..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spe.json
+++ /dev/null
@@ -1,14 +0,0 @@
-[
-    {
-        "ArchStdEvent": "SAMPLE_POP"
-    },
-    {
-        "ArchStdEvent": "SAMPLE_FEED"
-    },
-    {
-        "ArchStdEvent": "SAMPLE_FILTRATE"
-    },
-    {
-        "ArchStdEvent": "SAMPLE_COLLISION"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/trace.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/trace.json
deleted file mode 100644
index 3116135c59e2..000000000000
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/trace.json
+++ /dev/null
@@ -1,29 +0,0 @@
-[
-    {
-        "ArchStdEvent": "TRB_WRAP"
-    },
-    {
-        "ArchStdEvent": "TRCEXTOUT0"
-    },
-    {
-        "ArchStdEvent": "TRCEXTOUT1"
-    },
-    {
-        "ArchStdEvent": "TRCEXTOUT2"
-    },
-    {
-        "ArchStdEvent": "TRCEXTOUT3"
-    },
-    {
-        "ArchStdEvent": "CTI_TRIGOUT4"
-    },
-    {
-        "ArchStdEvent": "CTI_TRIGOUT5"
-    },
-    {
-        "ArchStdEvent": "CTI_TRIGOUT6"
-    },
-    {
-        "ArchStdEvent": "CTI_TRIGOUT7"
-    }
-]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/bus.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/bus.json
new file mode 100644
index 000000000000..2e11a8c4a484
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/bus.json
@@ -0,0 +1,18 @@
+[
+    {
+        "ArchStdEvent": "BUS_ACCESS",
+        "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_CYCLES",
+        "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_RD",
+        "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_WR",
+        "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/exception.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/exception.json
new file mode 100644
index 000000000000..4404b8e91690
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/exception.json
@@ -0,0 +1,62 @@
+[
+    {
+        "ArchStdEvent": "EXC_TAKEN",
+        "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_RETURN",
+        "PublicDescription": "Counts any architecturally executed exception return instructions. Eg: AArch64: ERET"
+    },
+    {
+        "ArchStdEvent": "EXC_UNDEF",
+        "PublicDescription": "Counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED. Attempting to execute instruction bit patterns that have not been allocated. Attempting to execute instructions when they are disabled. Attempting to execute instructions at an inappropriate Exception level. Attempting to execute an instruction when the value of PSTATE.IL is 1."
+    },
+    {
+        "ArchStdEvent": "EXC_SVC",
+        "PublicDescription": "Counts SVC exceptions taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_PABORT",
+        "PublicDescription": "Counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
+    },
+    {
+        "ArchStdEvent": "EXC_DABORT",
+        "PublicDescription": "Counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_IRQ",
+        "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_FIQ",
+        "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_SMC",
+        "PublicDescription": "Counts SMC exceptions taken to EL3."
+    },
+    {
+        "ArchStdEvent": "EXC_HVC",
+        "PublicDescription": "Counts HVC exceptions taken to EL2."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_PABORT",
+        "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_DABORT",
+        "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Data Aborts or SError interrupts. Conditions that could cause those exceptions are:\n\n1. Attempting to read or write memory where the MMU generates a fault,\n2. Attempting to read or write memory with a misaligned address,\n3. Interrupts from the SEI input.\n4. internally generated SErrors."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_OTHER",
+        "PublicDescription": "Counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, data aborts, Instruction Aborts, or interrupts."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_IRQ",
+        "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are not taken locally."
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_FIQ",
+        "PublicDescription": "Counts FIQs which are not taken locally but taken from EL0, EL1,\n or EL2 to EL3 (which would be the normal behavior for FIQs when not executing\n in EL3)."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/fp_operation.json
new file mode 100644
index 000000000000..cec3435ac766
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/fp_operation.json
@@ -0,0 +1,22 @@
+[
+    {
+        "ArchStdEvent": "FP_HP_SPEC",
+        "PublicDescription": "Counts speculatively executed half precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SPEC",
+        "PublicDescription": "Counts speculatively executed single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SPEC",
+        "PublicDescription": "Counts speculatively executed double precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
+        "PublicDescription": "Counts speculatively executed scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
+        "PublicDescription": "Counts speculatively executed non-scalable single precision floating point operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/general.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/general.json
new file mode 100644
index 000000000000..428810f855b8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/general.json
@@ -0,0 +1,10 @@
+[
+    {
+        "ArchStdEvent": "CPU_CYCLES",
+        "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
+    },
+    {
+        "ArchStdEvent": "CNT_CYCLES",
+        "PublicDescription": "Counts constant frequency cycles."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1d_cache.json
new file mode 100644
index 000000000000..ed83e1c5affe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1d_cache.json
@@ -0,0 +1,54 @@
+[
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line. This event does not count cache line allocations from preload instructions or from hardware cache prefetching."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE",
+        "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and a read access. Each access to a cache line is counted, including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB",
+        "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations that incurred additional latency."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RD",
+        "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPU's caches count as both a write access and a read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WR",
+        "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as both a write access and a read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_RD",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load instructions where the memory read operation misses in the level 1 data cache. This event only counts one event per cache line."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_WR",
+        "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed store instructions where the memory write operation misses in the level 1 data cache. This event only counts one event per cache line."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+        "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+        "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
+        "PublicDescription": "Counts dirty cache line evictions from the level 1 data cache caused by a new cache line allocation. This event does not count evictions caused by cache maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
+        "PublicDescription": "Counts write-backs from the level 1 data cache that are a result of a coherency operation made by another CPU. Event count includes cache maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_INVAL",
+        "PublicDescription": "Counts each explicit invalidation of a cache line in the level 1 data cache caused by:\n\n- Cache Maintenance Operations (CMO) that operate by a virtual address.\n- Broadcast cache coherency operations from another CPU in the system.\n\nThis event does not count for the following conditions:\n\n1. A cache refill invalidates a cache line.\n2. A CMO which is executed on that CPU and invalidates a cache line specified by set/way.\n\nNote that CMOs that operate by set/way cannot be broadcast from one CPU to another."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1i_cache.json
new file mode 100644
index 000000000000..633f1030359d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l1i_cache.json
@@ -0,0 +1,14 @@
+[
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE",
+        "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_LMISS",
+        "PublicDescription": "Counts cache line refills into the level 1 instruction cache that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l2_cache.json
new file mode 100644
index 000000000000..0e31d0daf88b
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l2_cache.json
@@ -0,0 +1,50 @@
+[
+    {
+        "ArchStdEvent": "L2D_CACHE",
+        "PublicDescription": "Counts level 2 cache accesses. The level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level caches or translation resolutions due to accesses. This event also counts write-backs of dirty data from the level 1 data cache to the L2 cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "PublicDescription": "Counts cache line refills into the level 2 cache. The level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB",
+        "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_ALLOCATE",
+        "PublicDescription": "TBD"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD",
+        "PublicDescription": "Counts level 2 cache accesses due to memory read operations. The level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR",
+        "PublicDescription": "Counts level 2 cache accesses due to memory write operations. The level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+        "PublicDescription": "Counts refills for memory accesses due to memory read operations counted by L2D_CACHE_RD. The level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+        "PublicDescription": "Counts refills for memory accesses due to memory write operations counted by L2D_CACHE_WR. The level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
+        "PublicDescription": "Counts evictions from the level 2 cache because of a line being allocated into the L2 cache."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
+        "PublicDescription": "Counts write-backs from the level 2 cache that are a result of either:\n\n1. Cache maintenance operations,\n\n2. Snoop responses or,\n\n3. Direct cache transfers to another CPU due to a forwarding snoop request."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_INVAL",
+        "PublicDescription": "Counts each explicit invalidation of a cache line in the level 2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n\n1. A cache refill invalidates a cache line or,\n2. A Cache Maintenance Operation (CMO), which invalidates a cache line specified by set/way, is executed on that CPU.\n\nCMOs that operate by set/way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l3_cache.json
new file mode 100644
index 000000000000..45bfba532df7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/l3_cache.json
@@ -0,0 +1,22 @@
+[
+    {
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE",
+        "PublicDescription": "Counts level 3 cache line allocates that do not fetch data from outside the level 3 data or unified cache. For example, allocates due to streaming stores."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL",
+        "PublicDescription": "Counts level 3 accesses that receive data from outside the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE",
+        "PublicDescription": "Counts level 3 cache accesses. The level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RD",
+        "PublicDescription": "TBD"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+        "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/ll_cache.json
new file mode 100644
index 000000000000..bb712d57d58a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/ll_cache.json
@@ -0,0 +1,10 @@
+[
+    {
+        "ArchStdEvent": "LL_CACHE_RD",
+        "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts when the system register CPUECTLR.EXTLLC bit is set. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_MISS_RD",
+        "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts when the system register CPUECTLR.EXTLLC bit is set. This event counts read transactions returned from outside the core if those transactions are missed in the system level cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/memory.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/memory.json
new file mode 100644
index 000000000000..106a97f8b2e7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/memory.json
@@ -0,0 +1,46 @@
+[
+    {
+        "ArchStdEvent": "MEM_ACCESS",
+        "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
+    },
+    {
+        "ArchStdEvent": "MEMORY_ERROR",
+        "PublicDescription": "Counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs. On the core, this event counts errors in the caches (including data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted. Note that errors are only detected when the actual protected memory is accessed by an operation."
+    },
+    {
+        "ArchStdEvent": "REMOTE_ACCESS",
+        "PublicDescription": "Counts accesses to another chip, which is implemented as a different CMN mesh in the system. If the CHI bus response back to the core indicates that the data source is from another chip (mesh), then the counter is updated. If no data is returned, even if the system snoops another chip/mesh, then the counter is not updated."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD",
+        "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_WR",
+        "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "LDST_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in a load or store crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "LD_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in a load crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "ST_ALIGN_LAT",
+        "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED",
+        "PublicDescription": "Counts the number of memory read and write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
+        "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
+        "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/metrics.json
new file mode 100644
index 000000000000..8f1479b1bb0d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/metrics.json
@@ -0,0 +1,332 @@
+[
+    {
+        "ArchStdEvent": "backend_bound",
+        "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * 5)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
+    },
+    {
+        "MetricName": "backend_stalled_cycles",
+        "MetricExpr": "((STALL_BACKEND / CPU_CYCLES) * 100)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
+        "MetricGroup": "Cycle_Accounting",
+        "ScaleUnit": "1percent of cycles"
+    },
+    {
+        "ArchStdEvent": "bad_speculation",
+        "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 - ((STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
+    },
+    {
+        "MetricName": "branch_misprediction_ratio",
+        "MetricExpr": "(BR_MIS_PRED_RETIRED / BR_RETIRED)",
+        "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
+        "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
+        "ScaleUnit": "1per branch"
+    },
+    {
+        "MetricName": "branch_mpki",
+        "MetricExpr": "((BR_MIS_PRED_RETIRED / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
+        "MetricGroup": "MPKI;Branch_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "branch_percentage",
+        "MetricExpr": "(((BR_IMMED_SPEC + BR_INDIRECT_SPEC) / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "crypto_percentage",
+        "MetricExpr": "((CRYPTO_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "dtlb_mpki",
+        "MetricExpr": "((DTLB_WALK / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.",
+        "MetricGroup": "MPKI;DTLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "dtlb_walk_ratio",
+        "MetricExpr": "(DTLB_WALK / L1D_TLB)",
+        "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "ArchStdEvent": "frontend_bound",
+        "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND - CPU_CYCLES) / (5 * CPU_CYCLES)) - (BR_MIS_PRED / CPU_CYCLES)))"
+    },
+    {
+        "MetricName": "frontend_stalled_cycles",
+        "MetricExpr": "((STALL_FRONTEND / CPU_CYCLES) * 100)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
+        "MetricGroup": "Cycle_Accounting",
+        "ScaleUnit": "1percent of cycles"
+    },
+    {
+        "MetricName": "integer_dp_percentage",
+        "MetricExpr": "((DP_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "ipc",
+        "MetricExpr": "(INST_RETIRED / CPU_CYCLES)",
+        "BriefDescription": "This metric measures the number of instructions retired per cycle.",
+        "MetricGroup": "General",
+        "ScaleUnit": "1per cycle"
+    },
+    {
+        "MetricName": "itlb_mpki",
+        "MetricExpr": "((ITLB_WALK / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
+        "MetricGroup": "MPKI;ITLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "itlb_walk_ratio",
+        "MetricExpr": "(ITLB_WALK / L1I_TLB)",
+        "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l1d_cache_miss_ratio",
+        "MetricExpr": "(L1D_CACHE_REFILL / L1D_CACHE)",
+        "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.",
+        "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "l1d_cache_mpki",
+        "MetricExpr": "((L1D_CACHE_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;L1D_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l1d_tlb_miss_ratio",
+        "MetricExpr": "(L1D_TLB_REFILL / L1D_TLB)",
+        "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l1d_tlb_mpki",
+        "MetricExpr": "((L1D_TLB_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;DTLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l1i_cache_miss_ratio",
+        "MetricExpr": "(L1I_CACHE_REFILL / L1I_CACHE)",
+        "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.",
+        "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "l1i_cache_mpki",
+        "MetricExpr": "((L1I_CACHE_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;L1I_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l1i_tlb_miss_ratio",
+        "MetricExpr": "(L1I_TLB_REFILL / L1I_TLB)",
+        "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l1i_tlb_mpki",
+        "MetricExpr": "((L1I_TLB_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;ITLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l2_cache_miss_ratio",
+        "MetricExpr": "(L2D_CACHE_REFILL / L2D_CACHE)",
+        "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instructions. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+        "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "l2_cache_mpki",
+        "MetricExpr": "((L2D_CACHE_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+        "MetricGroup": "MPKI;L2_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "l2_tlb_miss_ratio",
+        "MetricExpr": "(L2D_TLB_REFILL / L2D_TLB)",
+        "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness",
+        "ScaleUnit": "1per TLB access"
+    },
+    {
+        "MetricName": "l2_tlb_mpki",
+        "MetricExpr": "((L2D_TLB_REFILL / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "ll_cache_read_hit_ratio",
+        "MetricExpr": "((LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD)",
+        "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+        "MetricGroup": "LL_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "ll_cache_read_miss_ratio",
+        "MetricExpr": "(LL_CACHE_MISS_RD / LL_CACHE_RD)",
+        "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+        "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness",
+        "ScaleUnit": "1per cache access"
+    },
+    {
+        "MetricName": "ll_cache_read_mpki",
+        "MetricExpr": "((LL_CACHE_MISS_RD / INST_RETIRED) * 1000)",
+        "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.",
+        "MetricGroup": "MPKI;LL_Cache_Effectiveness",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricName": "load_percentage",
+        "MetricExpr": "((LD_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "ArchStdEvent": "retiring",
+        "MetricExpr": "(100 * ((OP_RETIRED / OP_SPEC) * (1 - ((STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5)))))"
+    },
+    {
+        "MetricName": "scalar_fp_percentage",
+        "MetricExpr": "((VFP_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "simd_percentage",
+        "MetricExpr": "((ASE_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricName": "store_percentage",
+        "MetricExpr": "((ST_SPEC / INST_SPEC) * 100)",
+        "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.",
+        "MetricGroup": "Operation_Mix",
+        "ScaleUnit": "1percent of operations"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "MPKI;L3_Cache_Effectiveness",
+        "MetricName": "l3d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache accesses",
+        "MetricGroup": "Miss_Ratio;L3_Cache_Effectiveness",
+        "MetricName": "l3d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "MPKI;Branch_Effectiveness",
+        "MetricName": "branch_pki",
+        "ScaleUnit": "1PKI"
+    },
+    {
+        "MetricExpr": "ipc / 5",
+        "BriefDescription": "IPC percentage of peak. The peak of IPC is the number of slots.",
+        "MetricGroup": "General",
+        "MetricName": "ipc_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "INST_SPEC / CPU_CYCLES",
+        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "General",
+        "MetricName": "spec_ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are retired(committed)",
+        "MetricGroup": "General",
+        "MetricName": "retired_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are not retired(committed)",
+        "MetricGroup": "General",
+        "MetricName": "wasted_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "load_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "store_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "advanced_simd_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "float_point_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "branch_immed_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "branch_return_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "Operation_Mix",
+        "MetricName": "branch_indirect_spec_rate",
+        "ScaleUnit": "100%"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/retired.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/retired.json
new file mode 100644
index 000000000000..f297b049b62f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/retired.json
@@ -0,0 +1,30 @@
+[
+    {
+        "ArchStdEvent": "SW_INCR",
+        "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register."
+    },
+    {
+        "ArchStdEvent": "INST_RETIRED",
+        "PublicDescription": "Counts instructions that have been architecturally executed."
+    },
+    {
+        "ArchStdEvent": "CID_WRITE_RETIRED",
+        "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR register, which usually contains the kernel PID and can be output with hardware trace."
+    },
+    {
+        "ArchStdEvent": "TTBR_WRITE_RETIRED",
+        "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
+    },
+    {
+        "ArchStdEvent": "BR_RETIRED",
+        "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted."
+    },
+    {
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+        "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
+    },
+    {
+        "ArchStdEvent": "OP_RETIRED",
+        "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of the number of micro-operations retired from the commit queue in a single cycle."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spe.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spe.json
new file mode 100644
index 000000000000..5de8b0f3a440
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spe.json
@@ -0,0 +1,18 @@
+[
+    {
+        "ArchStdEvent": "SAMPLE_POP",
+        "PublicDescription": "Counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FEED",
+        "PublicDescription": "Counts statistical profiling samples taken for sampling."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_FILTRATE",
+        "PublicDescription": "Counts statistical profiling samples taken which are not removed by filtering."
+    },
+    {
+        "ArchStdEvent": "SAMPLE_COLLISION",
+        "PublicDescription": "Counts statistical profiling samples that have collided with a previous sample and so were not taken."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spec_operation.json
new file mode 100644
index 000000000000..1af961f8a6c8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/spec_operation.json
@@ -0,0 +1,110 @@
+[
+    {
+        "ArchStdEvent": "BR_MIS_PRED",
+        "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
+    },
+    {
+        "ArchStdEvent": "BR_PRED",
+        "PublicDescription": "Counts branches that were speculatively executed and predicted correctly."
+    },
+    {
+        "ArchStdEvent": "INST_SPEC",
+        "PublicDescription": "Counts operations that have been speculatively executed."
+    },
+    {
+        "ArchStdEvent": "OP_SPEC",
+        "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LD_SPEC",
+        "PublicDescription": "Counts unaligned memory read operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses. The event does not count preload operations (PLD, PLI)."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_ST_SPEC",
+        "PublicDescription": "Counts unaligned memory write operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses."
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LDST_SPEC",
+        "PublicDescription": "Counts unaligned memory operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses."
+    },
+    {
+        "ArchStdEvent": "LDREX_SPEC",
+        "PublicDescription": "Counts Load-Exclusive operations that have been speculatively executed. Eg: LDREX, LDX"
+    },
+    {
+        "ArchStdEvent": "STREX_PASS_SPEC",
+        "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have successfully completed the store operation."
+    },
+    {
+        "ArchStdEvent": "STREX_FAIL_SPEC",
+        "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation."
+    },
+    {
+        "ArchStdEvent": "STREX_SPEC",
+        "PublicDescription": "Counts store-exclusive operations that have been speculatively executed."
+    },
+    {
+        "ArchStdEvent": "LD_SPEC",
+        "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations."
+    },
+    {
+        "ArchStdEvent": "ST_SPEC",
+        "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations."
+    },
+    {
+        "ArchStdEvent": "DP_SPEC",
+        "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
+    },
+    {
+        "ArchStdEvent": "ASE_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers."
+    },
+    {
+        "ArchStdEvent": "VFP_SPEC",
+        "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
+    },
+    {
+        "ArchStdEvent": "PC_WRITE_SPEC",
+        "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
+    },
+    {
+        "ArchStdEvent": "CRYPTO_SPEC",
+        "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_SPEC",
+        "PublicDescription": "Counts immediate branch operations which are speculatively executed."
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_SPEC",
+        "PublicDescription": "Counts procedure return operations (RET) which are speculatively executed."
+    },
+    {
+        "ArchStdEvent": "BR_INDIRECT_SPEC",
+        "PublicDescription": "Counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations.  Eg: BR Xn, RET"
+    },
+    {
+        "ArchStdEvent": "ISB_SPEC",
+        "PublicDescription": "Counts ISB operations that are executed."
+    },
+    {
+        "ArchStdEvent": "DSB_SPEC",
+        "PublicDescription": "Counts DSB operations that are speculatively issued to the Load/Store unit in the CPU."
+    },
+    {
+        "ArchStdEvent": "DMB_SPEC",
+        "PublicDescription": "Counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from load acquire/store release operations."
+    },
+    {
+        "ArchStdEvent": "RC_LD_SPEC",
+        "PublicDescription": "Counts any load acquire operations that are speculatively executed. Eg: LDAR, LDARH, LDARB"
+    },
+    {
+        "ArchStdEvent": "RC_ST_SPEC",
+        "PublicDescription": "Counts any store release operations that are speculatively executed. Eg: STLR, STLRH, STLRB"
+    },
+    {
+        "ArchStdEvent": "ASE_INST_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/stall.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/stall.json
new file mode 100644
index 000000000000..bbbebc805034
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/stall.json
@@ -0,0 +1,30 @@
+[
+    {
+        "ArchStdEvent": "STALL_FRONTEND",
+        "PublicDescription": "Counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. All the frontend slots were empty during the cycle when this event counts."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND",
+        "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
+    },
+    {
+        "ArchStdEvent": "STALL",
+        "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall)."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_BACKEND",
+        "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_FRONTEND",
+        "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT",
+        "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall)."
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_MEM",
+        "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/sve.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/sve.json
new file mode 100644
index 000000000000..51dab48cb2ba
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/sve.json
@@ -0,0 +1,50 @@
+[
+    {
+        "ArchStdEvent": "SVE_INST_SPEC",
+        "PublicDescription": "Counts speculatively executed operations that are SVE operations."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with no active predicate elements."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_FULL_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with all predicate elements active."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one but not all active predicate elements."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
+        "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one non-active predicate element."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_SPEC",
+        "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
+        "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations that clear at least one bit in the FFR."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT8_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT16_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT32_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT64_SPEC",
+        "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/tlb.json
new file mode 100644
index 000000000000..b550af1831f5
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/tlb.json
@@ -0,0 +1,66 @@
+[
+    {
+        "ArchStdEvent": "L1I_TLB_REFILL",
+        "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL",
+        "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT(address translation) instruction."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB",
+        "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1I_TLB",
+        "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL",
+        "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB",
+        "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK",
+        "PublicDescription": "Counts data memory translation table walks caused by a miss in the L2 TLB driven by a memory access. Note that partial translations that also cause a table walk are counted. This event does not count table walks caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK",
+        "PublicDescription": "Counts instruction memory translation table walks caused by a miss in the L2 TLB driven by a memory access. Partial translations that also cause a table walk are counted. This event does not count table walks caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_RD",
+        "PublicDescription": "Counts level 1 data TLB refills caused by memory read operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an Address Translation (AT) instruction."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_WR",
+        "PublicDescription": "Counts level 1 data TLB refills caused by data side memory write operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count with an access from an Address Translation (AT) instruction."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_RD",
+        "PublicDescription": "Counts level 1 data TLB accesses caused by memory read operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_WR",
+        "PublicDescription": "Counts any L1 data side TLB accesses caused by memory write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_RD",
+        "PublicDescription": "Counts level 2 TLB refills caused by memory read operations from both data and instruction fetch except for those caused by TLB maintenance operations or hardware prefetches."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_WR",
+        "PublicDescription": "Counts level 2 TLB refills caused by memory write operations from both data and instruction fetch except for those caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_RD",
+        "PublicDescription": "Counts level 2 TLB accesses caused by memory read operations from both data and instruction fetch except for those caused by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_WR",
+        "PublicDescription": "Counts level 2 TLB accesses caused by memory write operations from both data and instruction fetch except for those caused by TLB maintenance operations."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/trace.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/trace.json
new file mode 100644
index 000000000000..98f6fabfebc7
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p0/trace.json
@@ -0,0 +1,38 @@
+[
+    {
+        "ArchStdEvent": "TRB_WRAP",
+        "PublicDescription": "This event is generated each time the current write pointer is wrapped to the base pointer."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT0",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 0."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT1",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 1."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT2",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 2."
+    },
+    {
+        "ArchStdEvent": "TRCEXTOUT3",
+        "PublicDescription": "This event is generated each time an event is signaled by ETE external event 3."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT4",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 4."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT5",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 5."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT6",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 6."
+    },
+    {
+        "ArchStdEvent": "CTI_TRIGOUT7",
+        "PublicDescription": "This event is generated each time an event is signaled on CTI output trigger 7."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
index 13d027656c26..aa917f13bddb 100644
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@@ -34,7 +34,7 @@
 0x00000000410fd460,v1,arm/cortex-a510,core
 0x00000000410fd470,v1,arm/cortex-a710,core
 0x00000000410fd480,v1,arm/cortex-x2,core
-0x00000000410fd490,v1,arm/neoverse-n2-v2,core
+0x00000000410fd490,v1,arm/neoverse-n2r0p0,core
 0x00000000410fd493,v1,arm/neoverse-n2r0p3-v2,core
 0x00000000410fd4f0,v1,arm/neoverse-n2r0p3-v2,core
 0x00000000420f5160,v1,cavium/thunderx2,core
-- 
2.34.1



* Re: [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo
  2023-07-10 14:19 ` [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo James Clark
@ 2023-07-10 14:28   ` James Clark
  2023-07-27  3:50     ` Jing Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-10 14:28 UTC (permalink / raw)
  To: renyu.zj; +Cc: namhyung, acme, John Garry, linux-perf-users, irogers



On 10/07/2023 15:19, James Clark wrote:
> The new metrics contain a fix for N2 r0p3 where CPU_CYCLES should not be
> subtracted from stalls for topdown metrics anymore. The current metrics
> assume that the fix should be applied anywhere where slots != 5, but
> this is only the case for V2 and not N2 r0p3.
> 
> Split the metrics into a new version for N2-r0p3 and V2 which still
> share the same metrics. Apart from some slight naming and grouping
> differences the new metrics are functionally the same as the existing
> ones. Any missing metrics were manually appended to the end of the auto
> generated file.
> 
> For the events, the new data includes descriptions that may have product
> specific details and new groupings that will be consistent with other
> products.
> 
> After generating the metrics from the telemetry repo [1], the following
> manual steps were performed:
> 
>  * Change the hard coded slots in neoverse-n2r0p3-v2 to #slots so that
>    it will work on both N2 and V2.
> 
>  * Append some metrics from the old N2/V2 data that aren't present in
>    the telemetry data. These will possibly be added to the
>    telemetry-solution repo at a later time:
> 
>     l3d_cache_mpki, l3d_cache_miss_rate, branch_pki, ipc_rate, spec_ipc,
>     retired_rate, wasted_rate, load_spec_rate, store_spec_rate,
>     advanced_simd_spec_rate, float_point_spec_rate,
>     branch_immed_spec_rate, branch_return_spec_rate,
>     branch_indirect_spec_rate
> 
> [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2.json
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---

[...]

> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
> new file mode 100644
> index 000000000000..b01cc2120175
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
> @@ -0,0 +1,331 @@
> +[
> +    {
> +        "ArchStdEvent": "backend_bound",
> +        "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
> +    },

Hi Jing,

I'm not sure if you remember, but a long time ago I said that I was
going to update these N2 and V2 metrics that you upstreamed.

Now that these are coming from the Arm telemetry repo they have an
adjustment for subtracting branch misses that we think is more accurate
because of an issue with the STALL_SLOT_BACKEND counter. I've left the
sbsa.json metrics without the adjustments as it shouldn't normally be
needed.
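
As a rough illustration of the adjustment (a sketch, not part of the
patch), the backend_bound expression quoted above can be evaluated by
hand. The function names and counter values below are made up for the
example; only the formula itself comes from the MetricExpr.

```python
def backend_bound(stall_slot_backend, cpu_cycles, br_mis_pred, slots):
    """Adjusted backend_bound, mirroring the MetricExpr quoted above.

    The second term subtracts a branch-miss share (3 cycles per
    mispredict, as in the expression) from the raw stalled-slot ratio.
    """
    raw_ratio = stall_slot_backend / (cpu_cycles * slots)
    branch_adjust = (br_mis_pred * 3) / cpu_cycles
    return 100 * (raw_ratio - branch_adjust)


def backend_bound_unadjusted(stall_slot_backend, cpu_cycles, slots):
    """The same ratio without the branch-miss term, for comparison."""
    return 100 * (stall_slot_backend / (cpu_cycles * slots))


# Hypothetical counter values, chosen only to make the arithmetic easy.
adjusted = backend_bound(3000, 1200, 100, 5)
plain = backend_bound_unadjusted(3000, 1200, 5)
print(f"adjusted={adjusted:.1f}% unadjusted={plain:.1f}%")
```

With these made-up counts the branch-miss term removes a quarter of the
slots from the backend figure, which is the kind of shift the adjustment
is meant to produce.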

I also had to drop the "if (#slots - 5) else" check because it won't
apply to N2-r0p3, but I split the metrics into two instead to fix that
issue.

The rest is pretty much the same apart from some slight grouping/naming/
scaling differences.

Let me know what you think.

Thanks
James

> +    {
> +        "MetricName": "backend_stalled_cycles",
> +        "MetricExpr": "((STALL_BACKEND / CPU_CYCLES) * 100)",
> +        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
> +        "MetricGroup": "Cycle_Accounting",
> +        "ScaleUnit": "1percent of cycles"
> +    },
> +    {
> +        "ArchStdEvent": "bad_speculation",
> +        "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 - (STALL_SLOT / (CPU_CYCLES * #slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
> +    },
> +    {
> +        "MetricName": "branch_misprediction_ratio",
> +        "MetricExpr": "(BR_MIS_PRED_RETIRED / BR_RETIRED)",
> +        "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
> +        "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
> +        "ScaleUnit": "1per branch"
> +    },




* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-10 14:19 ` [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available James Clark
@ 2023-07-10 16:56   ` John Garry
  2023-07-11 10:18     ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2023-07-10 16:56 UTC (permalink / raw)
  To: James Clark, linux-perf-users, irogers, renyu.zj, will,
	linux-arm-kernel, mark.rutland
  Cc: namhyung, acme

On 10/07/2023 15:19, James Clark wrote:

+


Hi James,

It would be good to cc some additional people and lists.

> Currently version and revision fields are masked out of the MIDR so

Do you mean variant and revision?

> there can only be one set of jsons per CPU. In a later commit multiple
> revisions of Neoverse N2 json files will be provided.
> 


> The highest valid version of json files should be used,

What exactly does that mean?

So it seems that you have CPUs with matching MIDR except variant and 
revision, but have different events, right?

Then to solve that are you saying that the highest version of the JSONs 
should be used, up to and including the same CPU version (???) - correct?

The cover letter doesn't mention anything relevant :(

> but to make this
> work the mapfile has to be reverse sorted on the CPUID field so that the
> highest is found first.
> It's possible, but error prone, to do this
> manually so instead add an explicit sort into jevents.py. If the CPUID
> is a string then the rows are string sorted rather than numerically.
> 

Thanks,
John



* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-10 16:56   ` John Garry
@ 2023-07-11 10:18     ` James Clark
  2023-07-12  9:22       ` John Garry
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-11 10:18 UTC (permalink / raw)
  To: John Garry
  Cc: namhyung, acme, linux-perf-users, irogers, renyu.zj, will,
	linux-arm-kernel, mark.rutland



On 10/07/2023 17:56, John Garry wrote:
> On 10/07/2023 15:19, James Clark wrote:
> 
> +
> 
> 
> Hi James,
> 
> It would be good to cc some additional people and lists.

Hi John,

Resent a v3 with the normal get_maintainer list. I did this with the
first one but it bounced and for some reason with the v2 I reduced the
CC list. Not sure why really...

> 
>> Currently version and revision fields are masked out of the MIDR so
> 
> Do you mean variant and revision?
> 

Yes that should have been variant. Fixed in v3, thanks.

>> there can only be one set of jsons per CPU. In a later commit multiple
>> revisions of Neoverse N2 json files will be provided.
>>
> 
> 
>> The highest valid version of json files should be used,
> 
> What exactly does that mean?
> 
> So it seems that you have CPUs with matching MIDR except variant and
> revision, but have different events, right?

Yes. In this case we changed how a metric is calculated in N2-r0p3
because the workaround that was needed for r0p0 is no longer needed
(CPU_CYCLES should not be subtracted from stalls for topdown metrics
anymore). But it would also support the case where the event list was
slightly different between versions.

> 
> Then to solve that are you saying that the highest version of the JSONs
> should used, upto and including the same CPU version (???) - correct?

Yeah pretty much. Each set of JSONs matches CPUs starting from (and
including) its own version, up to and not including the next version of
JSON files provided. If only r0p0 is provided then all CPU versions
match.

So if these JSONs were provided:

 r0p0  -  r0p4  -  r1p1

Would match these CPUs:

 CPU   |  JSON matched
 ------|---------------
 r0p0  |  r0p0
 r0p3  |  r0p0
 r0p4  |  r0p4
 r1p0  |  r0p4
 r1p1+ |  r1p1
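
As a rough illustration of that rule (a sketch only, not the jevents.py
change itself), the lookup can be written as "reverse sort the provided
versions and take the first one that is not newer than the CPU". The
match_json() helper and the (variant, revision) tuples are made up for
this example:

```python
# Hypothetical sketch: pick the JSON set for a CPU from the versions
# that have JSON files. Versions are (variant, revision) tuples, so
# r0p4 is (0, 4) and r1p1 is (1, 1).

def match_json(cpu, available):
    """Return the highest available version that is <= the CPU version."""
    # Reverse sort so that the highest candidate is checked first.
    for candidate in sorted(available, reverse=True):
        if candidate <= cpu:
            return candidate
    return None  # The CPU is older than every provided JSON set.


available = [(0, 0), (0, 4), (1, 1)]
for cpu in [(0, 0), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2)]:
    variant, revision = match_json(cpu, available)
    print(f"r{cpu[0]}p{cpu[1]} -> r{variant}p{revision}")
```

Tuples compare on the variant first and then the revision, which is why
r1p0 falls back to r0p4 rather than r1p1.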

> 
> The cover letter doesn't mention anything relevant :(
> 

Hopefully I expanded more in the v3 cover letter.

Thanks
James

>> but to make this
>> work the mapfile has to be reverse sorted on the CPUID field so that the
>> highest is found first.
>> It's possible, but error prone, to do this
>> manually so instead add an explicit sort into jevents.py. If the CPUID
>> is a string then the rows are string sorted rather than numerically.
>>
> 
> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-11 10:18     ` James Clark
@ 2023-07-12  9:22       ` John Garry
  2023-07-12 11:00         ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2023-07-12  9:22 UTC (permalink / raw)
  To: James Clark
  Cc: namhyung, acme, linux-perf-users, irogers, renyu.zj, will,
	linux-arm-kernel, mark.rutland

On 11/07/2023 11:18, James Clark wrote:
>>> The highest valid version of json files should be used,
>> What exactly does that mean?
>>
>> So it seems that you have CPUs with matching MIDR except variant and
>> revision, but have different events, right?
> Yes. In this case we changed how a metric is calculated in N2-r0p3
> because the workaround that was needed for r0p0 is no longer needed
> (CPU_CYCLES should not subtracted from stalls for topdown metrics
> anymore).

If there are only very subtle differences, then could we solve this with
a metric expression function? We already do something like this for the
literal #slots on arm64.

> But it would also support the case where the event list was
> slightly different between versions.

I am diff'ing folders ../arm64/arm/neoverse-n2r0p3-v2 and 
../arm64/arm/neoverse-n2r0p0, and all the events are the same - am I 
correct? It only seems like metric json is different.

> 
>> Then to solve that are you saying that the highest version of the JSONs
>> should used, upto and including the same CPU version (???) - correct?
> Yeah pretty much. It's highest version of JSONs starting from
> (including) the same CPU version, up to and not including the next
> version of JSON files provided. If only r0p0 is provided then all CPU
> versions match.
> 
> So if these JSONs were provided:
> 
>   r0p0  -  r0p4  -  r1p1
> 
> Would match these CPUs:
> 
>   CPU   |  JSON matched
>   ------|---------------
>   r0p0  |  r0p0
>   r0p3  |  r0p0
>   r0p4  |  r0p4
>   r1p0  |  r0p4
>   r1p1+ |  r1p1
> 
>> The cover letter doesn't mention anything relevant 🙁
>>
> Hopefully I expanded more in the v3 cover letter.

Thanks,
John


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-12  9:22       ` John Garry
@ 2023-07-12 11:00         ` James Clark
  2023-07-12 11:33           ` John Garry
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-12 11:00 UTC (permalink / raw)
  To: John Garry
  Cc: namhyung, acme, linux-perf-users, irogers, renyu.zj, will,
	linux-arm-kernel, mark.rutland



On 12/07/2023 10:22, John Garry wrote:
> On 11/07/2023 11:18, James Clark wrote:
>>>> The highest valid version of json files should be used,
>>> What exactly does that mean?
>>>
>>> So it seems that you have CPUs with matching MIDR except variant and
>>> revision, but have different events, right?
>> Yes. In this case we changed how a metric is calculated in N2-r0p3
>> because the workaround that was needed for r0p0 is no longer needed
>> (CPU_CYCLES should not subtracted from stalls for topdown metrics
>> anymore).
> 
> If there are only very subtle differences, then could we solve with
> metric expression function? We already do something else like this for
> literal #slots for arm64

Possibly, but I'd have to add a new mechanism to expose the version
numbers to the metrics so it can be used in an expression. And I think
it could get very messy quite quickly and make the expression hard to
read. In fact in this specific case I'm not even sure how it would look.
Something like this (where #p is the p in r0p3)?

 {
 "ArchStdEvent": "bad_speculation",
 "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 -
((STALL_SLOT - (CPU_CYCLES if (#p >= 3) else 0)) / (CPU_CYCLES *
#slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
 },

In this case it's not so bad because we don't need to compare r as well,
but I think it could get worse if r was non 0 or there were more
significant differences to the metric.
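
To make the arithmetic concrete, here is a small sketch of the same
formula as a plain function. It assumes the intended grouping is
(STALL_SLOT - adjustment) / (CPU_CYCLES * #slots), and the
subtract_cycles flag is a made-up stand-in for whatever literal or
revision check would decide whether the workaround applies. The counter
values are arbitrary:

```python
# Hypothetical sketch of the bad_speculation expression with the
# CPU_CYCLES workaround switched on or off by the caller.

def bad_speculation(op_retired, op_spec, stall_slot, br_mis_pred,
                    cpu_cycles, slots, subtract_cycles):
    # Only subtract CPU_CYCLES from the stall count when the workaround
    # is requested; otherwise the stall count is used unchanged.
    adjustment = cpu_cycles if subtract_cycles else 0
    stall_fraction = (stall_slot - adjustment) / (cpu_cycles * slots)
    return 100 * ((1 - op_retired / op_spec) * (1 - stall_fraction)
                  + (br_mis_pred * 4) / cpu_cycles)


# Arbitrary counter values, only there to show that the result differs.
counters = dict(op_retired=80, op_spec=100, stall_slot=300,
                br_mis_pred=5, cpu_cycles=100, slots=5)
print(bad_speculation(**counters, subtract_cycles=False))
print(bad_speculation(**counters, subtract_cycles=True))
```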

It also doesn't scale in the same way when we're using auto generated
metrics with the data that Arm is publishing. We're going to treat CPU
versions as a completely new set of JSONs if there are any
differences/fixes in the PMU stuff and just publish a whole new set.
That way any tool that's consuming these can just check the MIDR and use
whichever JSONs are most appropriate. In theory this shouldn't require
any manual intervention or re-writing of expressions. And it saves every
tool having to solve the same problem.

It's possible in the future that there are also small changes to the
description text or availability of the events between versions. It
hasn't happened yet, but it gives us the flexibility to do that too.

> 
>> But it would also support the case where the event list was
>> slightly different between versions.
> 
> I am diff'ing folders ../arm64/arm/neoverse-n2r0p3-v2 and
> ../arm64/arm/neoverse-n2r0p0, and all the events are the same - am I
> correct? It only seems like metric json is different.
> 

Yes that's correct, the only change is in the topdown metrics, where the
subtraction of CPU_CYCLES that was there as a workaround has been
removed. N2 r0p3 no longer needs that workaround. But everything else is
the same.

I also used it as an opportunity to remove #slots from N2-r0p0. That set
of metrics will only be used in one place where slots is always 5. It's
better to hard code it so that it also has a chance to work on kernels
that don't have the change to expose slots yet.

>>
>>> Then to solve that are you saying that the highest version of the JSONs
>>> should used, upto and including the same CPU version (???) - correct?
>> Yeah pretty much. It's highest version of JSONs starting from
>> (including) the same CPU version, up to and not including the next
>> version of JSON files provided. If only r0p0 is provided then all CPU
>> versions match.
>>
>> So if these JSONs were provided:
>>
>>   r0p0  -  r0p4  -  r1p1
>>
>> Would match these CPUs:
>>
>>   CPU   |  JSON matched
>>   ------|---------------
>>   r0p0  |  r0p0
>>   r0p3  |  r0p0
>>   r0p4  |  r0p4
>>   r1p0  |  r0p4
>>   r1p1+ |  r1p1
>>
>>> The cover letter doesn't mention anything relevant 🙁
>>>
>> Hopefully I expanded more in the v3 cover letter.
> 
> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-12 11:00         ` James Clark
@ 2023-07-12 11:33           ` John Garry
  2023-07-12 14:06             ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2023-07-12 11:33 UTC (permalink / raw)
  To: James Clark, irogers
  Cc: namhyung, acme, linux-perf-users, renyu.zj, will,
	linux-arm-kernel, mark.rutland

On 12/07/2023 12:00, James Clark wrote:
> 
> 
> On 12/07/2023 10:22, John Garry wrote:
>> On 11/07/2023 11:18, James Clark wrote:
>>>>> The highest valid version of json files should be used,
>>>> What exactly does that mean?
>>>>
>>>> So it seems that you have CPUs with matching MIDR except variant and
>>>> revision, but have different events, right?
>>> Yes. In this case we changed how a metric is calculated in N2-r0p3
>>> because the workaround that was needed for r0p0 is no longer needed
>>> (CPU_CYCLES should not subtracted from stalls for topdown metrics
>>> anymore).
>>
>> If there are only very subtle differences, then could we solve with
>> metric expression function? We already do something else like this for
>> literal #slots for arm64
> 
> Possibly, but I'd have to add a new mechanism to expose the version
> numbers to the metrics so it can be used in an expression. And I think
> it could get very messy quite quickly and make the expression hard to
> read. In fact in this specific case I'm not even sure how it would look.
> Something like this (where #p is the p in r0p3)?
> 
>   {
>   "ArchStdEvent": "bad_speculation",
>   "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 -
> ((STALL_SLOT - (CPU_CYCLES if (#p >= 3) else 0) / (CPU_CYCLES *
> #slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"

Could this formula for "CPU_CYCLES if (#p >= 3) else 0" be generalised 
with a literal (like what we do for #slots)?

>   },
> 
> In this case it's not so bad because we don't need to compare r as well,
> but I think it could get worse if r was non 0 or there were more
> significant differences to the metric.
> 
> It also doesn't scale in the same way when we're using auto generated
> metrics with the data that Arm is publishing. We're going to treat CPU
> versions as a completely new set of JSONs if there are any
> differences/fixes in the PMU stuff and just publish a whole new set.
> That way any tool that's consuming these can just check the MIDR and use
> whichever JSONS are most appropriate.

So far we have got away with ignoring revision and variant - why change 
this now just for these arm parts is a good question..

> In theory this shouldn't require
> any manual intervention or re-writing of expressions. And it saves every
> tool having to solve the same problem.

Sure, but we have tried to reduce duplication as much as possible, and
an example would be ArchStdEvent.

> 
> It's possible in the future that there are also small changes to the
> description text or availability of the events between versions. It
> hasn't happened yet, but it gives us the flexibility to do that too.

Here the (handful of) metrics are subtly different between these 
rev/variant, but all the events are the same. It's hard to justify 
duplicating almost everything just for that.

> 

I am not sure on how to handle this, since we might be bombarded with 
support for more CPUs with the same MIDR issue and having quirks for all 
becomes unworkable.

Can you just give the idea above on literals/functions in the metricexpr 
a go, to see how it looks? Maybe Ian has a good idea on similar solutions.

Thanks,
John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-12 11:33           ` John Garry
@ 2023-07-12 14:06             ` James Clark
  2023-07-12 14:15               ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-12 14:06 UTC (permalink / raw)
  To: John Garry, irogers
  Cc: namhyung, acme, linux-perf-users, renyu.zj, will,
	linux-arm-kernel, mark.rutland



On 12/07/2023 12:33, John Garry wrote:
> On 12/07/2023 12:00, James Clark wrote:
>>
>>
>> On 12/07/2023 10:22, John Garry wrote:
>>> On 11/07/2023 11:18, James Clark wrote:
>>>>>> The highest valid version of json files should be used,
>>>>> What exactly does that mean?
>>>>>
>>>>> So it seems that you have CPUs with matching MIDR except variant and
>>>>> revision, but have different events, right?
>>>> Yes. In this case we changed how a metric is calculated in N2-r0p3
>>>> because the workaround that was needed for r0p0 is no longer needed
>>>> (CPU_CYCLES should not subtracted from stalls for topdown metrics
>>>> anymore).
>>>
>>> If there are only very subtle differences, then could we solve with
>>> metric expression function? We already do something else like this for
>>> literal #slots for arm64
>>
>> Possibly, but I'd have to add a new mechanism to expose the version
>> numbers to the metrics so it can be used in an expression. And I think
>> it could get very messy quite quickly and make the expression hard to
>> read. In fact in this specific case I'm not even sure how it would look.
>> Something like this (where #p is the p in r0p3)?
>>
>>   {
>>   "ArchStdEvent": "bad_speculation",
>>   "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 -
>> ((STALL_SLOT - (CPU_CYCLES if (#p >= 3) else 0) / (CPU_CYCLES *
>> #slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
> 
> Could this formula for "CPU_CYCLES if (#p >= 3) else 0" be generalised
> with a literal (like what we do for #slots)?
> 

Yes that could work, then we wouldn't need the logic inside the formula.

>>   },
>>
>> In this case it's not so bad because we don't need to compare r as well,
>> but I think it could get worse if r was non 0 or there were more
>> significant differences to the metric.
>>
>> It also doesn't scale in the same way when we're using auto generated
>> metrics with the data that Arm is publishing. We're going to treat CPU
>> versions as a completely new set of JSONs if there are any
>> differences/fixes in the PMU stuff and just publish a whole new set.
>> That way any tool that's consuming these can just check the MIDR and use
>> whichever JSONS are most appropriate.
> 
> So far we have got away with ignoring revision and variant - why change
> this now just for these arm parts is a good question..
> 
>> In theory this shouldn't require
>> any manual intervention or re-writing of expressions. And it saves every
>> tool having to solve the same problem.
> 
> Sure, but we have tried to reduce duplication as much as possible, and
> example would be ArchStdEvents
> 

What's the end goal of reducing duplication? Is it to reduce the binary
size or to reduce human input?

For the binary size if most of these strings are the same then they are
de-duplicated and the impact on the binary size is very small.

If it's to save on human input, then not doing it this way has the
potential to be more work in the long run. At the moment we can just run
a script and generate the Arm JSONs using what Arm is publishing. We
might understand all these subtle de-duplication efforts, but anyone
coming in the future is going to have a hard time understanding.
Especially if they just want to re-run the script to update some text,
then there is a big risk of making a mistake when re-applying the manual
changes.

I'm not completely opposed to the idea of re-working this expression
this time so that it works in both places, but I think we should be sure
that we do it for the right reasons.

As an example for comparison, I tried a build with a single set of
events and both sets (from this change) and the difference in size is
1760 bytes. Is it really worth starting to edit formulas just to save
that much space?

>>
>> It's possible in the future that there are also small changes to the
>> description text or availability of the events between versions. It
>> hasn't happened yet, but it gives us the flexibility to do that too.
> 
> Here the (handful of) metrics are subtly different between these
> rev/variant, but all the events are the same. It's hard to justify
> duplicating almost everything just for that.
> 

If it saves time and reduces the chance of making mistakes then
personally I don't see an issue with the duplication.

>>
> 
> I am not sure on how to handle this, since we might be bombarded with
> support for more CPUs with the same MIDR issue and having quirks for all
> becomes unworkable.
> 

I really don't see this being used very often. And it will always be
used much less often than the task of adding entirely new CPUs, so it
won't cause the number of JSONs to grow faster than that already
existing work. To be honest there is a non-zero chance that it never
gets used again.

Although even if we never use it again we still might want to
re-generate the JSONs because of a typo fix or something, so that task
is just a script run rather than a script run followed by re-applying
the expression fixes.

> Can you just give the idea above on literals/functions in the metricexpr
> a go, to see how it looks? Maybe Ian has a good idea on similar solutions.
> 
> Thanks,
> John

Yes I can give it a go. Also interested to see what Ian thinks as well.

Thanks
James

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-12 14:06             ` James Clark
@ 2023-07-12 14:15               ` James Clark
  2023-07-12 15:33                 ` John Garry
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2023-07-12 14:15 UTC (permalink / raw)
  To: John Garry, irogers
  Cc: namhyung, acme, linux-perf-users, renyu.zj, will,
	linux-arm-kernel, mark.rutland



On 12/07/2023 15:06, James Clark wrote:
> 
> 
> On 12/07/2023 12:33, John Garry wrote:
>> On 12/07/2023 12:00, James Clark wrote:
>>>
>>>
>>> On 12/07/2023 10:22, John Garry wrote:
>>>> On 11/07/2023 11:18, James Clark wrote:
>>>>>>> The highest valid version of json files should be used,
>>>>>> What exactly does that mean?
>>>>>>
>>>>>> So it seems that you have CPUs with matching MIDR except variant and
>>>>>> revision, but have different events, right?
>>>>> Yes. In this case we changed how a metric is calculated in N2-r0p3
>>>>> because the workaround that was needed for r0p0 is no longer needed
>>>>> (CPU_CYCLES should not subtracted from stalls for topdown metrics
>>>>> anymore).
>>>>
>>>> If there are only very subtle differences, then could we solve with
>>>> metric expression function? We already do something else like this for
>>>> literal #slots for arm64
>>>
>>> Possibly, but I'd have to add a new mechanism to expose the version
>>> numbers to the metrics so it can be used in an expression. And I think
>>> it could get very messy quite quickly and make the expression hard to
>>> read. In fact in this specific case I'm not even sure how it would look.
>>> Something like this (where #p is the p in r0p3)?
>>>
>>>   {
>>>   "ArchStdEvent": "bad_speculation",
>>>   "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 -
>>> ((STALL_SLOT - (CPU_CYCLES if (#p >= 3) else 0) / (CPU_CYCLES *
>>> #slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
>>
>> Could this formula for "CPU_CYCLES if (#p >= 3) else 0" be generalised
>> with a literal (like what we do for #slots)?
>>
> 
> Yes that could work, then we wouldn't need the logic inside the formula.
> 
>>>   },
>>>
>>> In this case it's not so bad because we don't need to compare r as well,
>>> but I think it could get worse if r was non 0 or there were more
>>> significant differences to the metric.
>>>
>>> It also doesn't scale in the same way when we're using auto generated
>>> metrics with the data that Arm is publishing. We're going to treat CPU
>>> versions as a completely new set of JSONs if there are any
>>> differences/fixes in the PMU stuff and just publish a whole new set.
>>> That way any tool that's consuming these can just check the MIDR and use
>>> whichever JSONS are most appropriate.
>>
>> So far we have got away with ignoring revision and variant - why change
>> this now just for these arm parts is a good question..
>>
>>> In theory this shouldn't require
>>> any manual intervention or re-writing of expressions. And it saves every
>>> tool having to solve the same problem.
>>
>> Sure, but we have tried to reduce duplication as much as possible, and
>> example would be ArchStdEvents
>>
> 
> What's the end goal of reducing duplication? Is it to reduce the binary
> size or to reduce human input?
> 
> For the binary size if most of these strings are the same then they are
> de-duplicated and the impact on the binary size is very small.
> 
> If it's to save on human input, then not doing it this way has the
> potential to be more work in the long run. At the moment we can just run
> a script and generate the Arm JSONs using what Arm is publishing. We
> might understand all these subtle de-duplication efforts, but anyone
> coming in the future is going to have a hard time understanding.
> Especially if they just want to re-run the script to update some text,
> then there is a big risk of making a mistake when re-applying the manual
> changes.
> 
> I'm not completely opposed to the idea of re-working this expression
> this time so that it works in both places, but I think we should be sure
> that we do it for the right reasons.
> 
> As an example for comparison, I tried a build with a single set of
> events and both sets (from this change) and the difference in size is
> 1760 bytes. Is it really worth starting to edit formulas just to save
> that much space?
> 

I re-ran the test with a clean build and the difference is actually 4888
bytes. Still quite small but not as suspiciously small as 1760.

>>>
>>> It's possible in the future that there are also small changes to the
>>> description text or availability of the events between versions. It
>>> hasn't happened yet, but it gives us the flexibility to do that too.
>>
>> Here the (handful of) metrics are subtly different between these
>> rev/variant, but all the events are the same. It's hard to justify
>> duplicating almost everything just for that.
>>
> 
> If it saves time and reduces the chance of making mistakes then
> personally I don't see an issue with the duplication.
> 
>>>
>>
>> I am not sure on how to handle this, since we might be bombarded with
>> support for more CPUs with the same MIDR issue and having quirks for all
>> becomes unworkable.
>>
> 
> I really don't see this being used very often. And it will always be
> used much less often than the task of adding entirely new CPUs, so it
> won't cause the number of JSONs to grow faster than that already
> existing work. To be honest there is a non-zero chance that it never
> gets used again.
> 
> Although even if we never use it again we still might want to
> re-generate the JSONS because of a typo fix or something so that task is
> just a script run rather than script run, and then re-applying the
> expression fixes.
> 
>> Can you just give the idea above on literals/functions in the metricexpr
>> a go, to see how it looks? Maybe Ian has a good idea on similar solutions.
>>
>> Thanks,
>> John
> 
> Yes I can give it a go. Also interested to see what Ian thinks as well.
> 
> Thanks
> James

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available
  2023-07-12 14:15               ` James Clark
@ 2023-07-12 15:33                 ` John Garry
  0 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2023-07-12 15:33 UTC (permalink / raw)
  To: James Clark, irogers
  Cc: namhyung, acme, linux-perf-users, renyu.zj, will,
	linux-arm-kernel, mark.rutland

>>>
>> What's the end goal of reducing duplication? Is it to reduce the binary
>> size or to reduce human input?

Not to reduce binary size. Well that was never the original intent.

Originally we were seeing many transcribed JSONs, with descriptions
manually copied from TRMs/ARM ARM or some random implementor
spreadsheets. There were also lots of inconsistencies in event
descriptions and classifications between implementors and
implementations when there are no differences in practice, which is not
helpful in giving a consistent user experience on arm.

The motivation to reduce duplication was for the usual reasons, like:
- minimize mistakes
- ease of maintenance
- easier to update
- etc

There was no central repository of ARM events for all implementors,
like Intel had in (I think) 01org. So it was good to try to provide a
method in perf tool to harmonize event support there.

Obviously when implementors have their own repo of events in
proprietary formats - like arm does - some conversion needs to be done
(to perf tool format). And using methods in perf tool there to reduce
duplication becomes questionable - reducing duplication would be the job
there of the original repo author.

So we could say that arm can have as much duplication as they want in
their JSONs as we know that they are managing in their own repo, but
having different rules for different implementors is going to be
difficult to manage. But allowing duplication in arm JSONs does leave
the door open for inconsistent event descriptions between implementors.

>>
>> For the binary size if most of these strings are the same then they are
>> de-duplicated and the impact on the binary size is very small.
>>
>> If it's to save on human input, then not doing it this way has the
>> potential to be more work in the long run. At the moment we can just run
>> a script and generate the Arm JSONs using what Arm is publishing. We
>> might understand all these subtle de-duplication efforts, but anyone
>> coming in the future is going to have a hard time understanding.
>> Especially if they just want to re-run the script to update some text,
>> then there is a big risk of making a mistake when re-applying the manual
>> changes.
>>
>> I'm not completely opposed to the idea of re-working this expression
>> this time so that it works in both places, but I think we should be sure
>> that we do it for the right reasons.

For sure. For this issue, it's ironically a bit annoying to have such a
small set of differences - otherwise we could just let it in as is. But
having a small set of subtle differences in a handful of metrics makes
me think that we can solve this with a literal / formula.

Again, if this turns out to be a continuous issue popping up, then it 
may become an unmanageable rule.

>>
>> As an example for comparison, I tried a build with a single set of
>> events and both sets (from this change) and the difference in size is
>> 1760 bytes. Is it really worth starting to edit formulas just to save
>> that much space?
>>
> I re-ran the test with a clean build and the difference is actually 4888
> bytes. Still quite small but not as suspiciously small as 1760.

But it's still not huge in terms of size.

FWIW, we could probably reduce size more by doing the ArchStdEvent fixup
during runtime and not when generating pmu-events.c, which would save
more space.

> 
>>>> It's possible in the future that there are also small changes to the
>>>> description text or availability of the events between versions. It
>>>> hasn't happened yet, but it gives us the flexibility to do that too.
>>> Here the (handful of) metrics are subtly different between these
>>> rev/variant, but all the events are the same. It's hard to justify
>>> duplicating almost everything just for that.
>>>
>> If it saves time and reduces the chance of making mistakes then
>> personally I don't see an issue with the duplication.
>>
>>> I am not sure on how to handle this, since we might be bombarded with
>>> support for more CPUs with the same MIDR issue and having quirks for all
>>> becomes unworkable.
>>>
>> I really don't see this being used very often. And it will always be
>> used much less often than the task of adding entirely new CPUs, so it
>> won't cause the number of JSONs to grow faster than that already
>> existing work. To be honest there is a non-zero chance that it never
>> gets used again.
>>
>> Although even if we never use it again we still might want to
>> re-generate the JSONS because of a typo fix or something so that task is
>> just a script run rather than script run, and then re-applying the
>> expression fixes.
>>
>>> Can you just give the idea above on literals/functions in the metricexpr
>>> a go, to see how it looks? Maybe Ian has a good idea on similar solutions.
>>>
>>> Thanks,
>>> John
>> Yes I can give it a go. Also interested to see what Ian thinks as well.

Cheers for that.

John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo
  2023-07-10 14:28   ` James Clark
@ 2023-07-27  3:50     ` Jing Zhang
  2023-08-07 16:04       ` James Clark
  0 siblings, 1 reply; 17+ messages in thread
From: Jing Zhang @ 2023-07-27  3:50 UTC (permalink / raw)
  To: James Clark, John Garry, Ian Rogers; +Cc: namhyung, acme, linux-perf-users



On 2023/7/10 at 10:28 PM, James Clark wrote:
> 
> 
> On 10/07/2023 15:19, James Clark wrote:
>> The new metrics contain a fix for N2 r0p3 where CPU_CYCLES should not be
>> subtracted from stalls for topdown metrics anymore. The current metrics
>> assume that the fix should be applied anywhere where slots != 5, but
>> this is only the case for V2 and not N2 r0p3.
>>
>> Split the metrics into a new version for N2-r0p3 and V2 which still
>> share the same metrics. Apart from some slight naming and grouping
>> differences the new metrics are functionally the same as the existing
>> ones. Any missing metrics were manually appended to the end of the auto
>> generated file.
>>
>> For the events, the new data includes descriptions that may have product
>> specific details and new groupings that will be consistent with other
>> products.
>>
>> After generating the metrics from the telemetry repo [1], the following
>> manual steps were performed:
>>
>>  * Change the hard coded slots in neoverse-n2r0p3-v2 to #slots so that
>>    it will work on both N2 and V2.
>>
>>  * Append some metrics from the old N2/V2 data that aren't present in
>>    the telemetry data. These will possibly be added to the
>>    telemetry-solution repo at a later time:
>>
>>     l3d_cache_mpki, l3d_cache_miss_rate, branch_pki, ipc_rate, spec_ipc,
>>     retired_rate, wasted_rate, load_spec_rate, store_spec_rate,
>>     advanced_simd_spec_rate, float_point_spec_rate,
>>     branch_immed_spec_rate, branch_return_spec_rate,
>>     branch_indirect_spec_rate
>>
>> [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2.json
>>
>> Signed-off-by: James Clark <james.clark@arm.com>
>> ---
> 
> [...]
> 
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
>> new file mode 100644
>> index 000000000000..b01cc2120175
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
>> @@ -0,0 +1,331 @@
>> +[
>> +    {
>> +        "ArchStdEvent": "backend_bound",
>> +        "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
>> +    },
> 
> Hi Jing,
> 
> I'm not sure if you remember, but a long time ago I said that I was
> going to update these N2 and V2 metrics that you upstreamed.
> 
> Now that these are coming from the Arm telemetry repo they have an
> adjustment for subtracting branch misses that we think is more accurate
> because of an issue with the STALL_SLOT_BACKEND counter. I've left the
> sbsa.json metrics without the adjustments as it shouldn't normally be
> needed.
> 
> I also had to drop the "if (#slots - 5) else" check because it won't
> apply to N2-r0p3, but I split the metrics into two instead to fix that
> issue.
> 
> The rest is pretty much the same apart from some slight grouping/naming/
> scaling differences.
> 
> Let me know what you think.
> 


Hi James,

Sorry, I only just saw this. I think it is OK to update the metrics. But since only
the metrics differ, isn't it a bit redundant to copy all the events?

For uncore_sys_pmu_event, there is a way to deal with this problem, which is to
identify different revision identifiers through a "Compat" field. So I wonder if it
is also possible to add a similar "Compat" field when describing a metric,
so that the metric can be matched to different revisions. For example:


+    {
+        "ArchStdEvent": "frontend_bound",
+        "MetricExpr": "(100 * ((STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))",
+        "Compat": "0x00000000410fd493;0x00000000410fd4f0"
+    },

+    {
+        "ArchStdEvent": "frontend_bound",
+        "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND - CPU_CYCLES) / (5 * CPU_CYCLES)) - (BR_MIS_PRED / CPU_CYCLES)))",
+        "Compat": "0x00000000410fd490"
+    },

With this "Compat" field, a metric only matches the specified revisions; if the
"Compat" field is omitted, the metric matches all revisions. The "Compat" field is
not currently supported for the core PMU. However, I believe that adding it would
not be complex, and it reads well without causing redundancy. Would it be worth a try?
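
A minimal sketch of how such matching might behave, in Python for illustration only. The function names and the third list entry are hypothetical, and exact, case-insensitive comparison of the CPUID string is an assumption; this is not how jevents or perf currently handle core PMU metrics.

```python
# Hypothetical sketch: select metric entries by an optional "Compat" field.
# The field name and the ';'-separated CPUID list follow the example above;
# exact, case-insensitive string matching is an assumption.

def matches_cpuid(entry, cpuid):
    """Return True if the entry has no "Compat" field or lists this CPUID."""
    compat = entry.get("Compat")
    if compat is None:
        return True  # no "Compat" field: the entry applies to every revision
    allowed = {c.strip().lower() for c in compat.split(";")}
    return cpuid.lower() in allowed


def select_metrics(entries, cpuid):
    """Keep only the entries that apply to the given CPUID."""
    return [e for e in entries if matches_cpuid(e, cpuid)]


METRICS = [
    {
        "ArchStdEvent": "frontend_bound",
        "MetricExpr": "(100 * ((STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))",
        "Compat": "0x00000000410fd493;0x00000000410fd4f0",
    },
    {
        "ArchStdEvent": "frontend_bound",
        "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND - CPU_CYCLES) / (5 * CPU_CYCLES)) - (BR_MIS_PRED / CPU_CYCLES)))",
        "Compat": "0x00000000410fd490",
    },
    # Hypothetical entry without "Compat", to show the match-all case.
    {"ArchStdEvent": "example_metric_for_all_revisions", "MetricExpr": "CPU_CYCLES"},
]

if __name__ == "__main__":
    for entry in select_metrics(METRICS, "0x00000000410fd490"):
        print(entry["ArchStdEvent"], "->", entry["MetricExpr"])
```

Each CPUID then sees exactly one frontend_bound definition, plus every entry that carries no "Compat" field.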

What do you think?


Thanks,
Jing


* Re: [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo
  2023-07-27  3:50     ` Jing Zhang
@ 2023-08-07 16:04       ` James Clark
  0 siblings, 0 replies; 17+ messages in thread
From: James Clark @ 2023-08-07 16:04 UTC (permalink / raw)
  To: Jing Zhang, John Garry, Ian Rogers; +Cc: namhyung, acme, linux-perf-users



On 27/07/2023 04:50, Jing Zhang wrote:
> 
> 
> On 2023/7/10 10:28 PM, James Clark wrote:
>>
>>
>> On 10/07/2023 15:19, James Clark wrote:
>>> The new metrics contain a fix for N2 r0p3 where CPU_CYCLES should not be
>>> subtracted from stalls for topdown metrics anymore. The current metrics
>>> assume that the fix should be applied anywhere where slots != 5, but
>>> this is only the case for V2 and not N2 r0p3.
>>>
>>> Split the metrics into a new version for N2-r0p3 and V2 which still
>>> share the same metrics. Apart from some slight naming and grouping
>>> differences the new metrics are functionally the same as the existing
>>> ones. Any missing metrics were manually appended to the end of the auto
>>> generated file.
>>>
>>> For the events, the new data includes descriptions that may have product
>>> specific details and new groupings that will be consistent with other
>>> products.
>>>
>>> After generating the metrics from the telemetry repo [1], the following
>>> manual steps were performed:
>>>
>>>  * Change the hard coded slots in neoverse-n2r0p3-v2 to #slots so that
>>>    it will work on both N2 and V2.
>>>
>>>  * Append some metrics from the old N2/V2 data that aren't present in
>>>    the telemetry data. These will possibly be added to the
>>>    telemetry-solution repo at a later time:
>>>
>>>     l3d_cache_mpki, l3d_cache_miss_rate, branch_pki, ipc_rate, spec_ipc,
>>>     retired_rate, wasted_rate, load_spec_rate, store_spec_rate,
>>>     advanced_simd_spec_rate, float_point_spec_rate,
>>>     branch_immed_spec_rate, branch_return_spec_rate,
>>>     branch_indirect_spec_rate
>>>
>>> [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2.json
>>>
>>> Signed-off-by: James Clark <james.clark@arm.com>
>>> ---
>>
>> [...]
>>
>>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
>>> new file mode 100644
>>> index 000000000000..b01cc2120175
>>> --- /dev/null
>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2r0p3-v2/metrics.json
>>> @@ -0,0 +1,331 @@
>>> +[
>>> +    {
>>> +        "ArchStdEvent": "backend_bound",
>>> +        "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
>>> +    },
>>
>> Hi Jing,
>>
>> I'm not sure if you remember, but a long time ago I said that I was
>> going to update these N2 and V2 metrics that you upstreamed.
>>
>> Now that these are coming from the Arm telemetry repo they have an
>> adjustment for subtracting branch misses that we think is more accurate
>> because of an issue with the STALL_SLOT_BACKEND counter. I've left the
>> sbsa.json metrics without the adjustments as it shouldn't normally be
>> needed.
>>
>> I also had to drop the "if (#slots - 5) else" check because it won't
>> apply to N2-r0p3, but I split the metrics into two instead to fix that
>> issue.
>>
>> The rest is pretty much the same apart from some slight grouping/naming/
>> scaling differences.
>>
>> Let me know what you think.
>>
> 
> 
> Hi James,
> 
> Sorry, I only just saw this. I think it is OK to update the metrics. But since only
> the metrics differ, isn't it a bit redundant to copy all the events?
> 
> For uncore_sys_pmu_event, there is a way to deal with this problem, which is to
> identify different revision identifiers through a "Compat" field. So I wonder if it
> is also possible to add a similar "Compat" field when describing a metric,
> so that the metric can be matched to different revisions. For example:
> 
> 
> +    {
> +        "ArchStdEvent": "frontend_bound",
> +        "MetricExpr": "(100 * ((STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))",
> +        "Compat": "0x00000000410fd493;0x00000000410fd4f0"
> +    },
> 
> +    {
> +        "ArchStdEvent": "frontend_bound",
> +        "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND - CPU_CYCLES) / (5 * CPU_CYCLES)) - (BR_MIS_PRED / CPU_CYCLES)))",
> +        "Compat": "0x00000000410fd490"
> +    },
> 
> With this "Compat" field, a metric only matches the specified revisions; if the
> "Compat" field is omitted, the metric matches all revisions. The "Compat" field is
> not currently supported for the core PMU. However, I believe that adding it would
> not be complex, and it reads well without causing redundancy. Would it be worth a try?
> 
> What do you think?
> 
> 
> Thanks,
> Jing

Hi Jing,

I posted a V4 on the list. I went with John's approach of modifying the
metric expression literal so that the V2 and N2 metrics could be
re-used. I thought that was simpler than the "Compat" field, and that
the "Compat" field might duplicate some of what the mapfile.csv
already does.

Instead of adding this compat field for core PMUs, if we want to do
something like this, maybe we could make it so that new versions of CPUs
can be added in the mapfile, but they append to and modify all the
previous matching versions of JSONs found. In this case I would add
some new JSONs for N2 r0p3 that contain only the modified metrics,
rather than everything, and they would overwrite that part of the
r0p0 metrics. That way we could add new items to the existing mapfile,
and each one would choose how much or how little of the previous
revisions it replaces.
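
A minimal sketch of that layering idea, in Python for illustration only. The function name, the list names, the second r0p0 entry, and keying entries on "ArchStdEvent" are all assumptions made for the example; this is not how jevents currently processes the mapfile.

```python
# Hypothetical sketch: a later revision's JSON holds only the changed
# entries and is applied on top of the earlier revision's entries.
# Keying on "ArchStdEvent" is an assumption taken from the example in
# this thread.

def layer_entries(base, override, key="ArchStdEvent"):
    """Return base entries with same-named override entries replacing them.

    Entries that exist only in override are appended at the end.
    """
    merged = {entry[key]: entry for entry in base}
    for entry in override:
        merged[entry[key]] = entry  # replace in place, or append if new
    return list(merged.values())


# Full set for the older revision (second entry is a placeholder).
N2_R0P0 = [
    {
        "ArchStdEvent": "frontend_bound",
        "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND - CPU_CYCLES) / (5 * CPU_CYCLES)) - (BR_MIS_PRED / CPU_CYCLES)))",
    },
    {"ArchStdEvent": "unchanged_metric", "MetricExpr": "CPU_CYCLES"},
]

# The newer revision lists only what differs.
N2_R0P3 = [
    {
        "ArchStdEvent": "frontend_bound",
        "MetricExpr": "(100 * ((STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))",
    },
]

if __name__ == "__main__":
    for entry in layer_entries(N2_R0P0, N2_R0P3):
        print(entry["ArchStdEvent"], "->", entry["MetricExpr"])
```

With this shape, the r0p3 directory would carry one entry instead of a full copy, and everything not mentioned would fall through to the r0p0 definitions.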

Otherwise we could end up in a situation where, instead of using the
mapfile, we just add the compat field to all metrics, and then we could
delete the mapfile and everything would still work. So it seems like it
could end up being two different ways to do the same thing. I'm not
completely against this idea, but I suppose in this exact scenario the
new literal was less change and works pretty well.

Thanks
James



end of thread, other threads:[~2023-08-07 16:04 UTC | newest]

Thread overview: 17+ messages
2023-07-10 14:18 [PATCH v2 0/5] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo James Clark
2023-07-10 14:18 ` [PATCH v2 1/5] perf: cs-etm: Don't duplicate FIELD_GET() James Clark
2023-07-10 14:19 ` [PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available James Clark
2023-07-10 16:56   ` John Garry
2023-07-11 10:18     ` James Clark
2023-07-12  9:22       ` John Garry
2023-07-12 11:00         ` James Clark
2023-07-12 11:33           ` John Garry
2023-07-12 14:06             ` James Clark
2023-07-12 14:15               ` James Clark
2023-07-12 15:33                 ` John Garry
2023-07-10 14:19 ` [PATCH v2 3/5] perf vendor events arm64: Update scale units and descriptions of common topdown metrics James Clark
2023-07-10 14:19 ` [PATCH v2 4/5] perf vendor events arm64: Update N2-r0p3 and V2 metrics and events using Arm telemetry repo James Clark
2023-07-10 14:28   ` James Clark
2023-07-27  3:50     ` Jing Zhang
2023-08-07 16:04       ` James Clark
2023-07-10 14:19 ` [PATCH v2 5/5] perf vendor events arm64: Update N2-r0p0 " James Clark
