Date: Tue, 3 Mar 2026 17:26:50 -0800
From: Namhyung Kim
To: Besar Wicaksono
Cc: irogers@google.com, james.clark@linaro.org, john.g.garry@oracle.com,
	will@kernel.org, mike.leach@linaro.org, leo.yan@linux.dev,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@kernel.org, adrian.hunter@intel.com, peterz@infradead.org,
	mingo@redhat.com, acme@kernel.org, linux-tegra@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, tmakin@nvidia.com, vsethi@nvidia.com,
	rwiley@nvidia.com, skelley@nvidia.com, ywan@nvidia.com,
	treding@nvidia.com, jonathanh@nvidia.com, mochs@nvidia.com
Subject: Re: [PATCH v2] perf vendor events arm64: Add Tegra410 Olympus PMU events
References: <20260212233407.1432673-1-bwicaksono@nvidia.com>
In-Reply-To: <20260212233407.1432673-1-bwicaksono@nvidia.com>

Hello,

On Thu, Feb 12, 2026 at 11:34:07PM +0000, Besar Wicaksono wrote:
> Add JSON files for NVIDIA Tegra410 Olympus core PMU events.
> Also updated the common-and-microarch.json.
> 
> Signed-off-by: Besar Wicaksono
> ---
> 
> Changes from v1:
> * Remove CHAIN event
> * Update event description and fix spelling and capitalization mistakes
>   Thanks to Ian and James for the review.
> v1: https://lore.kernel.org/all/20260127225909.3296202-1-bwicaksono@nvidia.com/T/#u

Ian and James, can you please take a look again?
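(A side note for reviewers, not part of the patch: every entry quoted below is either an `ArchStdEvent` alias of a common event or a vendor event with an `EventCode`/`EventName` pair. A minimal, hypothetical Python sketch of that shape check — `check_entry` is my own helper for eyeballing the diff, not part of perf's jevents tooling:)

```python
import json

# Reviewer-side sanity check for the pmu-events JSON entries quoted in
# this patch; a minimal sketch only, not part of perf's jevents tooling.
# It checks just the fields visible in the diff: either an ArchStdEvent
# alias, or a vendor event with a hex EventCode and an EventName.
def check_entry(entry):
    if "ArchStdEvent" in entry:
        # Alias of a common event defined in common-and-microarch.json.
        return bool(entry["ArchStdEvent"])
    code = entry.get("EventCode", "")
    if not code.startswith("0x"):
        return False
    try:
        int(code, 16)  # must parse as hexadecimal
    except ValueError:
        return False
    return bool(entry.get("EventName"))

entries = json.loads("""[
    {"EventCode": "0x8150",
     "EventName": "L3D_CACHE_RW",
     "BriefDescription": "Level 3 data cache demand access."},
    {"ArchStdEvent": "BR_MIS_PRED",
     "PublicDescription": "Speculatively executed, mispredicted branches."}
]""")

print(all(check_entry(e) for e in entries))  # True
```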
Thanks,
Namhyung

> 
> ---
>  .../arch/arm64/common-and-microarch.json      |  85 +++
>  tools/perf/pmu-events/arch/arm64/mapfile.csv  |   1 +
>  .../arch/arm64/nvidia/t410/branch.json        |  45 ++
>  .../arch/arm64/nvidia/t410/brbe.json          |   6 +
>  .../arch/arm64/nvidia/t410/bus.json           |  48 ++
>  .../arch/arm64/nvidia/t410/exception.json     |  62 ++
>  .../arch/arm64/nvidia/t410/fp_operation.json  |  78 ++
>  .../arch/arm64/nvidia/t410/general.json       |  15 +
>  .../arch/arm64/nvidia/t410/l1d_cache.json     | 122 +++
>  .../arch/arm64/nvidia/t410/l1i_cache.json     | 114 +++
>  .../arch/arm64/nvidia/t410/l2d_cache.json     | 134 ++++
>  .../arch/arm64/nvidia/t410/ll_cache.json      | 107 +++
>  .../arch/arm64/nvidia/t410/memory.json        |  46 ++
>  .../arch/arm64/nvidia/t410/metrics.json       | 722 ++++++++++++++++++
>  .../arch/arm64/nvidia/t410/misc.json          | 642 ++++++++++++++++
>  .../arch/arm64/nvidia/t410/retired.json       |  94 +++
>  .../arch/arm64/nvidia/t410/spe.json           |  42 +
>  .../arm64/nvidia/t410/spec_operation.json     | 230 ++++++
>  .../arch/arm64/nvidia/t410/stall.json         | 145 ++++
>  .../arch/arm64/nvidia/t410/tlb.json           | 158 ++++
>  20 files changed, 2896 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> index 468cb085d879..144325d87be4 100644
> --- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> +++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> @@ -1512,11 +1512,26 @@
>          "EventName": "L2D_CACHE_REFILL_PRFM",
>          "BriefDescription": "Level 2 data cache refill, software preload"
>      },
> +    {
> +        "EventCode": "0x8150",
> +        "EventName": "L3D_CACHE_RW",
> +        "BriefDescription": "Level 3 data cache demand access."
> +    },
> +    {
> +        "EventCode": "0x8151",
> +        "EventName": "L3D_CACHE_PRFM",
> +        "BriefDescription": "Level 3 data cache software prefetch"
> +    },
>      {
>          "EventCode": "0x8152",
>          "EventName": "L3D_CACHE_MISS",
>          "BriefDescription": "Level 3 data cache demand access miss"
>      },
> +    {
> +        "EventCode": "0x8153",
> +        "EventName": "L3D_CACHE_REFILL_PRFM",
> +        "BriefDescription": "Level 3 data cache refill, software prefetch."
> +    },
>      {
>          "EventCode": "0x8154",
>          "EventName": "L1D_CACHE_HWPRF",
> @@ -1527,6 +1542,11 @@
>          "EventName": "L2D_CACHE_HWPRF",
>          "BriefDescription": "Level 2 data cache hardware prefetch."
>      },
> +    {
> +        "EventCode": "0x8156",
> +        "EventName": "L3D_CACHE_HWPRF",
> +        "BriefDescription": "Level 3 data cache hardware prefetch."
> +    },
>      {
>          "EventCode": "0x8158",
>          "EventName": "STALL_FRONTEND_MEMBOUND",
> @@ -1682,6 +1702,11 @@
>          "EventName": "L2D_CACHE_REFILL_HWPRF",
>          "BriefDescription": "Level 2 data cache refill, hardware prefetch."
>      },
> +    {
> +        "EventCode": "0x81BE",
> +        "EventName": "L3D_CACHE_REFILL_HWPRF",
> +        "BriefDescription": "Level 3 data cache refill, hardware prefetch."
> +    },
>      {
>          "EventCode": "0x81C0",
>          "EventName": "L1I_CACHE_HIT_RD",
> @@ -1712,11 +1737,31 @@
>          "EventName": "L1I_CACHE_HIT_RD_FPRFM",
>          "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by software preload"
>      },
> +    {
> +        "EventCode": "0x81DC",
> +        "EventName": "L1D_CACHE_HIT_RW_FPRFM",
> +        "BriefDescription": "Level 1 data cache demand access first hit, fetched by software prefetch."
> +    },
>      {
>          "EventCode": "0x81E0",
>          "EventName": "L1I_CACHE_HIT_RD_FHWPRF",
>          "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by hardware prefetcher"
>      },
> +    {
> +        "EventCode": "0x81EC",
> +        "EventName": "L1D_CACHE_HIT_RW_FHWPRF",
> +        "BriefDescription": "Level 1 data cache demand access first hit, fetched by hardware prefetcher."
> +    },
> +    {
> +        "EventCode": "0x81F0",
> +        "EventName": "L1I_CACHE_HIT_RD_FPRF",
> +        "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by prefetch."
> +    },
> +    {
> +        "EventCode": "0x81FC",
> +        "EventName": "L1D_CACHE_HIT_RW_FPRF",
> +        "BriefDescription": "Level 1 data cache demand access first hit, fetched by prefetch."
> +    },
>      {
>          "EventCode": "0x8200",
>          "EventName": "L1I_CACHE_HIT",
> @@ -1767,11 +1812,26 @@
>          "EventName": "L1I_LFB_HIT_RD_FPRFM",
>          "BriefDescription": "Level 1 instruction cache demand fetch line-fill buffer first hit, recently fetched by software preload"
>      },
> +    {
> +        "EventCode": "0x825C",
> +        "EventName": "L1D_LFB_HIT_RW_FPRFM",
> +        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by software prefetch."
> +    },
>      {
>          "EventCode": "0x8260",
>          "EventName": "L1I_LFB_HIT_RD_FHWPRF",
>          "BriefDescription": "Level 1 instruction cache demand fetch line-fill buffer first hit, recently fetched by hardware prefetcher"
>      },
> +    {
> +        "EventCode": "0x826C",
> +        "EventName": "L1D_LFB_HIT_RW_FHWPRF",
> +        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by hardware prefetcher."
> +    },
> +    {
> +        "EventCode": "0x827C",
> +        "EventName": "L1D_LFB_HIT_RW_FPRF",
> +        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by prefetch."
> +    },
>      {
>          "EventCode": "0x8280",
>          "EventName": "L1I_CACHE_PRF",
> @@ -1807,6 +1867,11 @@
>          "EventName": "LL_CACHE_REFILL",
>          "BriefDescription": "Last level cache refill"
>      },
> +    {
> +        "EventCode": "0x828E",
> +        "EventName": "L3D_CACHE_REFILL_PRF",
> +        "BriefDescription": "Level 3 data cache refill, prefetch."
> +    },
>      {
>          "EventCode": "0x8320",
>          "EventName": "L1D_CACHE_REFILL_PERCYC",
> @@ -1872,6 +1937,16 @@
>          "EventName": "FP_FP8_MIN_SPEC",
>          "BriefDescription": "Floating-point operation speculatively_executed, smallest type is 8-bit floating-point."
>      },
> +    {
> +        "EventCode": "0x8480",
> +        "EventName": "FP_SP_FIXED_MIN_OPS_SPEC",
> +        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is single-precision floating-point."
> +    },
> +    {
> +        "EventCode": "0x8482",
> +        "EventName": "FP_HP_FIXED_MIN_OPS_SPEC",
> +        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is half-precision floating-point."
> +    },
>      {
>          "EventCode": "0x8483",
>          "EventName": "FP_BF16_FIXED_MIN_OPS_SPEC",
> @@ -1882,6 +1957,16 @@
>          "EventName": "FP_FP8_FIXED_MIN_OPS_SPEC",
>          "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is 8-bit floating-point."
>      },
> +    {
> +        "EventCode": "0x8488",
> +        "EventName": "FP_SP_SCALE_MIN_OPS_SPEC",
> +        "BriefDescription": "Scalable element arithmetic operations speculatively executed, smallest type is single-precision floating-point."
> +    },
> +    {
> +        "EventCode": "0x848A",
> +        "EventName": "FP_HP_SCALE_MIN_OPS_SPEC",
> +        "BriefDescription": "Scalable element arithmetic operations speculatively executed, smallest type is half-precision floating-point."
> +    },
>      {
>          "EventCode": "0x848B",
>          "EventName": "FP_BF16_SCALE_MIN_OPS_SPEC",
> diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> index bb3fa8a33496..7f0eaa702048 100644
> --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
> +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> @@ -46,3 +46,4 @@
>  0x00000000500f0000,v1,ampere/emag,core
>  0x00000000c00fac30,v1,ampere/ampereone,core
>  0x00000000c00fac40,v1,ampere/ampereonex,core
> +0x000000004e0f0100,v1,nvidia/t410,core
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> new file mode 100644
> index 000000000000..ef4effc00ec3
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> @@ -0,0 +1,45 @@
> +[
> +    {
> +        "ArchStdEvent": "BR_MIS_PRED",
> +        "PublicDescription": "This event counts branches which are speculatively executed and mispredicted."
> +    },
> +    {
> +        "ArchStdEvent": "BR_PRED",
> +        "PublicDescription": "This event counts all speculatively executed branches."
> +    },
> +    {
> +        "EventCode": "0x017e",
> +        "EventName": "BR_PRED_BTB_CTX_UPDATE",
> +        "PublicDescription": "Branch context table update."
> +    },
> +    {
> +        "EventCode": "0x0188",
> +        "EventName": "BR_MIS_PRED_DIR_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions due to direction misprediction."
> +    },
> +    {
> +        "EventCode": "0x0189",
> +        "EventName": "BR_MIS_PRED_DIR_UNCOND_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions due to direction misprediction for unconditional branches."
> +    },
> +    {
> +        "EventCode": "0x018a",
> +        "EventName": "BR_MIS_PRED_DIR_UNCOND_DIRECT_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions due to direction misprediction for unconditional direct branches."
> +    },
> +    {
> +        "EventCode": "0x018b",
> +        "EventName": "BR_PRED_MULTI_RESOLVED",
> +        "PublicDescription": "Number of resolved branches which were predicted by the polymorphic indirect predictor."
> +    },
> +    {
> +        "EventCode": "0x018c",
> +        "EventName": "BR_MIS_PRED_MULTI_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions for branches which were predicted by the polymorphic indirect predictor."
> +    },
> +    {
> +        "EventCode": "0x01e4",
> +        "EventName": "BR_RGN_RECLAIM",
> +        "PublicDescription": "This event counts the indirect predictor entries flushed by region reclamation."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
> new file mode 100644
> index 000000000000..9c315b2d7046
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
> @@ -0,0 +1,6 @@
> +[
> +    {
> +        "ArchStdEvent": "BRB_FILTRATE",
> +        "PublicDescription": "This event counts each valid branch record captured in the branch record buffer. Branch records that are not captured because they are removed by filtering are not counted."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
> new file mode 100644
> index 000000000000..5bb8de617c68
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
> @@ -0,0 +1,48 @@
> +[
> +    {
> +        "ArchStdEvent": "BUS_ACCESS",
> +        "PublicDescription": "This event counts the number of data-beat accesses between the CPU and the external bus. This count includes accesses due to read, write, and snoop. Each beat of data is counted individually."
> +    },
> +    {
> +        "ArchStdEvent": "BUS_CYCLES",
> +        "PublicDescription": "This event counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event increments at the rate of the CPU clock. Regardless of the WFE/WFI state of the PE, this event increments on each processor clock."
> +    },
> +    {
> +        "ArchStdEvent": "BUS_ACCESS_RD",
> +        "PublicDescription": "This event counts memory Read transactions seen on the external bus. Each beat of data is counted individually."
> +    },
> +    {
> +        "ArchStdEvent": "BUS_ACCESS_WR",
> +        "PublicDescription": "This event counts memory Write transactions seen on the external bus. Each beat of data is counted individually."
> +    },
> +    {
> +        "EventCode": "0x0154",
> +        "EventName": "BUS_REQUEST_REQ",
> +        "PublicDescription": "Bus request, request."
> +    },
> +    {
> +        "EventCode": "0x0155",
> +        "EventName": "BUS_REQUEST_RETRY",
> +        "PublicDescription": "Bus request, retry."
> +    },
> +    {
> +        "EventCode": "0x0198",
> +        "EventName": "L2_CHI_CBUSY0",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 0."
> +    },
> +    {
> +        "EventCode": "0x0199",
> +        "EventName": "L2_CHI_CBUSY1",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 1."
> +    },
> +    {
> +        "EventCode": "0x019a",
> +        "EventName": "L2_CHI_CBUSY2",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 2."
> +    },
> +    {
> +        "EventCode": "0x019b",
> +        "EventName": "L2_CHI_CBUSY3",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 3."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
> new file mode 100644
> index 000000000000..ecd996c3610b
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
> @@ -0,0 +1,62 @@
> +[
> +    {
> +        "ArchStdEvent": "EXC_TAKEN",
> +        "PublicDescription": "This event counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_RETURN",
> +        "PublicDescription": "This event counts any architecturally executed exception return instructions. For example: AArch64: ERET."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_UNDEF",
> +        "PublicDescription": "This event counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED.\nAttempting to execute instruction bit patterns that have not been allocated.\nAttempting to execute instructions when they are disabled.\nAttempting to execute instructions at an inappropriate Exception level.\nAttempting to execute an instruction when the value of PSTATE.IL is 1."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_SVC",
> +        "PublicDescription": "This event counts SVC exceptions taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_PABORT",
> +        "PublicDescription": "This event counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_DABORT",
> +        "PublicDescription": "This event counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_IRQ",
> +        "PublicDescription": "This event counts IRQ exceptions including the virtual IRQs that are taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_FIQ",
> +        "PublicDescription": "This event counts FIQ exceptions including the virtual FIQs that are taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_SMC",
> +        "PublicDescription": "This event counts SMC exceptions taken to EL3."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_HVC",
> +        "PublicDescription": "This event counts HVC exceptions taken to EL2."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_PABORT",
> +        "PublicDescription": "This event counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_DABORT",
> +        "PublicDescription": "This event counts exceptions which are traps not taken locally and are caused by Data Aborts or SError Interrupts. Conditions that could cause those exceptions are:\n* Attempting to read or write memory where the MMU generates a fault,\n* Attempting to read or write memory with a misaligned address,\n* Interrupts from the SEI input,\n* Internally generated SErrors."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_OTHER",
> +        "PublicDescription": "This event counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, Data Aborts, Instruction Aborts, or Interrupts."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_IRQ",
> +        "PublicDescription": "This event counts IRQ exceptions including the virtual IRQs that are not taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_FIQ",
> +        "PublicDescription": "This event counts FIQs which are not taken locally but taken from EL0, EL1, or EL2 to EL3 (which would be the normal behavior for FIQs when not executing in EL3)."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
> new file mode 100644
> index 000000000000..3588e130781d
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
> @@ -0,0 +1,78 @@
> +[
> +    {
> +        "ArchStdEvent": "FP_HP_SPEC",
> +        "PublicDescription": "This event counts speculatively executed half precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_SPEC",
> +        "PublicDescription": "This event counts speculatively executed single precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_DP_SPEC",
> +        "PublicDescription": "This event counts speculatively executed double precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event counts speculatively executed scalable single precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event counts speculatively executed non-scalable single precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the counter to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_DP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was double-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_DP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was double-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_BF16_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was BFloat16 floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_FP8_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was 8-bit floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_BF16_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was BFloat16 floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_FP8_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was 8-bit floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
> new file mode 100644
> index 000000000000..bd9c248387aa
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
> @@ -0,0 +1,15 @@
> +[
> +    {
> +        "ArchStdEvent": "CPU_CYCLES",
> +        "PublicDescription": "This event counts CPU clock cycles when the PE is not in WFE/WFI. The clock measured by this event is defined as the physical clock driving the CPU logic."
> +    },
> +    {
> +        "ArchStdEvent": "CNT_CYCLES",
> +        "PublicDescription": "This event increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0.\nThis event does not increment when the PE is in WFE/WFI."
> +    },
> +    {
> +        "EventCode": "0x01e1",
> +        "EventName": "CPU_SLOT",
> +        "PublicDescription": "Entitled CPU slots.\nThis event counts the number of slots. When in ST mode, this event shall increment by PMMIR_EL1.SLOTS quantities, and when in SMT partitioned resource mode (regardless of whether in WFI state or otherwise), this event is incremented by PMMIR_EL1.SLOTS/2 quantities."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
> new file mode 100644
> index 000000000000..ed6f764eff24
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
> @@ -0,0 +1,122 @@
> +[
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL",
> +        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed load or store operations, preload instructions, or hardware cache prefetching that missed in the L1 D-cache. This event only counts one event per cache line.\nSince the caches are Write-back only for this processor, there are no Write-through cache accesses."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE",
> +        "PublicDescription": "This event counts L1 D-cache accesses from any load/store operations, software preload, or hardware prefetch operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and a read access. Each access to a cache line is counted, including the multiple accesses caused by single instructions such as LDM or STM. Each access to other L1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted.\nThis event counts the sum of the following events:\nL1D_CACHE_RD,\nL1D_CACHE_WR,\nL1D_CACHE_PRFM, and\nL1D_CACHE_HWPRF."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WB",
> +        "PublicDescription": "This event counts write-backs of dirty data from the L1 D-cache to the L2 cache. This occurs when either a dirty cache line is evicted from the L1 D-cache and allocated in the L2 cache, or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n* Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n* Full line writes which write to L2 without writing L1, such as write streaming mode.\nThis event is the sum of the following events:\nL1D_CACHE_WB_CLEAN and\nL1D_CACHE_WB_VICTIM."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
> +        "PublicDescription": "This event counts cache line refills into the L1 D-cache from any memory Read operations that incurred additional latency.\nCounts the same as L1D_CACHE_REFILL_RD on this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_RD",
> +        "PublicDescription": "This event counts L1 D-cache accesses from any Load operation. Atomic Load operations that resolve in the CPU's caches count as both a write access and a read access."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WR",
> +        "PublicDescription": "This event counts L1 D-cache accesses generated by Store operations. This event also counts accesses caused by a DC ZVA (D-cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as a write access and a read access.\nThis event is a subset of the L1D_CACHE event, except this event only counts memory Write operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_RD",
> +        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed Load instructions where the memory Read operation misses in the L1 D-cache. This event only counts one event per cache line.\nThis event is a subset of the L1D_CACHE_REFILL event, but only counts memory Read operations. This event does not count reads caused by cache maintenance operations or preload instructions."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_WR",
> +        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed Store instructions where the memory Write operation misses in the L1 D-cache. This event only counts one event per cache line.\nThis event is a subset of the L1D_CACHE_REFILL event, but only counts memory Write operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
> +        "PublicDescription": "This event counts L1 D-cache refills (L1D_CACHE_REFILL) where the cache line data came from caches inside the immediate Cluster of the Core (L2 cache)."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
> +        "PublicDescription": "This event counts L1 D-cache refills (L1D_CACHE_REFILL) for which the cache line data came from outside the immediate Cluster of the Core, such as an SLC in the system interconnect, DRAM, or a remote socket."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
> +        "PublicDescription": "This event counts dirty cache line evictions from the L1 D-cache caused by a new cache line allocation. This event does not count evictions caused by cache maintenance operations.\nThis event is a subset of the L1D_CACHE_WB event, but only counts write-backs that are a result of the line being allocated for an access made by the CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
> +        "PublicDescription": "This event counts write-backs from the L1 D-cache that are a result of a coherency operation made by another CPU. Event counts include cache maintenance operations.\nThis event is a subset of the L1D_CACHE_WB event."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_INVAL",
> +        "PublicDescription": "This event counts each explicit invalidation of a cache line in the L1 D-cache caused by:\n* Cache Maintenance Operations (CMO) that operate by a virtual address.\n* Broadcast cache coherency operations from another CPU in the system.\nThis event does not count in the following cases:\n* A cache refill invalidates a cache line.\n* A CMO which is executed on that CPU and invalidates a cache line specified by Set/Way.\nNote that CMOs that operate by Set/Way cannot be broadcast from one CPU to another."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_RW",
> +        "PublicDescription": "This event counts L1 data demand cache accesses from any Load or Store operation. Near atomic operations that resolve in the CPU's caches count as both a write access and a read access.\nThis event is implemented as L1D_CACHE_RD + L1D_CACHE_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_PRFM",
> +        "PublicDescription": "This event counts L1 D-cache accesses from software preload or prefetch instructions."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_MISS",
> +        "PublicDescription": "This event counts each demand access counted by L1D_CACHE_RW that misses in the L1 Data or unified cache, causing an access to outside of the L1 caches of this PE."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_PRFM",
> +        "PublicDescription": "This event counts L1 D-cache refills where the cache line access was generated by software preload or prefetch instructions."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HWPRF",
> +        "PublicDescription": "This event counts L1 D-cache accesses from any Load/Store operations generated by the hardware prefetcher."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access counted by L1D_CACHE_HWPRF that causes a refill of the L1 D-cache from outside of the L1 D-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRFM",
> +        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched in response to a prefetch instruction. That is, the L1D_CACHE_REFILL_PRFM event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HIT_RW_FHWPRF",
> +        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched by a hardware prefetcher. That is, the L1D_CACHE_REFILL_HWPRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRF",
> +        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW where the cache line was fetched in response to a prefetch instruction or by a hardware prefetcher. That is, the L1D_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_LFB_HIT_RW_FPRFM",
> +        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW_FPRF where the cache line was fetched in response to a prefetch instruction. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_PRFM event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_LFB_HIT_RW_FHWPRF",
> +        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW_FPRF, where the cache line was fetched by a hardware prefetcher. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_HWPRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_LFB_HIT_RW_FPRF",
> +        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW where the cache line was fetched in response to a prefetch instruction or by a hardware prefetcher. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x01f5",
> +        "EventName": "L1D_CACHE_REFILL_RW",
> +        "PublicDescription": "L1 D-cache refill, demand Read and Write. This event counts demand Read and Write accesses that cause a refill of the L1 D-cache of this PE, from outside of this cache."
> +    },
> +    {
> +        "EventCode": "0x0204",
> +        "EventName": "L1D_CACHE_REFILL_OUTER_LLC",
> +        "PublicDescription": "This event counts L1D_CACHE_REFILL from the L3 D-cache."
> +    },
> +    {
> +        "EventCode": "0x0205",
> +        "EventName": "L1D_CACHE_REFILL_OUTER_DRAM",
> +        "PublicDescription": "This event counts L1D_CACHE_REFILL from local memory."
> +    },
> +    {
> +        "EventCode": "0x0206",
> +        "EventName": "L1D_CACHE_REFILL_OUTER_REMOTE",
> +        "PublicDescription": "This event counts L1D_CACHE_REFILL from remote memory."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
> new file mode 100644
> index 000000000000..952454004d98
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
> @@ -0,0 +1,114 @@
> +[
> +    {
> +        "ArchStdEvent": "L1I_CACHE_REFILL",
> +        "PublicDescription": "This event counts cache line refills in the L1 I-cache caused by a missed instruction fetch (demand, hardware prefetch, and software preload accesses). Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE",
> +        "PublicDescription": "This event counts instruction fetches (demand, hardware prefetch, and software preload accesses) which access the L1 Instruction Cache. Instruction Cache accesses caused by cache maintenance operations are not counted."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_LMISS",
> +        "PublicDescription": "This event counts cache line refills into the L1 I-cache that incurred additional latency.\nCounts the same as L1I_CACHE_REFILL in this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_RD",
> +        "PublicDescription": "This event counts demand instruction fetches which access the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_PRFM",
> +        "PublicDescription": "This event counts instruction fetches generated by software preload or prefetch instructions which access the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HWPRF",
> +        "PublicDescription": "This event counts instruction fetches which access the L1 I-cache generated by the hardware prefetcher."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_REFILL_PRFM",
> +        "PublicDescription": "This event counts cache line refills in the L1 I-cache caused by a missed instruction fetch generated by software preload or prefetch instructions. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_REFILL_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access counted by L1I_CACHE_HWPRF that causes a refill of the L1 I-cache from outside of the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT_RD",
> +        "PublicDescription": "This event counts demand instruction fetches that access the L1 I-cache and hit in the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT_RD_FPRF",
> +        "PublicDescription": "This event counts each demand fetch first hit counted by L1I_CACHE_HIT_RD where the cache line was fetched in response to a software preload or by a hardware prefetcher. That is, the L1I_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT",
> +        "PublicDescription": "This event counts instruction fetches that access the L1 I-cache (demand, hardware prefetch, and software preload accesses) and hit in the L1 I-cache. I-cache accesses caused by cache maintenance operations are not counted."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT_PRFM",
> +        "PublicDescription": "This event counts instruction fetches generated by software preload or prefetch instructions that access the L1 I-cache and hit in the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_LFB_HIT_RD",
> +        "PublicDescription": "This event counts demand instruction fetches that access the L1 I-cache and hit in a line that is in the process of being loaded into the L1 I-cache."
> +    },
> +    {
> +        "EventCode": "0x0174",
> +        "EventName": "L1I_HWPRF_REQ_DROP",
> +        "PublicDescription": "L1 I-cache hardware prefetch dropped."
> +    },
> +    {
> +        "EventCode": "0x01e3",
> +        "EventName": "L1I_CACHE_REFILL_RD",
> +        "PublicDescription": "L1 I-cache refill, Read.\nThis event counts each demand instruction fetch that causes a refill of the L1 I-cache of this PE, from outside of this cache."
> +    },
> +    {
> +        "EventCode": "0x01ea",
> +        "EventName": "L1I_CFC_ENTRIES",
> +        "PublicDescription": "This event counts the CFC (Cache Fill Control) entries.\nThe CFC is the fill buffer for the I-cache."
> +    },
> +    {
> +        "EventCode": "0x01ef",
> +        "EventName": "L1I_CACHE_INVAL",
> +        "PublicDescription": "L1 I-cache invalidate.\nThis event counts each explicit invalidation of a cache line in the L1 I-cache caused by:\n* Broadcast cache coherency operations from another CPU in the system.\n* Invalidations due to capacity eviction in the L2 D-cache.\nThis event does not count in the following cases:\n* A cache refill invalidates a cache line.\n* A CMO which is executed on that CPU Core and invalidates a cache line specified by Set/Way.\n* Cache Maintenance Operations (CMO) that operate by a virtual address.\nNote that\n* CMOs that operate by Set/Way cannot be broadcast from one CPU Core to another.\n* The CMO is treated as a no-op for the purposes of L1 I-cache line invalidation, as this Core implements a fully coherent I-cache."
> +    },
> +    {
> +        "EventCode": "0x0212",
> +        "EventName": "L1I_CACHE_HIT_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access that hits in the L1 I-cache."
> +    },
> +    {
> +        "EventCode": "0x0215",
> +        "EventName": "L1I_LFB_HIT",
> +        "PublicDescription": "L1 line fill buffer hit.\nThis event counts each demand, software preload, or hardware prefetch induced instruction fetch that hits an L1 I-cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
> +    },
> +    {
> +        "EventCode": "0x0216",
> +        "EventName": "L1I_LFB_HIT_PRFM",
> +        "PublicDescription": "This event counts each software prefetch access that hits a cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
> +    },
> +    {
> +        "EventCode": "0x0219",
> +        "EventName": "L1I_LFB_HIT_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access that hits a cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
> +    },
> +    {
> +        "EventCode": "0x0221",
> +        "EventName": "L1I_PRFM_REQ",
> +        "PublicDescription": "L1 I-cache software prefetch requests."
> +    },
> +    {
> +        "EventCode": "0x0222",
> +        "EventName": "L1I_HWPRF_REQ",
> +        "PublicDescription": "L1 I-cache hardware prefetch requests."
> +    },
> +    {
> +        "EventCode": "0x0228",
> +        "EventName": "L1I_CACHE_HIT_PRFM_FPRF",
> +        "PublicDescription": "L1 I-cache software prefetch access first hit, fetched by hardware or software prefetch.\nThis event counts each software preload access first hit where the cache line was fetched in response to a hardware prefetcher or software preload instruction.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x022a",
> +        "EventName": "L1I_CACHE_HIT_HWPRF_FPRF",
> +        "PublicDescription": "L1 I-cache hardware prefetch access first hit, fetched by hardware or software prefetch.\nThis event counts each hardware prefetch access first hit where the cache line was fetched in response to a hardware prefetch or a prefetch instruction.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
> new file mode 100644
> index 000000000000..66f21a94381e
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
> @@ -0,0 +1,134 @@
> +[
> +    {
> +        "ArchStdEvent": "L2D_CACHE",
> +        "PublicDescription": "This event counts accesses to the L2 cache due to data accesses. The L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This event also counts write-backs of dirty data from the L1 D-cache to the L2 cache.\nI-cache accesses are included in this event. This event is the sum of the following events:\nL2D_CACHE_RD,\nL2D_CACHE_WR,\nL2D_CACHE_PRFM, and\nL2D_CACHE_HWPRF."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL",
> +        "PublicDescription": "This event counts cache line refills into the L2 cache. The L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache refills are included in this event. This event is the sum of the following events:\nL2D_CACHE_REFILL_RD,\nL2D_CACHE_REFILL_WR,\nL2D_CACHE_REFILL_HWPRF, and\nL2D_CACHE_REFILL_PRFM."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WB",
> +        "PublicDescription": "This event counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line.\nThis event is the sum of the following events:\nL2D_CACHE_WB_VICTIM and\nL2D_CACHE_WB_CLEAN."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_RD",
> +        "PublicDescription": "This event counts L2 D-cache accesses due to memory Read operations. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache accesses are included in this event. This event is a subset of the L2D_CACHE event, but this event only counts memory Read operations."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WR",
> +        "PublicDescription": "This event counts L2 cache accesses due to memory Write operations. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis event is a subset of the L2D_CACHE event, but this event only counts memory Write operations."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
> +        "PublicDescription": "This event counts refills for memory accesses due to memory Read operations counted by L2D_CACHE_RD. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis CPU includes I-cache refills in this counter, as an L2I equivalent event was not implemented. This event is a subset of the L2D_CACHE_REFILL event. This event does not count L2 refills caused by stashes into L2.\nThis count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
> +        "PublicDescription": "This event counts refills for memory accesses due to memory Write operations counted by L2D_CACHE_WR. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
> +        "PublicDescription": "This event counts evictions from the L2 cache because of a line being allocated into the L2 cache.\nThis event is a subset of the L2D_CACHE_WB event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
> +        "PublicDescription": "This event counts write-backs from the L2 cache that are a result of any of the following:\n* Cache maintenance operations,\n* Snoop responses, or\n* Direct cache transfers to another CPU due to a forwarding snoop request.\nThis event is a subset of the L2D_CACHE_WB event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_INVAL",
> +        "PublicDescription": "This event counts each explicit invalidation of a cache line in the L2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n* A cache refill invalidates a cache line, or\n* A Cache Maintenance Operation (CMO), which invalidates a cache line specified by Set/Way,\nis executed on that CPU.\nCMOs that operate by Set/Way cannot be broadcast from one CPU to another."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
> +        "PublicDescription": "This event counts cache line refills into the L2 unified cache from any memory Read operations that incurred additional latency.\nCounts the same as L2D_CACHE_REFILL_RD in this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_RW",
> +        "PublicDescription": "This event counts L2 cache demand accesses from any Load/Store operations. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache accesses are included in this event.\nThis event is the sum of the following events:\nL2D_CACHE_RD and\nL2D_CACHE_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_PRFM",
> +        "PublicDescription": "This event counts L2 D-cache accesses generated by software preload or prefetch instructions with target = L1/L2/L3 cache.\nNote that a software preload or prefetch instruction with (target = L1/L2/L3) that hits in L1D will not result in an L2 D-cache access. Therefore, such a software preload or prefetch instruction will not be counted by this event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_MISS",
> +        "PublicDescription": "This event counts cache line misses in the L2 cache. The L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis event counts the same as L2D_CACHE_REFILL_RD in this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_PRFM",
> +        "PublicDescription": "This event counts refills due to accesses generated as a result of software preload or prefetch instructions, as counted by L2D_CACHE_PRFM. I-cache refills are included in this event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_HWPRF",
> +        "PublicDescription": "This event counts the L2 D-cache accesses caused by the L1 or L2 hardware prefetcher."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access counted by L2D_CACHE_HWPRF that causes a refill of the L2 cache, or any L1 Data or Instruction cache of this PE, from outside of those caches.\nThis does not include prefetch requests that are pending waiting for a refill in the LFB when a new demand request to the same cache line hits the LFB entry. All such refills are counted as L2D_LFB_HIT_RWL1PRF_FHWPRF."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
> +        "PublicDescription": "This event counts each access to the L2 cache due to a prefetch instruction or hardware prefetch that causes a refill of the L2 or any Level 1 cache, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x0108",
> +        "EventName": "L2D_CACHE_IF_REFILL",
> +        "PublicDescription": "L2 D-cache refill, instruction fetch.\nThis event counts each demand instruction fetch that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x0109",
> +        "EventName": "L2D_CACHE_TBW_REFILL",
> +        "PublicDescription": "L2 D-cache refill, page table walk.\nThis event counts each demand translation table walk that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x010a",
> +        "EventName": "L2D_CACHE_PF_REFILL",
> +        "PublicDescription": "L2 D-cache refill, prefetch.\nThis event counts L1 or L2 hardware or software prefetch accesses that cause a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x010b",
> +        "EventName": "L2D_LFB_HIT_RWL1PRF_FHWPRF",
> +        "PublicDescription": "L2 line fill buffer demand Read, demand Write or L1 prefetch first hit, fetched by hardware prefetch.\nThis event counts each of the following accesses that hit the line-fill buffer when the same cache line is already being fetched due to an L2 hardware prefetcher:\n* Demand Read or Write\n* L1I-HWPRF\n* L1D-HWPRF\n* L1I PRFM\n* L1D PRFM\nThese accesses hit a cache line that is currently being loaded into the L2 cache as a result of a hardware prefetch to the same line. Consequently, this access does not initiate a new refill but waits for the completion of the previous refill.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x0179",
> +        "EventName": "L2D_CACHE_HIT_RWL1PRF_FHWPRF",
> +        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by hardware prefetch. This event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 hardware prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x01b8",
> +        "EventName": "L2D_CACHE_L1PRF",
> +        "PublicDescription": "L2 D-cache access, L1 hardware or software prefetch. This event counts L1 hardware or software prefetch accesses to the L2 D-cache."
> +    },
> +    {
> +        "EventCode": "0x01b9",
> +        "EventName": "L2D_CACHE_REFILL_L1PRF",
> +        "PublicDescription": "L2 D-cache refill, L1 hardware or software prefetch.\nThis event counts each access counted by L2D_CACHE_L1PRF that causes a refill of the L2 cache or any L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x0201",
> +        "EventName": "L2D_CACHE_BACKSNOOP_L1D_VIRT_ALIASING",
> +        "PublicDescription": "This event counts when the L2 D-cache sends an invalidating back-snoop to the L1 D-cache for an access initiated by the L1 D-cache, where the corresponding line is already present in the L1 D-cache.\nThe L2 D-cache line tags the PE that refilled the line. It also retains specific bits of the VA to identify virtually aliased addresses.\nThe L1 D request requiring a back-snoop can originate either from the same PE that refilled the L2 D line or from a different PE. In either case, this event only counts those back-snoops where the requested VA mismatches the VA stored in the L2 D tag.\nThis event is counted only by the PE that initiated the original request necessitating a back-snoop.\nNote: the L1 D-cache is VIPT, so it identifies this access as a miss. Conversely, as the L2 is PIPT, it identifies this as a hit. The L2 D-cache utilizes the back-snoop mechanism to refill the L1 D-cache with the snooped data."
> +    },
> +    {
> +        "EventCode": "0x0208",
> +        "EventName": "L2D_CACHE_RWL1PRF",
> +        "PublicDescription": "L2 D-cache access, demand Read, demand Write or L1 hardware or software prefetch.\nThis event counts each access to the L2 D-cache due to the following:\n* Demand Read or Write.\n* L1 hardware or software prefetch."
> +    },
> +    {
> +        "EventCode": "0x020a",
> +        "EventName": "L2D_CACHE_REFILL_RWL1PRF",
> +        "PublicDescription": "L2 D-cache refill, demand Read, demand Write or L1 hardware or software prefetch.\nThis event counts each access counted by L2D_CACHE_RWL1PRF that causes a refill of the L2 cache, or any L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x020c",
> +        "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRFM",
> +        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by software prefetch.\nThis event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> + }, > + { > + "EventCode": "0x020e", > + "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRF", > + "PublicDescription": "L2 D-cache demand Read, demand Write and L= 1 prefetch hit, fetched by software or hardware prefetch.\nThis event count= s each demand Read, demand Write and L1 hardware or software prefetch reque= st that hit an L2 D-cache line that was refilled into L2 D-cache in respons= e to an L2 hardware prefetch or software prefetch. Only the first hit is co= unted. After this event is generated for a cache line, the event is not gen= erated again for the same cache line while it remains in the cache." > + } > +] > diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json b= /tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json > new file mode 100644 > index 000000000000..851d0a70de9c > --- /dev/null > +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json > @@ -0,0 +1,107 @@ > +[ > + { > + "ArchStdEvent": "L3D_CACHE_ALLOCATE", > + "PublicDescription": "This event counts each memory Write operat= ion that writes an entire line into the L3 data without fetching data from = outside the L3 Data. These are allocations of cache lines in the L3 Data th= at are not refills counted by\nL3D_CACHE_REFILL. For example:\nA Write-back= of an entire cache line from an L2 cache to the L3 D-cache.\n* A Write of = an entire cache line from a coalescing Write buffer.\n* An operation such a= s DC ZVA.\nThis counter does not count writes that write an entire line to = beyond level 3. Thus this counter does not count the streaming writes to be= yond L3 cache." > + }, > + { > + "ArchStdEvent": "L3D_CACHE_REFILL", > + "PublicDescription": "This event counts each access counted by L= 3D_CACHE that causes a refill of the L3 Data, or any L1 Data, instruction o= r L2 cache of this PE, from outside of those caches. 
This includes refills due to hardware prefetch and software prefetch accesses.\nThis event is the sum of the L3D_CACHE_MISS, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL_HWPRF events.\nA refill includes any access that causes data to be fetched from outside of the L1 to L3 caches, even if the data is ultimately not allocated into the L3 D-cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE",
> + "PublicDescription": "This event counts each memory Read operation or memory Write operation that causes a cache access to Level 3.\nThis event is the sum of the following events:\n* L3D_CACHE_RD (0x00a0)\n* L3D_CACHE_ALLOCATE (0x0029)\n* L3D_CACHE_PRFM (0x8151)\n* L3D_CACHE_HWPRF (0x8156)\n* L2D_CACHE_WB (0x0018)"
> + },
> + {
> + "ArchStdEvent": "LL_CACHE_RD",
> + "PublicDescription": "This is an alias of the event L3D_CACHE_RD (0x00a0)."
> + },
> + {
> + "ArchStdEvent": "LL_CACHE_MISS_RD",
> + "PublicDescription": "This is an alias of the event L3D_CACHE_REFILL_RD (0x00a2)."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_RD",
> + "PublicDescription": "This event counts each memory Read operation to the L3 D-cache from instruction fetch, Load/Store, and MMU translation table accesses. This does not include hardware prefetcher or PRFM instruction accesses. This includes L1 and L2 prefetcher accesses to the L3 D-cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_RD",
> + "PublicDescription": "This event counts each access counted by both L3D_CACHE_RD and L3D_CACHE_REFILL. That is, every refill of the L3 cache counted by L3D_CACHE_REFILL that is caused by a memory Read operation.\nL3D_CACHE_MISS (0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD (0x400b) count the same event in the hardware."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_LMISS_RD",
> + "PublicDescription": "This event counts each memory Read operation to the L3 cache counted by L3D_CACHE that incurs additional latency because it returns data from outside of the L1 to L3 caches.\nL3D_CACHE_MISS (0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD (0x400b) count the same event in the hardware."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_RW",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE that is due to a demand memory Read operation or demand memory Write operation.\nThis event is the sum of L3D_CACHE_RD (0x00a0), L3D_CACHE_ALLOCATE (0x0029) and L2D_CACHE_WB (0x0018).\nNote that this counter does not count Writes that write an entire line to beyond level 3. Thus this counter does not count streaming Writes to beyond the L3 cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_PRFM",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE that is due to a prefetch instruction. This includes L3 Data accesses due to an L1, L2, or L3 prefetch instruction."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_MISS",
> + "PublicDescription": "This event counts each demand Read access counted by L3D_CACHE_RD that misses in the L1 to L3 Data, causing an access to outside of the L3 cache.\nL3D_CACHE_MISS (0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD (0x400b) count the same event in the hardware."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_PRFM",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE_PRFM that causes a refill of the L3 cache, or any L1 or L2 Data cache, from outside of those caches."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_HWPRF",
> + "PublicDescription": "This event counts each access to the L3 cache that is due to a hardware prefetcher. This includes L3D accesses due to the Level-1, Level-2 or Level-3 hardware prefetcher."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_HWPRF",
> + "PublicDescription": "This event counts each hardware prefetch counted by L3D_CACHE_HWPRF that causes a refill of the L3 Data or unified cache, or any L1 or L2 Data, Instruction, or unified cache of this PE, from outside of those caches."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_PRF",
> + "PublicDescription": "This event counts each access to the L3 cache due to a prefetch instruction or hardware prefetch that causes a refill of the L3 Data, or any L1 or L2 Data cache, from outside of those caches."
> + },
> + {
> + "EventCode": "0x01e8",
> + "EventName": "L3D_CACHE_RWL1PRFL2PRF",
> + "PublicDescription": "L3 cache access, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch.\nThis event counts each access to the L3 D-cache due to the following:\n* Demand Read or Write.\n* L1 hardware or software prefetch.\n* L2 hardware or software prefetch."
> + },
> + {
> + "EventCode": "0x01e9",
> + "EventName": "L3D_CACHE_REFILL_RWL1PRFL2PRF",
> + "PublicDescription": "L3 cache refill, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch.\nThis event counts each access counted by L3D_CACHE_RWL1PRFL2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x01f6",
> + "EventName": "L3D_CACHE_REFILL_L2PRF",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE_L2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x01f7",
> + "EventName": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF",
> + "PublicDescription": "L3 cache demand Read, demand Write, L1 prefetch or L2 prefetch first hit, fetched by software or hardware prefetch.\nThis event counts each demand Read, demand Write, L1 hardware or software prefetch request and L2 hardware or software prefetch that hits an L3 D-cache line that was refilled into the L3 D-cache in response to an L3 hardware prefetch or software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> + },
> + {
> + "EventCode": "0x0225",
> + "EventName": "L3D_CACHE_REFILL_IF",
> + "PublicDescription": "L3 cache refill, instruction fetch.\nThis event counts each demand instruction fetch that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x0226",
> + "EventName": "L3D_CACHE_REFILL_MM",
> + "PublicDescription": "L3 cache refill, translation table walk access.\nThis event counts each demand translation table access that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x0227",
> + "EventName": "L3D_CACHE_REFILL_L1PRF",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE_L1PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x022c",
> + "EventName": "L3D_CACHE_L1PRF",
> + "PublicDescription": "This event counts each L3 D-cache access due to an L1 hardware prefetch or software prefetch request.\nThe L1 hardware prefetch or software prefetch requests that miss the L1I, L1D and L2 D-cache are counted by this counter."
> + },
> + {
> + "EventCode": "0x022d",
> + "EventName": "L3D_CACHE_L2PRF",
> + "PublicDescription": "This event counts each L3 D-cache access due to an L2 hardware prefetch or software prefetch request.\nThe L2 hardware prefetch or software prefetch requests that miss the L2 D-cache are counted by this counter."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
> new file mode 100644
> index 000000000000..becd2d90bf39
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
> @@ -0,0 +1,46 @@
> +[
> + {
> + "ArchStdEvent": "MEM_ACCESS",
> + "PublicDescription": "This event counts memory accesses issued by the CPU load/store unit, where those accesses are issued due to load or store operations. This event counts memory accesses regardless of whether the data is received from any level of the cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than were specified in the load or store instructions, then the event counts those smaller memory transactions.\nMemory accesses generated by the following instructions or activity are not counted: instruction fetches, cache maintenance instructions, translation table walks or prefetches, memory prefetch operations. This event counts the sum of the following events:\nMEM_ACCESS_RD and\nMEM_ACCESS_WR."
> + },
> + {
> + "ArchStdEvent": "MEMORY_ERROR",
> + "PublicDescription": "This event counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs.
On the Core, this event counts errors in the caches (including data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted.\nNote that errors are only detected when the actual protected memory is accessed by an operation."
> + },
> + {
> + "ArchStdEvent": "REMOTE_ACCESS",
> + "PublicDescription": "This event counts each external bus Read access that causes an access to a remote device, that is, a socket that does not contain the PE."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD",
> + "PublicDescription": "This event counts memory accesses issued by the CPU due to Load operations. This event counts any memory Load access, no matter whether the data is received from any level of the cache hierarchy or external memory. This event also counts atomic Load operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions.\nThe following instructions are not counted:\n1) Instruction fetches,\n2) Cache maintenance instructions,\n3) Translation table walks or prefetches,\n4) Memory prefetch operations.\nThis event is a subset of the MEM_ACCESS event but only counts memory Read operations."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_WR",
> + "PublicDescription": "This event counts memory accesses issued by the CPU due to Store operations. This event counts any memory Store access, no matter whether the data is located in any level of cache or external memory. This event also counts atomic Load and Store operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
> + },
> + {
> + "ArchStdEvent": "LDST_ALIGN_LAT",
> + "PublicDescription": "This event counts the number of memory Read and Write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in an access crossing a single cache line.\nThis event is implemented as the sum of the following events on this CPU:\nLD_ALIGN_LAT and\nST_ALIGN_LAT."
> + },
> + {
> + "ArchStdEvent": "LD_ALIGN_LAT",
> + "PublicDescription": "This event counts the number of memory Read accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed, which results in a load crossing a single cache line."
> + },
> + {
> + "ArchStdEvent": "ST_ALIGN_LAT",
> + "PublicDescription": "This event counts the number of memory Write accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH_PERCYC",
> + "PublicDescription": "This event counts the number of instruction fetches outstanding per cycle, which provides an average latency of instruction fetch."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
> + "PublicDescription": "This event counts the number of outstanding Loads or memory Read accesses per cycle."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH",
> + "PublicDescription": "This event counts instruction memory accesses that the PE makes."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> new file mode 100644
> index 000000000000..b825ede03f54
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> @@ -0,0 +1,722 @@
> +[
> + {
> + "MetricName": "backend_bound",
> + "MetricExpr": "100 * (STALL_SLOT_BACKEND / CPU_SLOT)",
> + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the backend of the processor.",
> + "ScaleUnit": "1percent of slots",
> + "MetricGroup": "TopdownL1"
> + },
> + {
> + "MetricName": "backend_busy_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_BUSY / STALL_BACKEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being full and unable to accept operations for execution.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_cache_l1d_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by L1 D-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_cache_l2d_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by L2 D-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_core_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_CPUBOUND / STALL_BACKEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to
backend Core resource constraints not related to memory access latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_core_rename_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_RENAME / STALL_BACKEND_CPUBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend because rename unit registers are unavailable.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_MEMBOUND / STALL_BACKEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend Core resource constraints related to memory access latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_cache_bound",
> + "MetricExpr": "100 * ((STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by D-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_store_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to pending memory Writes caused by Stores stalled in the pre-commit stage.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_tlb_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency
issues caused by Data TLB misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_stalled_cycles",
> + "MetricExpr": "100 * (STALL_BACKEND / CPU_CYCLES)",
> + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Cycle_Accounting"
> + },
> + {
> + "MetricName": "bad_speculation",
> + "MetricExpr": "100 - (frontend_bound + retiring + backend_bound)",
> + "BriefDescription": "This metric is the percentage of total slots that executed operations which did not retire due to a pipeline flush. This indicates cycles that were utilized, but inefficiently.",
> + "ScaleUnit": "1percent of slots",
> + "MetricGroup": "TopdownL1"
> + },
> + {
> + "MetricName": "barrier_percentage",
> + "MetricExpr": "100 * ((ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC)",
> + "BriefDescription": "This metric measures instruction and data barrier operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "branch_direct_ratio",
> + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_indirect_ratio",
> + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_misprediction_ratio",
> + "MetricExpr": "BR_MIS_PRED_RETIRED /
BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Miss_Ratio;Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_mpki",
> + "MetricExpr": "1000 * (BR_MIS_PRED_RETIRED / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_percentage",
> + "MetricExpr": "100 * ((BR_IMMED_SPEC + BR_INDIRECT_SPEC) / INST_SPEC)",
> + "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "branch_return_ratio",
> + "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of retired branches that are function returns to the total number of branches architecturally executed.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Branch_Effectiveness"
> + },
> + {
> + "MetricName": "bus_bandwidth",
> + "MetricExpr": "BUS_ACCESS * 32 / duration_time",
> + "BriefDescription": "This metric measures the bus bandwidth of the data transferred between this PE's L2 and the unCore in the system.",
> + "ScaleUnit": "1Bytes/sec"
> + },
> + {
> + "MetricName": "cpu_cycles_fraction_in_st_mode",
> + "MetricExpr": "((CPU_SLOT / CPU_CYCLES) - 5) / 5",
> + "BriefDescription": "This metric counts the fraction of CPU cycles spent in ST mode during program execution.",
> + "ScaleUnit": "1fraction of cycles",
> + "MetricGroup": "SMT"
> + },
> + {
> + "MetricName": "cpu_cycles_in_smt_mode",
> + "MetricExpr": "(1 - cpu_cycles_fraction_in_st_mode) * CPU_CYCLES",
> + "BriefDescription": "This metric counts CPU cycles in SMT mode during program execution.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "SMT"
> + },
> + {
> + "MetricName": "cpu_cycles_in_st_mode",
> + "MetricExpr": "cpu_cycles_fraction_in_st_mode * CPU_CYCLES",
> + "BriefDescription": "This metric counts CPU cycles in ST mode during program execution.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "SMT"
> + },
> + {
> + "MetricName": "crypto_percentage",
> + "MetricExpr": "100 * (CRYPTO_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "dtlb_mpki",
> + "MetricExpr": "1000 * (DTLB_WALK / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of Data TLB walks per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "dtlb_walk_average_latency",
> + "MetricExpr": "DTLB_WALK_PERCYC / DTLB_WALK",
> + "BriefDescription": "This metric measures the average latency of Data TLB walks in CPU cycles.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "Average_Latency"
> + },
> + {
> + "MetricName": "dtlb_walk_ratio",
> + "MetricExpr": "DTLB_WALK / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of Data TLB walks to the total number of Data TLB accesses.
This gives an indication of the effectiveness of the Data TLB accesses.",
> + "ScaleUnit": "1per TLB access",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "fp16_percentage",
> + "MetricExpr": "100 * (FP_HP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "FP_Precision_Mix"
> + },
> + {
> + "MetricName": "fp32_percentage",
> + "MetricExpr": "100 * (FP_SP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "FP_Precision_Mix"
> + },
> + {
> + "MetricName": "fp64_percentage",
> + "MetricExpr": "100 * (FP_DP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "FP_Precision_Mix"
> + },
> + {
> + "MetricName": "fp_ops_per_cycle",
> + "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES",
> + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any instruction.
Operations are counted by computation and by vector lanes; fused computations such as multiply-add, for example, count as two operations per vector lane.",
> + "ScaleUnit": "1operations per cycle",
> + "MetricGroup": "FP_Arithmetic_Intensity"
> + },
> + {
> + "MetricName": "frontend_bound",
> + "MetricExpr": "100 * (STALL_SLOT_FRONTEND_WITHOUT_MISPRED / CPU_SLOT)",
> + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
> + "ScaleUnit": "1percent of slots",
> + "MetricGroup": "TopdownL1"
> + },
> + {
> + "MetricName": "frontend_cache_l1i_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L1 I-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_cache_l2i_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L2 I-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_core_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_CPUBOUND / STALL_FRONTEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints not related to instruction fetch latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_core_flow_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_FLOW / STALL_FRONTEND_CPUBOUND)",
> + "BriefDescription": "This metric is the percentage of
total cycles stalled in the frontend as the decode unit is awaiting input from the branch prediction unit.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_core_flush_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_mem_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_MEMBOUND / STALL_FRONTEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints related to instruction fetch latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_mem_cache_bound",
> + "MetricExpr": "100 * ((STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by I-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_mem_tlb_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by Instruction TLB misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_stalled_cycles",
> + "MetricExpr": "100 * (STALL_FRONTEND / CPU_CYCLES)",
> + "BriefDescription": "This metric is the percentage
of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Cycle_Accounting"
> + },
> + {
> + "MetricName": "instruction_fetch_average_latency",
> + "MetricExpr": "INST_FETCH_PERCYC / INST_FETCH",
> + "BriefDescription": "This metric measures the average latency of instruction fetches in CPU cycles.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "Average_Latency"
> + },
> + {
> + "MetricName": "integer_dp_percentage",
> + "MetricExpr": "100 * (DP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "ipc",
> + "MetricExpr": "INST_RETIRED / CPU_CYCLES",
> + "BriefDescription": "This metric measures the number of instructions retired per cycle.",
> + "ScaleUnit": "1per cycle",
> + "MetricGroup": "General"
> + },
> + {
> + "MetricName": "itlb_mpki",
> + "MetricExpr": "1000 * (ITLB_WALK / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of Instruction TLB walks per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;ITLB_Effectiveness"
> + },
> + {
> + "MetricName": "itlb_walk_average_latency",
> + "MetricExpr": "ITLB_WALK_PERCYC / ITLB_WALK",
> + "BriefDescription": "This metric measures the average latency of Instruction TLB walks in CPU cycles.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "Average_Latency"
> + },
> + {
> + "MetricName": "itlb_walk_ratio",
> + "MetricExpr": "ITLB_WALK / L1I_TLB",
> + "BriefDescription": "This metric measures the ratio of Instruction TLB walks to the total number of Instruction TLB accesses.
This gives an indication of the effectiveness of the Instruction TLB accesses.",
> + "ScaleUnit": "1per TLB access",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_cache_miss_ratio",
> + "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
> + "BriefDescription": "This metric measures the ratio of L1 D-cache accesses missed to the total number of L1 D-cache accesses. This gives an indication of the effectiveness of the L1 D-cache.",
> + "ScaleUnit": "1per cache access",
> + "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_cache_mpki",
> + "MetricExpr": "1000 * (L1D_CACHE_REFILL / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of L1 D-cache accesses missed per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;L1D_Cache_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_cache_rw_miss_ratio",
> + "MetricExpr": "l1d_demand_misses / l1d_demand_accesses",
> + "BriefDescription": "This metric measures the ratio of L1 D-cache demand accesses missed to the total number of L1 D-cache demand accesses.
This gives an indication of the effectiveness of the L1 D-cache for demand Load or Store traffic.",
> + "ScaleUnit": "1per cache access",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_demand_accesses",
> + "MetricExpr": "L1D_CACHE_RW",
> + "BriefDescription": "This metric measures the count of L1 D-cache accesses incurred on a Load or Store by the instruction stream of the program.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_demand_misses",
> + "MetricExpr": "L1D_CACHE_REFILL_RW",
> + "BriefDescription": "This metric measures the count of L1 D-cache misses incurred on a Load or Store by the instruction stream of the program.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_prf_accuracy",
> + "MetricExpr": "100 * (l1d_useful_prf / l1d_refilled_prf)",
> + "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the instruction stream.",
> + "ScaleUnit": "1percent of prefetch",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_prf_coverage",
> + "MetricExpr": "100 * (l1d_useful_prf / (l1d_demand_misses + l1d_refilled_prf))",
> + "BriefDescription": "This metric measures the fraction of baseline demand cache misses that the prefetcher brings into the cache.",
> + "ScaleUnit": "1percent of cache access",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_refilled_prf",
> + "MetricExpr": "L1D_CACHE_REFILL_HWPRF + L1D_CACHE_REFILL_PRFM + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
> + "BriefDescription": "This metric measures the count of cache lines refilled by the L1 data prefetcher (hardware prefetches or software preload) into the L1 D-cache.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_tlb_miss_ratio",
> + "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of L1 Data TLB accesses missed to the total number of L1 Data TLB accesses. This gives an indication of the effectiveness of the L1 Data TLB.",
> + "ScaleUnit": "1per TLB access",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_tlb_mpki",
> + "MetricExpr": "1000 * (L1D_TLB_REFILL / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of L1 Data TLB accesses missed per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_useful_prf",
> + "MetricExpr": "L1D_CACHE_HIT_RW_FPRF + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
> + "BriefDescription": "This metric measures the count of cache lines refilled by the L1 data prefetcher (hardware prefetches or software preload) into the L1 D-cache which are subsequently used by a Load or Store from the instruction stream of the program.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1i_cache_miss_ratio",
> + "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
> + "BriefDescription": "This metric measures the ratio of L1 I-cache accesses missed to the total number of L1 I-cache accesses.
This gives an= indication of the effectiveness of the L1 I-cache.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness" > + }, > + { > + "MetricName": "l1i_cache_mpki", > + "MetricExpr": "1000 * (L1I_CACHE_REFILL / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L1 I-cac= he accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;L1I_Cache_Effectiveness" > + }, > + { > + "MetricName": "l1i_cache_rd_miss_ratio", > + "MetricExpr": "l1i_demand_misses / l1i_demand_accesses", > + "BriefDescription": "This metric measures the ratio of L1 I-cach= e Read accesses missed to the total number of L1 I-cache accesses. This giv= es an indication of the effectiveness of the L1 I-cache for demand instruct= ion fetch traffic. Note that cache accesses in this cache are demand instru= ction fetch.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_demand_accesses", > + "MetricExpr": "L1I_CACHE_RD", > + "BriefDescription": "This metric measures the count of L1 I-cach= e accesses caused by an instruction fetch by the instruction stream of the = program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_demand_misses", > + "MetricExpr": "L1I_CACHE_REFILL_RD", > + "BriefDescription": "This metric measures the count of L1 I-cach= e misses caused by an instruction fetch by the instruction stream of the pr= ogram.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_prf_accuracy", > + "MetricExpr": "100 * (l1i_useful_prf / l1i_refilled_prf)", > + "BriefDescription": "This metric measures the fraction of prefet= ched memory addresses that are used by the instruction stream.", > + "ScaleUnit": "1percent of prefetch", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + 
{ > + "MetricName": "l1i_prf_coverage", > + "MetricExpr": "100 * (l1i_useful_prf / (l1i_demand_misses + l1i_= refilled_prf))", > + "BriefDescription": "This metric measures the baseline demand ca= che misses which the prefetcher brings into the cache.", > + "ScaleUnit": "1percent of cache access", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_refilled_prf", > + "MetricExpr": "L1I_CACHE_REFILL_HWPRF + L1I_CACHE_REFILL_PRFM", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L1 instruction prefetcher (hardware prefetches or software p= reload) into L1 I-cache.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_tlb_miss_ratio", > + "MetricExpr": "L1I_TLB_REFILL / L1I_TLB", > + "BriefDescription": "This metric measures the ratio of L1 Instru= ction TLB accesses missed to the total number of L1 Instruction TLB accesse= s. This gives an indication of the effectiveness of the L1 Instruction TLB.= ", > + "ScaleUnit": "1per TLB access", > + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness" > + }, > + { > + "MetricName": "l1i_tlb_mpki", > + "MetricExpr": "1000 * (L1I_TLB_REFILL / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L1 Instr= uction TLB accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;ITLB_Effectiveness" > + }, > + { > + "MetricName": "l1i_useful_prf", > + "MetricExpr": "L1I_CACHE_HIT_RD_FPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L1 instruction prefetcher (hardware prefetches or software p= reload) into L1 I-cache which are further used by instruction stream of the= program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2_cache_miss_ratio", > + "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE", > + "BriefDescription": "This metric measures 
the ratio of L2 cache = accesses missed to the total number of L2 cache accesses. This gives an ind= ication of the effectiveness of the L2 cache, which is a unified cache that= stores both data and instruction.\nNote that cache accesses in this cache = are either data memory access or instruction fetch as this is a unified cac= he.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness" > + }, > + { > + "MetricName": "l2_cache_mpki", > + "MetricExpr": "1000 * (l2d_demand_misses / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L2 unifi= ed cache accesses missed per thousand instructions executed.\nNote that cac= he accesses in this cache are either data memory access or instruction fetc= h as this is a unified cache.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;L2_Cache_Effectiveness" > + }, > + { > + "MetricName": "l2_tlb_miss_ratio", > + "MetricExpr": "L2D_TLB_REFILL / L2D_TLB", > + "BriefDescription": "This metric measures the ratio of L2 unifie= d TLB accesses missed to the total number of L2 unified TLB accesses.\nThis= gives an indication of the effectiveness of the L2 TLB.", > + "ScaleUnit": "1per TLB access", > + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness" > + }, > + { > + "MetricName": "l2_tlb_mpki", > + "MetricExpr": "1000 * (L2D_TLB_REFILL / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L2 unifi= ed TLB accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness" > + }, > + { > + "MetricName": "l2d_cache_rwl1prf_miss_ratio", > + "MetricExpr": "l2d_demand_misses / l2d_demand_accesses", > + "BriefDescription": "This metric measures the ratio of L2 D-cach= e Read accesses missed to the total number of L2 D-cache accesses.\nThis gi= ves an indication of the effectiveness of the L2 D-cache for demand instruc= tion fetch, Load, Store, or L1 prefetcher 
accesses traffic.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_demand_accesses", > + "MetricExpr": "L2D_CACHE_RD + L2D_CACHE_WR + L2D_CACHE_L1PRF", > + "BriefDescription": "This metric measures the count of L2 D-cach= e accesses incurred on an instruction fetch, Load, Store, or L1 prefetcher = accesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_demand_misses", > + "MetricExpr": "L2D_CACHE_REFILL_RD + L2D_CACHE_REFILL_WR + L2D_C= ACHE_REFILL_L1PRF", > + "BriefDescription": "This metric measures the count of L2 D-cach= e misses incurred on an instruction fetch, Load, Store, or L1 prefetcher ac= cesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_prf_accuracy", > + "MetricExpr": "100 * (l2d_useful_prf / l2d_refilled_prf)", > + "BriefDescription": "This metric measures the fraction of prefet= ched memory addresses that are used by the instruction stream.", > + "ScaleUnit": "1percent of prefetch", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_prf_coverage", > + "MetricExpr": "100 * (l2d_useful_prf / (l2d_demand_misses + l2d_= refilled_prf))", > + "BriefDescription": "This metric measures the baseline demand ca= che misses which the prefetcher brings into the cache.", > + "ScaleUnit": "1percent of cache access", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_refilled_prf", > + "MetricExpr": "(L2D_CACHE_REFILL_PRF - L2D_CACHE_REFILL_L1PRF) += L2D_LFB_HIT_RWL1PRF_FHWPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L2 data prefetcher (hardware prefetches or software preload)= into L2 D-cache.", > + "ScaleUnit": "1count", > + "MetricGroup": 
"L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_useful_prf", > + "MetricExpr": "L2D_CACHE_HIT_RWL1PRF_FPRF + L2D_LFB_HIT_RWL1PRF_= FHWPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L2 data prefetcher (hardware prefetches or software preload)= into L2 D-cache which are further used by instruction fetch, Load, Store, = or L1 prefetcher accesses from the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_cache_rwl1prfl2prf_miss_ratio", > + "MetricExpr": "l3d_demand_misses / l3d_demand_accesses", > + "BriefDescription": "This metric measures the ratio of L3 D-cach= e Read accesses missed to the total number of L3 D-cache accesses. This giv= es an indication of the effectiveness of the L2 D-cache for demand instruct= ion fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses traffic.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_demand_accesses", > + "MetricExpr": "L3D_CACHE_RWL1PRFL2PRF", > + "BriefDescription": "This metric measures the count of L3 D-cach= e accesses incurred on an instruction fetch, Load, Store, L1 prefetcher, or= L2 prefetcher accesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_demand_misses", > + "MetricExpr": "L3D_CACHE_REFILL_RWL1PRFL2PRF", > + "BriefDescription": "This metric measures the count of L3 D-cach= e misses incurred on an instruction fetch, Load, Store, L1 prefetcher, or L= 2 prefetcher accesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_prf_accuracy", > + "MetricExpr": "100 * (l3d_useful_prf / l3d_refilled_prf)", > + "BriefDescription": "This metric measures the fraction 
of prefet= ched memory addresses that are used by the instruction stream.", > + "ScaleUnit": "1percent of prefetch", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_prf_coverage", > + "MetricExpr": "100 * (l3d_useful_prf / (l3d_demand_misses + l3d_= refilled_prf))", > + "BriefDescription": "This metric measures the baseline demand ca= che misses which the prefetcher brings into the cache.", > + "ScaleUnit": "1percent of cache access", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_refilled_prf", > + "MetricExpr": "L3D_CACHE_REFILL_HWPRF + L3D_CACHE_REFILL_PRFM - = L3D_CACHE_REFILL_L1PRF - L3D_CACHE_REFILL_L2PRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L3 data prefetcher (hardware prefetches or software preload)= into L3 D-cache.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_useful_prf", > + "MetricExpr": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L3 data prefetcher (hardware prefetches or software preload)= into L3 D-cache which are further used by instruction fetch, Load, Store, = L1 prefetcher, or L2 prefetcher accesses from the instruction stream of the= program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "ll_cache_read_hit_ratio", > + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD", > + "BriefDescription": "This metric measures the ratio of last leve= l cache Read accesses hit in the cache to the total number of last level ca= che accesses. This gives an indication of the effectiveness of the last lev= el cache for Read traffic. 
Note that cache accesses in this cache are eithe= r data memory access or instruction fetch as this is a system level cache.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "LL_Cache_Effectiveness" > + }, > + { > + "MetricName": "ll_cache_read_miss_ratio", > + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD", > + "BriefDescription": "This metric measures the ratio of last leve= l cache Read accesses missed to the total number of last level cache access= es. This gives an indication of the effectiveness of the last level cache f= or Read traffic. Note that cache accesses in this cache are either data mem= ory access or instruction fetch as this is a system level cache.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness" > + }, > + { > + "MetricName": "ll_cache_read_mpki", > + "MetricExpr": "1000 * (LL_CACHE_MISS_RD / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of last lev= el cache Read accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;LL_Cache_Effectiveness" > + }, > + { > + "MetricName": "load_average_latency", > + "MetricExpr": "MEM_ACCESS_RD_PERCYC / MEM_ACCESS", > + "BriefDescription": "This metric measures the average latency of= Load operations in CPU cycles.", > + "ScaleUnit": "1CPU cycles", > + "MetricGroup": "Average_Latency" > + }, > + { > + "MetricName": "load_percentage", > + "MetricExpr": "100 * (LD_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures Load operations as a p= ercentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "nonsve_fp_ops_per_cycle", > + "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES", > + "BriefDescription": "This metric measures floating point operati= ons per cycle in any precision performed by an instruction that is not an S= VE instruction. 
Operations are counted by computation and by vector lanes, = fused computations such as multiply-add count as twice per vector lane for = example.", > + "ScaleUnit": "1operations per cycle", > + "MetricGroup": "FP_Arithmetic_Intensity" > + }, > + { > + "MetricName": "retiring", > + "MetricExpr": "100 * ((OP_RETIRED/OP_SPEC) * (1 - (STALL_SLOT/CP= U_SLOT)))", > + "BriefDescription": "This metric is the percentage of total slot= s that retired operations, which indicates cycles that were utilized effici= ently.", > + "ScaleUnit": "1percent of slots", > + "MetricGroup": "TopdownL1" > + }, > + { > + "MetricName": "scalar_fp_percentage", > + "MetricExpr": "100 * (VFP_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures scalar floating point = operations as a percentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "simd_percentage", > + "MetricExpr": "100 * (ASE_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures advanced SIMD operatio= ns as a percentage of total operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "store_percentage", > + "MetricExpr": "100 * (ST_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures Store operations as a = percentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "sve_all_percentage", > + "MetricExpr": "100 * (SVE_INST_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions, including Loads and Stores, as a percentage of operations speculative= ly executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "sve_fp_ops_per_cycle", > + "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES", > + "BriefDescription": 
"This metric measures floating point operati= ons per cycle in any precision performed by SVE instructions. Operations ar= e counted by computation and by vector lanes, fused computations such as mu= ltiply-add count as twice per vector lane for example.", > + "ScaleUnit": "1operations per cycle", > + "MetricGroup": "FP_Arithmetic_Intensity" > + }, > + { > + "MetricName": "sve_predicate_empty_percentage", > + "MetricExpr": "100 * (SVE_PRED_EMPTY_SPEC / SVE_PRED_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with no active predicates as a percentage of SVE predicated operations= speculatively executed.", > + "ScaleUnit": "1percent of SVE predicated operations", > + "MetricGroup": "SVE_Effectiveness" > + }, > + { > + "MetricName": "sve_predicate_full_percentage", > + "MetricExpr": "100 * (SVE_PRED_FULL_SPEC / SVE_PRED_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with all active predicates as a percentage of SVE predicated operation= s speculatively executed.", > + "ScaleUnit": "1percent of SVE predicated operations", > + "MetricGroup": "SVE_Effectiveness" > + }, > + { > + "MetricName": "sve_predicate_partial_percentage", > + "MetricExpr": "100 * (SVE_PRED_PARTIAL_SPEC / SVE_PRED_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with at least one active predicates as a percentage of SVE predicated = operations speculatively executed.", > + "ScaleUnit": "1percent of SVE predicated operations", > + "MetricGroup": "SVE_Effectiveness" > + }, > + { > + "MetricName": "sve_predicate_percentage", > + "MetricExpr": "100 * (SVE_PRED_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with predicates as a percentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "SVE_Effectiveness" > + } > +] > diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json b/too= 
ls/perf/pmu-events/arch/arm64/nvidia/t410/misc.json > new file mode 100644 > index 000000000000..8ff87d844e52 > --- /dev/null > +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json > @@ -0,0 +1,642 @@ > +[ > + { > + "ArchStdEvent": "SW_INCR", > + "PublicDescription": "This event counts software writes to the P= MSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a= manually updated counter for use by application software.\nThis event coul= d be used to measure any user program event, such as accesses to a particul= ar data structure (by writing to the PMSWINC_EL0 register each time the dat= a structure is accessed).\nTo use the PMSWINC_EL0 register and event, devel= opers must insert instructions that write to the PMSWINC_EL0 register into = the source code.\nSince the SW_INCR event records writes to the PMSWINC_EL0= register, there is no need to do a Read/Increment/Write sequence to the PM= SWINC_EL0 register." > + }, > + { > + "ArchStdEvent": "TRB_WRAP", > + "PublicDescription": "This event is generated each time the trac= e buffer current Write pointer is wrapped to the trace buffer base pointer." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT0", > + "PublicDescription": "Trace unit external output 0." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT1", > + "PublicDescription": "Trace unit external output 1." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT2", > + "PublicDescription": "Trace unit external output 2." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT3", > + "PublicDescription": "Trace unit external output 3." > + }, > + { > + "ArchStdEvent": "CTI_TRIGOUT4", > + "PublicDescription": "Cross-trigger Interface output trigger 4." > + }, > + { > + "ArchStdEvent": "CTI_TRIGOUT5", > + "PublicDescription": "Cross-trigger Interface output trigger 5." > + }, > + { > + "ArchStdEvent": "CTI_TRIGOUT6", > + "PublicDescription": "Cross-trigger Interface output trigger 6." 
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT7",
> + "PublicDescription": "Cross-trigger Interface output trigger 7."
> + },
> + {
> + "EventCode": "0x00e1",
> + "EventName": "L1I_PRFM_REQ_DROP",
> + "PublicDescription": "L1 I-cache software prefetch dropped."
> + },
> + {
> + "EventCode": "0x0100",
> + "EventName": "L1_PF_REFILL",
> + "PublicDescription": "L1 prefetch requests, refilled to L1 cache."
> + },
> + {
> + "EventCode": "0x0120",
> + "EventName": "FLUSH",
> + "PublicDescription": "This event counts both the CT flushes and the BX flushes. The BR_MIS_PRED event counts the BX flushes, so FLUSH - BR_MIS_PRED gives the CT flushes."
> + },
> + {
> + "EventCode": "0x0121",
> + "EventName": "FLUSH_MEM",
> + "PublicDescription": "Flushes due to memory hazards. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0122",
> + "EventName": "FLUSH_BAD_BRANCH",
> + "PublicDescription": "Flushes due to a badly predicted branch. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0123",
> + "EventName": "FLUSH_STDBYPASS",
> + "PublicDescription": "Flushes due to bad predecode. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0124",
> + "EventName": "FLUSH_ISB",
> + "PublicDescription": "Flushes due to ISB or similar side-effects. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0125",
> + "EventName": "FLUSH_OTHER",
> + "PublicDescription": "Flushes due to other hazards. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0126",
> + "EventName": "STORE_STREAM",
> + "PublicDescription": "Stored lines in streaming no-Write-allocate mode."
> + },
> + {
> + "EventCode": "0x0127",
> + "EventName": "NUKE_RAR",
> + "PublicDescription": "Load/Store nuke due to Read-after-Read ordering hazard."
> + },
> + {
> + "EventCode": "0x0128",
> + "EventName": "NUKE_RAW",
> + "PublicDescription": "Load/Store nuke due to Read-after-Write ordering hazard."
> + },
> + {
> + "EventCode": "0x0129",
> + "EventName": "L1_PF_GEN_PAGE",
> + "PublicDescription": "Load/Store prefetch to L1 generated, Page mode."
> + },
> + {
> + "EventCode": "0x012a",
> + "EventName": "L1_PF_GEN_STRIDE",
> + "PublicDescription": "Load/Store prefetch to L1 generated, stride mode."
> + },
> + {
> + "EventCode": "0x012b",
> + "EventName": "L2_PF_GEN_LD",
> + "PublicDescription": "Load prefetch to L2 generated."
> + },
> + {
> + "EventCode": "0x012d",
> + "EventName": "LS_PF_TRAIN_TABLE_ALLOC",
> + "PublicDescription": "LS prefetch train table entry allocated."
> + },
> + {
> + "EventCode": "0x0130",
> + "EventName": "LS_PF_GEN_TABLE_ALLOC",
> + "PublicDescription": "This event counts the number of cycles with at least one table allocation, for L2 hardware prefetches (including the software PRFM instructions that are converted into hardware prefetches due to D-TLB miss).\nLS prefetch gen table allocation (for L2 prefetches)."
> + },
> + {
> + "EventCode": "0x0131",
> + "EventName": "LS_PF_GEN_TABLE_ALLOC_PF_PEND",
> + "PublicDescription": "This event counts the number of cycles in which at least one hardware prefetch is dropped due to the inability to identify a victim when the generation table is full. The hardware prefetches considered here include the software PRFM instructions that are converted into hardware prefetches due to D-TLB miss."
> + },
> + {
> + "EventCode": "0x0132",
> + "EventName": "TBW",
> + "PublicDescription": "Tablewalks."
> + },
> + {
> + "EventCode": "0x0134",
> + "EventName": "S1L2_HIT",
> + "PublicDescription": "Translation cache hit on S1L2 walk cache entry."
> + },
> + {
> + "EventCode": "0x0135",
> + "EventName": "S1L1_HIT",
> + "PublicDescription": "Translation cache hit on S1L1 walk cache entry."
> + },
> + {
> + "EventCode": "0x0136",
> + "EventName": "S1L0_HIT",
> + "PublicDescription": "Translation cache hit on S1L0 walk cache entry."
> + },
> + {
> + "EventCode": "0x0137",
> + "EventName": "S2L2_HIT",
> + "PublicDescription": "Translation cache hit for S2L2 IPA walk cache entry."
> + },
> + {
> + "EventCode": "0x0138",
> + "EventName": "IPA_REQ",
> + "PublicDescription": "Translation cache lookups for IPA to PA entries."
> + },
> + {
> + "EventCode": "0x0139",
> + "EventName": "IPA_REFILL",
> + "PublicDescription": "Translation cache refills for IPA to PA entries."
> + },
> + {
> + "EventCode": "0x013a",
> + "EventName": "S1_FLT",
> + "PublicDescription": "Stage1 tablewalk fault."
> + },
> + {
> + "EventCode": "0x013b",
> + "EventName": "S2_FLT",
> + "PublicDescription": "Stage2 tablewalk fault."
> + },
> + {
> + "EventCode": "0x013c",
> + "EventName": "COLT_REFILL",
> + "PublicDescription": "Aggregated page refill."
> + },
> + {
> + "EventCode": "0x0145",
> + "EventName": "L1_PF_HIT",
> + "PublicDescription": "L1 prefetch requests, hitting in L1 cache."
> + },
> + {
> + "EventCode": "0x0146",
> + "EventName": "L1_PF",
> + "PublicDescription": "L1 prefetch requests."
> + },
> + {
> + "EventCode": "0x0147",
> + "EventName": "CACHE_LS_REFILL",
> + "PublicDescription": "L2 D-cache refill, Load/Store."
> + },
> + {
> + "EventCode": "0x0148",
> + "EventName": "CACHE_PF",
> + "PublicDescription": "L2 prefetch requests."
> + },
> + {
> + "EventCode": "0x0149",
> + "EventName": "CACHE_PF_HIT",
> + "PublicDescription": "L2 prefetch requests, hitting in L2 cache."
> + },
> + {
> + "EventCode": "0x0150",
> + "EventName": "UNUSED_PF",
> + "PublicDescription": "L2 unused prefetch."
> + },
> + {
> + "EventCode": "0x0151",
> + "EventName": "PFT_SENT",
> + "PublicDescription": "L2 prefetch TGT sent.\nNote that PFT_SENT != PFT_USEFUL + PFT_DROP. There may be PFT_SENT for which the accesses resulted in a SLC hit."
> + },
> + {
> + "EventCode": "0x0152",
> + "EventName": "PFT_USEFUL",
> + "PublicDescription": "L2 prefetch TGT useful."
> + },
> + {
> + "EventCode": "0x0153",
> + "EventName": "PFT_DROP",
> + "PublicDescription": "L2 prefetch TGT dropped."
> + },
> + {
> + "EventCode": "0x0162",
> + "EventName": "LRQ_FULL",
> + "PublicDescription": "This event counts the number of cycles the LRQ is full."
> + },
> + {
> + "EventCode": "0x0163",
> + "EventName": "FETCH_FQ_EMPTY",
> + "PublicDescription": "Fetch Queue empty cycles."
> + },
> + {
> + "EventCode": "0x0164",
> + "EventName": "FPG2",
> + "PublicDescription": "Forward progress guarantee. Medium range livelock triggered."
> + },
> + {
> + "EventCode": "0x0165",
> + "EventName": "FPG",
> + "PublicDescription": "Forward progress guarantee. Tofu global livelock buster is triggered."
> + },
> + {
> + "EventCode": "0x0172",
> + "EventName": "DEADBLOCK",
> + "PublicDescription": "Write-back evictions converted to dataless EVICT.\nThe victim line is deemed a deadblock if the likeliness of a reuse is low. The Core uses a dataless evict to evict a deadblock, and it uses an evict with data to evict an L2 line that is not a deadblock."
> + },
> + {
> + "EventCode": "0x0173",
> + "EventName": "PF_PRQ_ALLOC_PF_PEND",
> + "PublicDescription": "L1 prefetch PRQ allocation (replacing pending)."
> + },
> + {
> + "EventCode": "0x0178",
> + "EventName": "FETCH_ICACHE_INSTR",
> + "PublicDescription": "Instructions fetched from I-cache."
> + },
> + {
> + "EventCode": "0x017b",
> + "EventName": "NEAR_CAS",
> + "PublicDescription": "Near atomics: compare and swap."
> + },
> + {
> + "EventCode": "0x017c",
> + "EventName": "NEAR_CAS_PASS",
> + "PublicDescription": "Near atomics: compare and swap pass."
> + },
> + {
> + "EventCode": "0x017d",
> + "EventName": "FAR_CAS",
> + "PublicDescription": "Far atomics: compare and swap."
> + },
> + {
> + "EventCode": "0x0186",
> + "EventName": "L2_BTB_RELOAD_MAIN_BTB",
> + "PublicDescription": "Number of completed L1 BTB updates initiated by an L2 BTB hit, which swap branch information between the L1 BTB and the L2 BTB."
> + },
> + {
> + "EventCode": "0x018f",
> + "EventName": "L1_PF_GEN_MCMC",
> + "PublicDescription": "Load/Store prefetch to L1 generated, MCMC."
> + },
> + {
> + "EventCode": "0x0190",
> + "EventName": "PF_MODE_0_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the most aggressive mode."
> + },
> + {
> + "EventCode": "0x0191",
> + "EventName": "PF_MODE_1_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the more aggressive mode."
> + },
> + {
> + "EventCode": "0x0192",
> + "EventName": "PF_MODE_2_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the less aggressive mode."
> + },
> + {
> + "EventCode": "0x0193",
> + "EventName": "PF_MODE_3_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the most conservative mode."
> + },
> + {
> + "EventCode": "0x0194",
> + "EventName": "TXREQ_LIMIT_MAX_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is the L2_TQ_SIZE."
> + },
> + {
> + "EventCode": "0x0195",
> + "EventName": "TXREQ_LIMIT_3QUARTER_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 3/4 of the L2_TQ_SIZE and the L2_TQ_SIZE-1."
> + },
> + {
> + "EventCode": "0x0196",
> + "EventName": "TXREQ_LIMIT_HALF_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 1/2 of the L2_TQ_SIZE and 3/4 of the L2_TQ_SIZE."
> + },
> + {
> + "EventCode": "0x0197",
> + "EventName": "TXREQ_LIMIT_1QUARTER_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 1/4 of the L2_TQ_SIZE and 1/2 of the L2_TQ_SIZE."
> + },
> + {
> + "EventCode": "0x019d",
> + "EventName": "PREFETCH_LATE_CMC",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by CMC prefetch request."
> + },
> + {
> + "EventCode": "0x019e",
> + "EventName": "PREFETCH_LATE_BO",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by BO prefetch request."
> + },
> + {
> + "EventCode": "0x019f",
> + "EventName": "PREFETCH_LATE_STRIDE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by STRIDE prefetch request."
> + },
> + {
> + "EventCode": "0x01a0",
> + "EventName": "PREFETCH_LATE_SPATIAL",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by SPATIAL prefetch request."
> + },
> + {
> + "EventCode": "0x01a2",
> + "EventName": "PREFETCH_LATE_TBW",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by TBW prefetch request."
> + },
> + {
> + "EventCode": "0x01a3",
> + "EventName": "PREFETCH_LATE_PAGE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by PAGE prefetch request."
> + },
> + {
> + "EventCode": "0x01a4",
> + "EventName": "PREFETCH_LATE_GSMS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by GSMS prefetch request."
> + },
> + {
> + "EventCode": "0x01a5",
> + "EventName": "PREFETCH_LATE_SIP_CONS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by SIP_CONS prefetch request."
> + },
> + {
> + "EventCode": "0x01a6",
> + "EventName": "PREFETCH_REFILL_CMC",
> + "PublicDescription": "PF/prefetch or PF/readclean request from CMC pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01a7",
> + "EventName": "PREFETCH_REFILL_BO",
> + "PublicDescription": "PF/prefetch or PF/readclean request from BO pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01a8",
> + "EventName": "PREFETCH_REFILL_STRIDE",
> + "PublicDescription": "PF/prefetch or PF/readclean request from STRIDE pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01a9",
> + "EventName": "PREFETCH_REFILL_SPATIAL",
> + "PublicDescription": "PF/prefetch or PF/readclean request from SPATIAL pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ab",
> + "EventName": "PREFETCH_REFILL_TBW",
> + "PublicDescription": "PF/prefetch or PF/readclean request from TBW pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ac",
> + "EventName": "PREFETCH_REFILL_PAGE",
> + "PublicDescription": "PF/prefetch or PF/readclean request from PAGE pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ad",
> + "EventName": "PREFETCH_REFILL_GSMS",
> + "PublicDescription": "PF/prefetch or PF/readclean request from GSMS pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ae",
> + "EventName": "PREFETCH_REFILL_SIP_CONS",
> + "PublicDescription": "PF/prefetch or PF/readclean request from SIP_CONS pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01af",
> + "EventName": "CACHE_HIT_LINE_PF_CMC",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by CMC prefetch request."
> + },
> + {
> + "EventCode": "0x01b0",
> + "EventName": "CACHE_HIT_LINE_PF_BO",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by BO prefetch request."
> + },
> + {
> + "EventCode": "0x01b1",
> + "EventName": "CACHE_HIT_LINE_PF_STRIDE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by STRIDE prefetch request."
> + },
> + {
> + "EventCode": "0x01b2",
> + "EventName": "CACHE_HIT_LINE_PF_SPATIAL",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by SPATIAL prefetch request."
> + },
> + {
> + "EventCode": "0x01b4",
> + "EventName": "CACHE_HIT_LINE_PF_TBW",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by TBW prefetch request."
> + },
> + {
> + "EventCode": "0x01b5",
> + "EventName": "CACHE_HIT_LINE_PF_PAGE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by PAGE prefetch request."
> + },
> + {
> + "EventCode": "0x01b6",
> + "EventName": "CACHE_HIT_LINE_PF_GSMS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by GSMS prefetch request."
> + },
> + {
> + "EventCode": "0x01b7",
> + "EventName": "CACHE_HIT_LINE_PF_SIP_CONS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by SIP_CONS prefetch request."
> + },
> + {
> + "EventCode": "0x01ba",
> + "EventName": "PREFETCH_LATE_STORE_ISSUE",
> + "PublicDescription": "This event counts the number of demand requests that matches a Store-issue prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01bb",
> + "EventName": "PREFETCH_LATE_STORE_STRIDE",
> + "PublicDescription": "This event counts the number of demand requests that matches a Store-stride prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01bc",
> + "EventName": "PREFETCH_LATE_PC_OFFSET",
> + "PublicDescription": "This event counts the number of demand requests that matches a PC-offset prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01bd",
> + "EventName": "PREFETCH_LATE_IFUPF",
> + "PublicDescription": "This event counts the number of demand requests that matches a IFU prefetcher's pending refill request.
These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01be",
> + "EventName": "PREFETCH_REFILL_STORE_ISSUE",
> + "PublicDescription": "This event counts the number of cache refills due to Store-Issue prefetcher."
> + },
> + {
> + "EventCode": "0x01bf",
> + "EventName": "PREFETCH_REFILL_STORE_STRIDE",
> + "PublicDescription": "This event counts the number of cache refills due to Store-stride prefetcher."
> + },
> + {
> + "EventCode": "0x01c0",
> + "EventName": "PREFETCH_REFILL_PC_OFFSET",
> + "PublicDescription": "This event counts the number of cache refills due to PC-offset prefetcher."
> + },
> + {
> + "EventCode": "0x01c1",
> + "EventName": "PREFETCH_REFILL_IFUPF",
> + "PublicDescription": "This event counts the number of cache refills due to IFU prefetcher."
> + },
> + {
> + "EventCode": "0x01c2",
> + "EventName": "CACHE_HIT_LINE_PF_STORE_ISSUE",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by Store-issue prefetcher."
> + },
> + {
> + "EventCode": "0x01c3",
> + "EventName": "CACHE_HIT_LINE_PF_STORE_STRIDE",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by Store-stride prefetcher."
> + },
> + {
> + "EventCode": "0x01c4",
> + "EventName": "CACHE_HIT_LINE_PF_PC_OFFSET",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by PC-offset prefetcher."
> + },
> + {
> + "EventCode": "0x01c5",
> + "EventName": "CACHE_HIT_LINE_PF_IFUPF",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by IFU prefetcher."
> + },
> + {
> + "EventCode": "0x01c6",
> + "EventName": "L2_PF_GEN_ST_ISSUE",
> + "PublicDescription": "Store-issue prefetch to L2 generated."
> + },
> + {
> + "EventCode": "0x01c7",
> + "EventName": "L2_PF_GEN_ST_STRIDE",
> + "PublicDescription": "Store-stride prefetch to L2 generated"
> + },
> + {
> + "EventCode": "0x01cb",
> + "EventName": "L2_TQ_OUTSTANDING",
> + "PublicDescription": "Outstanding tracker count, per cycle.\nThis event increments by the number of valid entries pertaining to this thread in the L2TQ, in each cycle.\nThis event can be used to calculate the occupancy of L2TQ by dividing this by the CPU_CYCLES event. The L2TQ queue tracks the outstanding Read, Write and Snoop transactions. The Read transaction and the Write transaction entries are attributable to PE, whereas the Snoop transactions are not always attributable to PE."
> + },
> + {
> + "EventCode": "0x01cc",
> + "EventName": "TXREQ_LIMIT_COUNT_CYCLES",
> + "PublicDescription": "This event increments by the dynamic TXREQ value, in each cycle.\nThis is a companion event of TXREQ_LIMIT_MAX_CYCLES, TXREQ_LIMIT_3QUARTER_CYCLES, TXREQ_LIMIT_HALF_CYCLES, and TXREQ_LIMIT_1QUARTER_CYCLES."
> + },
> + {
> + "EventCode": "0x01ce",
> + "EventName": "L3DPRFM_TO_L2PRQ_CONVERTED",
> + "PublicDescription": "This event counts the number of Converted-L3D-PRFMs. These are indeed L3D PRFM and activities around these PRFM are counted by the L3D_CACHE_PRFM, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL Events."
> + },
> + {
> + "EventCode": "0x01d2",
> + "EventName": "DVM_TLBI_RCVD",
> + "PublicDescription": "This event counts the number of TLBI DVM message received over CHI interface, for *this* Core."
> + },
> + {
> + "EventCode": "0x01d6",
> + "EventName": "DSB_COMMITING_LOCAL_TLBI",
> + "PublicDescription": "This event counts the number of DSB that are retired and committed at least one local TLBI instruction. This event increments no more than once (in a cycle) even if the DSB commits multiple local TLBI instruction."
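
As a sanity check of my reading of the L2_TQ_OUTSTANDING description above: the occupancy calculation it spells out works out as follows (a sketch only; the counter values are made up, not from real hardware):

```python
# Sketch of the L2TQ occupancy metric described by L2_TQ_OUTSTANDING:
# the event accumulates the number of valid per-thread L2TQ entries in
# each cycle, so dividing the accumulated count by CPU_CYCLES gives the
# average number of outstanding entries per cycle.
def l2tq_occupancy(l2_tq_outstanding: int, cpu_cycles: int) -> float:
    """Average outstanding L2TQ entries per cycle."""
    return l2_tq_outstanding / cpu_cycles

# Hypothetical counter readings: 1,500,000 entry-cycles over
# 1,000,000 cycles -> average occupancy of 1.5 entries per cycle.
print(l2tq_occupancy(1_500_000, 1_000_000))  # -> 1.5
```

If that is the intended usage, a derived metric entry for this might be worth adding in a follow-up.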
> + },
> + {
> + "EventCode": "0x01d7",
> + "EventName": "DSB_COMMITING_BROADCAST_TLBI",
> + "PublicDescription": "This event counts the number of DSB that are retired and committed at least one broadcast TLBI instruction. This event increments no more than once (in a cycle) even if the DSB commits multiple broadcast TLBI instruction."
> + },
> + {
> + "EventCode": "0x01eb",
> + "EventName": "L1DPRFM_L2DPRFM_TO_L2PRQ_CONVERTED",
> + "PublicDescription": "This event counts the number of Converted-L1D-PRFMs and Converted-L2D-PRFM.\nActivities involving the Converted-L1D-PRFM are counted by the L1D_CACHE_PRFM. However they are *not* counted by the L1D_CACHE_REFILL_PRFM, and L1D_CACHE_REFILL, as these Converted-L1D-PRFM are treated as L2 D hardware prefetches. Activities around the Converted-L1D-PRFMs and Converted-L2D-PRFMs are counted by the L2D_CACHE_PRFM, L2D_CACHE_REFILL_PRFM and L2D_CACHE_REFILL Events."
> + },
> + {
> + "EventCode": "0x01ec",
> + "EventName": "PREFETCH_LATE_CONVERTED_PRFM",
> + "PublicDescription": "This event counts the number of demand requests that matches a Converted-L1D-PRFM or Converted-L2D-PRFM pending refill request at L2 D-cache. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements.\nNote that this event is not counted by the L2D_CACHE_HIT_RWL1PRF_LATE_HWPRF, though the Converted-L1D-PRFM or Converted-L2D-PRFM are replayed by the L2PRQ."
> + },
> + {
> + "EventCode": "0x01ed",
> + "EventName": "PREFETCH_REFILL_CONVERTED_PRFM",
> + "PublicDescription": "This event counts the number of L2 D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM.\nNote : L2D_CACHE_REFILL_PRFM is inclusive of PREFETCH_REFILL_PRFM_CONVERTED, where both the PREFETCH_REFILL_PRFM_CONVERTED and the L2D_CACHE_REFILL_PRFM increment when L2 D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM."
> + },
> + {
> + "EventCode": "0x01ee",
> + "EventName": "CACHE_HIT_LINE_PF_CONVERTED_PRFM",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by Converted-L1D-PRFM or Converted-L2D-PRFM.\nNote that L2D_CACHE_HIT_RWL1PRF_FPRFM is inclusive of CACHE_HIT_LINE_PF_CONVERTED_PRFM, where both the CACHE_HIT_LINE_PF_CONVERTED_PRFM and the L2D_CACHE_HIT_RWL1PRF_FPRFM increment on a first hit to L2 D-cache filled by Converted-L1D-PRFM or Converted-L2D-PRFM."
> + },
> + {
> + "EventCode": "0x01f0",
> + "EventName": "TMS_ST_TO_SMT_LATENCY",
> + "PublicDescription": "This event counts the number of CPU cycles spent on TMS for ST-to-SMT switch.\nThis event is counted by both the threads - This event in both threads increment during TMS for ST-to-SMT switch."
> + },
> + {
> + "EventCode": "0x01f1",
> + "EventName": "TMS_SMT_TO_ST_LATENCY",
> + "PublicDescription": "This event counts the number of CPU cycles spent on TMS for SMT-to-ST switch. The count also includes the CPU cycles spend due to an aborted SMT-to-ST TMS attempt.\nThis event is counted only by the thread that is not in WFI."
> + },
> + {
> + "EventCode": "0x01f2",
> + "EventName": "TMS_ST_TO_SMT_COUNT",
> + "PublicDescription": "This event counts the number of completed TMS from ST-to-SMT.\nThis event is counted only by the active thread (the one that is not in WFI).\nNote: When an active thread enters the Debug state in ST-Full resource mode, it is switched to SMT mode. This is because the inactive thread cannot wake up while the other thread remains in the Debug state. To prEvent this issue, threads operating in ST-Full resource mode are transitioned to SMT mode upon entering Debug state.
This event count will also reflect such switches from ST to SMT mode.\n(Also see the (NV_CPUACTLR14_EL1.chka_prEvent_st_tx_to_smt_when_tx_in_debug_state bit to disable this behavior.)"
> + },
> + {
> + "EventCode": "0x01f3",
> + "EventName": "TMS_SMT_TO_ST_COUNT",
> + "PublicDescription": "This event counts the number of completed TMS from SMT-to-ST.\nThis event is counted only by the thread that is not in WFI."
> + },
> + {
> + "EventCode": "0x01f4",
> + "EventName": "TMS_SMT_TO_ST_COUNT_ABRT",
> + "PublicDescription": "This event counts the number of aborted TMS from SMT-to-ST.\nThis event is counted only by the thread that is not in WFI."
> + },
> + {
> + "EventCode": "0x0202",
> + "EventName": "L0I_CACHE_RD",
> + "PublicDescription": "This event counts the number of predict blocks serviced out of L0 I-cache.\nNote: The L0 I-cache performs at most 4 L0 I look-up in a cycle. Two of which are to service PB from L0 I. And the other two to refill L0 I-cache from L1 I. This event count only the L0 I-cache lookup pertaining to servicing the PB from L0 I."
> + },
> + {
> + "EventCode": "0x0203",
> + "EventName": "L0I_CACHE_REFILL",
> + "PublicDescription": "This event counts the number of L0I cache refill from L1 I-cache."
> + },
> + {
> + "EventCode": "0x0207",
> + "EventName": "INTR_LATENCY",
> + "PublicDescription": "This event counts the number of cycles elapsed between when an Interrupt is recognized (after masking) to when a uop associated with the first instruction in the destination exception level is allocated. If there is some other flush condition that pre-empts the Interrupt, then the cycles counted terminates early at the first instruction executed after that flush.
In the event of dropped Interrupts (when an Interrupt is deasserted before it is taken), this counter measures the number of cycles that elapse from the moment an Interrupt is recognized (post-masking) until the Interrupt is dropped or deasserted.\nNote that\n* IESB(Implicit Error Synchronization Barrier) is an internal mop, so the latency of an implicit IESB mop executed before the Interrupt taken is included in the Interrupt latency count.\n* Nukes or TMS sequence within the window are also counted by the Interrupt latency Event.\n* A SMT to ST TMS will be aborted on detecting the wake condition for the WFI thread. The Interrupt latency count includes any additional penalty for an aborted TMS."
> + },
> + {
> + "EventCode": "0x021c",
> + "EventName": "CWT_ALLOC_ENTRY",
> + "PublicDescription": "Cache Way Tracker Allocate entry."
> + },
> + {
> + "EventCode": "0x021d",
> + "EventName": "CWT_ALLOC_LINE",
> + "PublicDescription": "Cache Way Tracker Allocate line."
> + },
> + {
> + "EventCode": "0x021e",
> + "EventName": "CWT_HIT",
> + "PublicDescription": "Cache Way Tracker hit."
> + },
> + {
> + "EventCode": "0x021f",
> + "EventName": "CWT_HIT_TAG",
> + "PublicDescription": "Cache Way Tracker hit when ITAG lookup suppressed."
> + },
> + {
> + "EventCode": "0x0220",
> + "EventName": "CWT_REPLAY_TAG",
> + "PublicDescription": "Cache Way Tracker causes ITAG replay due to miss when ITAG lookup suppressed."
> + },
> + {
> + "EventCode": "0x0250",
> + "EventName": "GPT_REQ",
> + "PublicDescription": "GPT lookup."
> + },
> + {
> + "EventCode": "0x0251",
> + "EventName": "GPT_WC_HIT",
> + "PublicDescription": "GPT lookup hit in Walk cache."
> + },
> + {
> + "EventCode": "0x0252",
> + "EventName": "GPT_PG_HIT",
> + "PublicDescription": "GPT lookup hit in TLB."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
> new file mode 100644
> index 000000000000..34c7eefa66b0
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
> @@ -0,0 +1,94 @@
> +[
> + {
> + "ArchStdEvent": "INST_RETIRED",
> + "PublicDescription": "This event counts instructions that have been architecturally executed."
> + },
> + {
> + "ArchStdEvent": "CID_WRITE_RETIRED",
> + "PublicDescription": "This event counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed direct branches."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_RETIRED",
> + "PublicDescription": "This event counts architecturally executed procedure returns."
> + },
> + {
> + "ArchStdEvent": "TTBR_WRITE_RETIRED",
> + "PublicDescription": "This event counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
> + },
> + {
> + "ArchStdEvent": "BR_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions, and context synchronization instructions are not counted."
> + },
> + {
> + "ArchStdEvent": "BR_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "OP_RETIRED",
> + "PublicDescription": "This event counts micro-operations that are architecturally executed. This is a count of number of micro-operations retired from the commit queue in a single cycle."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were taken."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed direct branches that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_IND_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_TAKEN_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were taken and were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_TAKEN_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were taken and were mispredicted causing a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_SKIP_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were not taken and were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_SKIP_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were not taken and were mispredicted causing a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_PRED_RETIRED",
> + "PublicDescription": "This event counts branch instructions counted by BR_RETIRED which were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
> new file mode 100644
> index 000000000000..00d0c5051a48
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
> @@ -0,0 +1,42 @@
> +[
> + {
> + "ArchStdEvent": "SAMPLE_POP",
> + "PublicDescription": "This event counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED",
> + "PublicDescription": "This event counts statistical profiling samples taken for sampling."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FILTRATE",
> + "PublicDescription": "This event counts statistical profiling samples taken which are not removed by filtering."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_COLLISION",
> + "PublicDescription": "This event counts statistical profiling samples that have collided with a previous sample and so therefore not taken."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_BR",
> + "PublicDescription": "This event counts statistical profiling samples taken which are branches."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_LD",
> + "PublicDescription": "This event counts statistical profiling samples taken which are Loads or Load atomic operations."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_ST",
> + "PublicDescription": "This event counts statistical profiling samples taken which are Stores or Store atomic operations."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_OP",
> + "PublicDescription": "This event counts statistical profiling samples taken which are matching any operation type filters supported."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_EVENT",
> + "PublicDescription": "This event counts statistical profiling samples taken which are matching event packet filter constraints."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_LAT",
> + "PublicDescription": "This event counts statistical profiling samples taken which are exceeding minimum latency set by operation latency filter constraints."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
> new file mode 100644
> index 000000000000..8bc802f5f350
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
> @@ -0,0 +1,230 @@
> +[
> + {
> + "ArchStdEvent": "INST_SPEC",
> + "PublicDescription": "This event counts operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "OP_SPEC",
> + "PublicDescription": "This event counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
> + },
> + {
> + "ArchStdEvent": "UNALIGNED_LD_SPEC",
> + "PublicDescription": "This event counts unaligned memory Read operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event does not count preload operations (PLD, PLI).\nThis event is a subset of the UNALIGNED_LDST_SPEC event."
> + },
> + {
> + "ArchStdEvent": "UNALIGNED_ST_SPEC",
> + "PublicDescription": "This event counts unaligned memory Write operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event is a subset of the UNALIGNED_LDST_SPEC event."
> + },
> + {
> + "ArchStdEvent": "UNALIGNED_LDST_SPEC",
> + "PublicDescription": "This event counts unaligned memory operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event is the sum of the following events:\nUNALIGNED_ST_SPEC and\nUNALIGNED_LD_SPEC."
> + },
> + {
> + "ArchStdEvent": "LDREX_SPEC",
> + "PublicDescription": "This event counts Load-Exclusive operations that have been speculatively executed. For example: LDREX, LDX"
> + },
> + {
> + "ArchStdEvent": "STREX_PASS_SPEC",
> + "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed and have successfully completed the Store operation."
> + },
> + {
> + "ArchStdEvent": "STREX_FAIL_SPEC",
> + "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed and have not successfully completed the Store operation."
> + },
> + {
> + "ArchStdEvent": "STREX_SPEC",
> + "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed.\nThis event is the sum of the following events:\nSTREX_PASS_SPEC and\nSTREX_FAIL_SPEC."
> + },
> + {
> + "ArchStdEvent": "LD_SPEC",
> + "PublicDescription": "This event counts speculatively executed Load operations including Single Instruction Multiple Data (SIMD) Load operations."
> + },
> + {
> + "ArchStdEvent": "ST_SPEC",
> + "PublicDescription": "This event counts speculatively executed Store operations including Single Instruction Multiple Data (SIMD) Store operations."
> + },
> + {
> + "ArchStdEvent": "LDST_SPEC",
> + "PublicDescription": "This event counts Load and Store operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "DP_SPEC",
> + "PublicDescription": "This event counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
> + },
> + {
> + "ArchStdEvent": "ASE_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD operations excluding Load, Store, and Move micro-operations that move data to or from SIMD (vector) registers."
> + },
> + {
> + "ArchStdEvent": "VFP_SPEC",
> + "PublicDescription": "This event counts speculatively executed floating point operations.
This event does not count operations that move data to or from floating point (vector) registers."
> + },
> + {
> + "ArchStdEvent": "PC_WRITE_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
> + },
> + {
> + "ArchStdEvent": "CRYPTO_SPEC",
> + "PublicDescription": "This event counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_SPEC",
> + "PublicDescription": "This event counts direct branch operations which are speculatively executed."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_SPEC",
> + "PublicDescription": "This event counts procedure return operations (RET, RETAA and RETAB) which are speculatively executed."
> + },
> + {
> + "ArchStdEvent": "BR_INDIRECT_SPEC",
> + "PublicDescription": "This event counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations and direct branch instructions. Some examples of the instructions counted by this event include BR Xn, RET, etc."
> + },
> + {
> + "ArchStdEvent": "ISB_SPEC",
> + "PublicDescription": "This event counts ISB operations that are executed."
> + },
> + {
> + "ArchStdEvent": "DSB_SPEC",
> + "PublicDescription": "This event counts DSB operations that are speculatively issued to Load/Store unit in the CPU."
> + },
> + {
> + "ArchStdEvent": "DMB_SPEC",
> + "PublicDescription": "This event counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from Load-acquire/Store-release operations."
> + },
> + {
> + "ArchStdEvent": "CSDB_SPEC",
> + "PublicDescription": "This event counts CSDB operations that are speculatively issued to the Load/Store unit in the CPU.
This event does not count implied barriers from Load-acquire/Store-release operations."
> + },
> + {
> + "ArchStdEvent": "RC_LD_SPEC",
> + "PublicDescription": "This event counts any Load acquire operations that are speculatively executed. For example: LDAR, LDARH, LDARB"
> + },
> + {
> + "ArchStdEvent": "RC_ST_SPEC",
> + "PublicDescription": "This event counts any Store release operations that are speculatively executed. For example: STLR, STLRH, STLRB"
> + },
> + {
> + "ArchStdEvent": "SIMD_INST_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations that are SIMD or SVE vector operations or Advanced SIMD non-scalar operations."
> + },
> + {
> + "ArchStdEvent": "ASE_INST_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD operations."
> + },
> + {
> + "ArchStdEvent": "SVE_INST_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations that are SVE operations."
> + },
> + {
> + "ArchStdEvent": "INT_SPEC",
> + "PublicDescription": "This event counts speculatively executed integer arithmetic operations."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with no active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_FULL_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with all predicate elements active.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with at least one but not all active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with at least one non active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "PRF_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations that prefetch memory. For example, Scalar: PRFM, SVE: PRFB, PRFD, PRFH, or PRFW."
> + },
> + {
> + "ArchStdEvent": "SVE_LDFF_SPEC",
> + "PublicDescription": "This event counts speculatively executed SVE first fault or non-fault Load operations."
> + },
> + {
> + "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
> + "PublicDescription": "This event counts speculatively executed SVE first fault or non-fault Load operations that clear at least one bit in the FFR."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT8_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type being an 8-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT16_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT32_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
> +    },
> +    {
> +        "ArchStdEvent": "ASE_SVE_INT64_SPEC",
> +        "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
> +    },
> +    {
> +        "EventCode": "0x011d",
> +        "EventName": "SPEC_RET_STACK_FULL",
> +        "PublicDescription": "This event counts predict pipe stalls due to speculative return address predictor full."
> +    },
> +    {
> +        "EventCode": "0x011f",
> +        "EventName": "MOPS_SPEC",
> +        "PublicDescription": "Macro-ops speculatively decoded."
> +    },
> +    {
> +        "EventCode": "0x0180",
> +        "EventName": "BR_SPEC_PRED_TAKEN",
> +        "PublicDescription": "Number of predicted taken from branch predictor."
> +    },
> +    {
> +        "EventCode": "0x0181",
> +        "EventName": "BR_SPEC_PRED_TAKEN_FROM_L2BTB",
> +        "PublicDescription": "Number of predicted taken branch from L2 BTB."
> +    },
> +    {
> +        "EventCode": "0x0182",
> +        "EventName": "BR_SPEC_PRED_TAKEN_MULTI",
> +        "PublicDescription": "Number of predicted taken for polymorphic branch."
> +    },
> +    {
> +        "EventCode": "0x0185",
> +        "EventName": "BR_SPEC_PRED_STATIC",
> +        "PublicDescription": "Number of post fetch prediction."
> +    },
> +    {
> +        "EventCode": "0x01d0",
> +        "EventName": "TLBI_LOCAL_SPEC",
> +        "PublicDescription": "A non-broadcast TLBI instruction executed (Speculatively or otherwise) on *this* PE."
> +    },
> +    {
> +        "EventCode": "0x01d1",
> +        "EventName": "TLBI_BROADCAST_SPEC",
> +        "PublicDescription": "A broadcast TLBI instruction executed (Speculatively or otherwise) on *this* PE."
> +    },
> +    {
> +        "EventCode": "0x01e7",
> +        "EventName": "BR_SPEC_PRED_ALN_REDIR",
> +        "PublicDescription": "BPU predict pipe align redirect (either AL-APQ hit/miss)."
> +    },
> +    {
> +        "EventCode": "0x0200",
> +        "EventName": "SIMD_CRYPTO_INST_SPEC",
> +        "PublicDescription": "SIMD, SVE, and CRYPTO instructions speculatively decoded."
> +    },
> +    {
> +        "EventCode": "0x022e",
> +        "EventName": "VPRED_LD_SPEC",
> +        "PublicDescription": "This event counts the number of Speculatively-executed-Load operations with addresses produced by the value-prediction mechanism. The loaded data might be discarded if the predicted address differs from the actual address."
> +    },
> +    {
> +        "EventCode": "0x022f",
> +        "EventName": "VPRED_LD_SPEC_MISMATCH",
> +        "PublicDescription": "This event counts a subset of VPRED_LD_SPEC where the predicted Load address and the actual address mismatched."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
> new file mode 100644
> index 000000000000..92d9e0866c24
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
> @@ -0,0 +1,145 @@
> +[
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND",
> +        "PublicDescription": "This event counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_FRONTEND_SLOTS counts SLOTS during the cycle when this event counts. STALL_SLOT_FRONTEND will count SLOTS when this event is counted on this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND",
> +        "PublicDescription": "This event counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
> +    },
> +    {
> +        "ArchStdEvent": "STALL",
> +        "PublicDescription": "This event counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of the following events:\nSTALL_FRONTEND and\nSTALL_BACKEND."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_SLOT_BACKEND",
> +        "PublicDescription": "This event counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND is SLOTS."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_SLOT_FRONTEND",
> +        "PublicDescription": "This event counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints. STALL_FRONTEND counts during the cycle when STALL_SLOT_FRONTEND is SLOTS."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_SLOT",
> +        "PublicDescription": "This event counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall).\nSTALL_SLOT is the sum of the following events:\nSTALL_SLOT_FRONTEND and\nSTALL_SLOT_BACKEND."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_MEM",
> +        "PublicDescription": "This event counts cycles when the backend is stalled because there is a pending demand Load request in progress in the last level Core cache.\nLast level cache in this CPU is Level 2, hence this event counts same as STALL_BACKEND_L2D."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_L1I",
> +        "PublicDescription": "This event counts cycles when the frontend is stalled because there is an instruction fetch request pending in the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_MEM",
> +        "PublicDescription": "This event counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level Core cache.\nLast level cache in this CPU is Level 2, hence this event counts rather than STALL_FRONTEND_L2I."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_TLB",
> +        "PublicDescription": "This event counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_FLOW",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the branch prediction unit."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_FLUSH",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, microarchitectural flush etc."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_L1D",
> +        "PublicDescription": "This event counts cycles when the backend is stalled because there is a pending demand Load request in progress in the L1 D-cache."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_TLB",
> +        "PublicDescription": "This event counts cycles when the backend is stalled on any demand TLB misses being handled."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_ST",
> +        "PublicDescription": "This event counts cycles when the backend is stalled and there is a Store that has not reached the pre-commit stage."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_CPUBOUND",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to any resource constraints in the CPU excluding memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_BUSY",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations because the issue queues are full to take any operations for execution."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_ILOCK",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to resource constraints imposed by input dependency."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_RENAME",
> +        "PublicDescription": "This event counts cycles when backend is stalled even when operations are available from the frontend but at least one is not ready to be sent to the backend because no rename register is available."
> +    },
> +    {
> +        "EventCode": "0x0158",
> +        "EventName": "FLAG_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to FRF (Flag register file) full."
> +    },
> +    {
> +        "EventCode": "0x0159",
> +        "EventName": "GEN_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to GRF (General-purpose register file) full."
> +    },
> +    {
> +        "EventCode": "0x015a",
> +        "EventName": "VEC_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to VRF (Vector register file) full."
> +    },
> +    {
> +        "EventCode": "0x015c",
> +        "EventName": "SX_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, SX."
> +    },
> +    {
> +        "EventCode": "0x015d",
> +        "EventName": "MX_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, MX."
> +    },
> +    {
> +        "EventCode": "0x015e",
> +        "EventName": "LS_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, LS."
> +    },
> +    {
> +        "EventCode": "0x015f",
> +        "EventName": "VX_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, VX."
> +    },
> +    {
> +        "EventCode": "0x0160",
> +        "EventName": "MCQ_FULL_STALL",
> +        "PublicDescription": "Dispatch stalled due to MCQ full."
> +    },
> +    {
> +        "EventCode": "0x01cf",
> +        "EventName": "PRD_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to predicate registers (physical) are full."
> +    },
> +    {
> +        "EventCode": "0x01e0",
> +        "EventName": "CSDB_STALL",
> +        "PublicDescription": "Rename stalled due to CSDB."
> +    },
> +    {
> +        "EventCode": "0x01e2",
> +        "EventName": "STALL_SLOT_FRONTEND_WITHOUT_MISPRED",
> +        "PublicDescription": "Stall slot frontend during non-mispredicted branch.\nThis event counts the STALL_STOT_FRONTEND Events, except for the 4 cycles following a mispredicted branch Event or 4 cycles following a commit flush&restart Event."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> new file mode 100644
> index 000000000000..18ec5c348c87
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> @@ -0,0 +1,158 @@
> +[
> +    {
> +        "ArchStdEvent": "L1I_TLB_REFILL",
> +        "PublicDescription": "This event counts L1 Instruction TLB refills from any instruction fetch (demand, hardware prefetch, and software preload accesses). If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_REFILL",
> +        "PublicDescription": "This event counts L1 Data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (Address Translation) instruction.\nThis event counts the sum of the following events:\nL1D_TLB_REFILL_RD and\nL1D_TLB_REFILL_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB",
> +        "PublicDescription": "This event counts L1 Data TLB accesses caused by any memory Load or Store operation.\nNote that Load or Store instructions can be broken up into multiple memory operations.\nThis event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_TLB",
> +        "PublicDescription": "This event counts L1 instruction TLB accesses (caused by demand or hardware prefetch or software preload accesses), whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses.\nThis event is a superset of the L1I_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_REFILL",
> +        "PublicDescription": "This event counts L2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches.\nThis event is the sum of the following events:\nL2D_TLB_REFILL_RD and\nL2D_TLB_REFILL_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB",
> +        "PublicDescription": "This event counts L2 TLB accesses except those caused by TLB maintenance operations.\nThis event is the sum of the following events:\nL2D_TLB_RD and\nL2D_TLB_WR."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK",
> +        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations.\nThis event does not include prefetches."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK",
> +        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations.\nThis event does not include prefetches."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_REFILL_RD",
> +        "PublicDescription": "This event counts L1 Data TLB refills caused by memory Read operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an Address Translation (AT) instruction.\nThis event is a subset of the L1D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_REFILL_WR",
> +        "PublicDescription": "This event counts L1 Data TLB refills caused by data side memory Write operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count with an access from an Address Translation (AT) instruction.\nThis event is a subset of the L1D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_RD",
> +        "PublicDescription": "This event counts L1 Data TLB accesses caused by memory Read operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_WR",
> +        "PublicDescription": "This event counts any L1 Data side TLB accesses caused by memory Write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_REFILL_RD",
> +        "PublicDescription": "This event counts L2 TLB refills caused by memory Read operations from both data and instruction fetch except for those caused by TLB maintenance operations or hardware prefetches.\nThis event is a subset of the L2D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_REFILL_WR",
> +        "PublicDescription": "This event counts L2 TLB refills caused by memory Write operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_RD",
> +        "PublicDescription": "This event counts L2 TLB accesses caused by memory Read operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_WR",
> +        "PublicDescription": "This event counts L2 TLB accesses caused by memory Write operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB event."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_PERCYC",
> +        "PublicDescription": "This event counts the number of data translation table walks in progress per cycle."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_PERCYC",
> +        "PublicDescription": "This event counts the number of instruction translation table walks in progress per cycle."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_RW",
> +        "PublicDescription": "This event counts L1 Data TLB demand accesses caused by memory Read or Write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_TLB_RD",
> +        "PublicDescription": "This event counts L1 Instruction TLB demand accesses whether the access hits or misses in the TLB."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_PRFM",
> +        "PublicDescription": "This event counts L1 Data TLB accesses generated by software prefetch or preload memory accesses. Load or Store instructions can be broken into multiple memory operations. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_TLB_PRFM",
> +        "PublicDescription": "This event counts L1 Instruction TLB accesses generated by software preload or prefetch instructions. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_HWUPD",
> +        "PublicDescription": "This event counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_HWUPD",
> +        "PublicDescription": "This event counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_STEP",
> +        "PublicDescription": "This event counts number of memory accesses triggered by a demand data translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_STEP",
> +        "PublicDescription": "This event counts number of memory accesses triggered by an instruction translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_LARGE",
> +        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_LARGE",
> +        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_SMALL",
> +        "PublicDescription": "This event counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted.\nAlso note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_SMALL",
> +        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_RW",
> +        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_RD",
> +        "PublicDescription": "This event counts number of demand instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_PRFM",
> +        "PublicDescription": "This event counts number of software prefetches or preloads generated data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_PRFM",
> +        "PublicDescription": "This event counts number of software prefetches or preloads generated instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "EventCode": "0x010e",
> +        "EventName": "L1D_TLB_REFILL_RD_PF",
> +        "PublicDescription": "L1 Data TLB refill, Read, prefetch."
> +    },
> +    {
> +        "EventCode": "0x010f",
> +        "EventName": "L2TLB_PF_REFILL",
> +        "PublicDescription": "L2 Data TLB refill, Read, prefetch.\nThis event counts MMU refills due to internal PFStream requests."
> +    },
> +    {
> +        "EventCode": "0x0223",
> +        "EventName": "L1I_TLB_REFILL_RD",
> +        "PublicDescription": "L1 Instruction TLB refills due to Demand miss."
> +    },
> +    {
> +        "EventCode": "0x0224",
> +        "EventName": "L1I_TLB_REFILL_PRFM",
> +        "PublicDescription": "L1 Instruction TLB refills due to Software prefetch miss."
> +    }
> +]
> --
> 2.43.0
>