Date: Tue, 3 Mar 2026 17:26:50 -0800
From: Namhyung Kim
To: Besar Wicaksono
Cc: irogers@google.com, james.clark@linaro.org, john.g.garry@oracle.com,
	will@kernel.org, mike.leach@linaro.org, leo.yan@linux.dev,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@kernel.org, adrian.hunter@intel.com, peterz@infradead.org,
	mingo@redhat.com, acme@kernel.org, linux-tegra@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, tmakin@nvidia.com, vsethi@nvidia.com,
	rwiley@nvidia.com, skelley@nvidia.com, ywan@nvidia.com,
	treding@nvidia.com, jonathanh@nvidia.com, mochs@nvidia.com
Subject: Re: [PATCH v2] perf vendor events arm64: Add Tegra410 Olympus PMU events
References: <20260212233407.1432673-1-bwicaksono@nvidia.com>
In-Reply-To: <20260212233407.1432673-1-bwicaksono@nvidia.com>

Hello,

On Thu, Feb 12, 2026 at 11:34:07PM +0000, Besar Wicaksono wrote:
> Add JSON files for NVIDIA Tegra410 Olympus core PMU events.
> Also updated the common-and-microarch.json.
> 
> Signed-off-by: Besar Wicaksono
> ---
> 
> Changes from v1:
> * Remove CHAIN event
> * Update event description and fix spelling and capitalization mistakes
>   Thanks to Ian and James for the review.
> v1: https://lore.kernel.org/all/20260127225909.3296202-1-bwicaksono@nvidia.com/T/#u

Ian and James, can you please take a look again?
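(A side note for reviewers, not part of the patch: every entry quoted below is either an `ArchStdEvent` alias of a common event or a vendor event with an `EventCode`/`EventName` pair. A minimal, hypothetical Python sketch of that shape check — `check_entry` is my own helper for eyeballing the diff, not part of perf's jevents tooling:)

```python
import json

# Reviewer-side sanity check for the pmu-events JSON entries quoted in
# this patch; a minimal sketch only, not part of perf's jevents tooling.
# It checks just the fields visible in the diff: either an ArchStdEvent
# alias, or a vendor event with a hex EventCode and an EventName.
def check_entry(entry):
    if "ArchStdEvent" in entry:
        # Alias of a common event defined in common-and-microarch.json.
        return bool(entry["ArchStdEvent"])
    code = entry.get("EventCode", "")
    if not code.startswith("0x"):
        return False
    try:
        int(code, 16)  # must parse as hexadecimal
    except ValueError:
        return False
    return bool(entry.get("EventName"))

entries = json.loads("""[
    {"EventCode": "0x8150",
     "EventName": "L3D_CACHE_RW",
     "BriefDescription": "Level 3 data cache demand access."},
    {"ArchStdEvent": "BR_MIS_PRED",
     "PublicDescription": "Speculatively executed, mispredicted branches."}
]""")

print(all(check_entry(e) for e in entries))  # True
```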
Thanks,
Namhyung

> 
> ---
>  .../arch/arm64/common-and-microarch.json      |  85 +++
>  tools/perf/pmu-events/arch/arm64/mapfile.csv  |   1 +
>  .../arch/arm64/nvidia/t410/branch.json        |  45 ++
>  .../arch/arm64/nvidia/t410/brbe.json          |   6 +
>  .../arch/arm64/nvidia/t410/bus.json           |  48 ++
>  .../arch/arm64/nvidia/t410/exception.json     |  62 ++
>  .../arch/arm64/nvidia/t410/fp_operation.json  |  78 ++
>  .../arch/arm64/nvidia/t410/general.json       |  15 +
>  .../arch/arm64/nvidia/t410/l1d_cache.json     | 122 +++
>  .../arch/arm64/nvidia/t410/l1i_cache.json     | 114 +++
>  .../arch/arm64/nvidia/t410/l2d_cache.json     | 134 ++++
>  .../arch/arm64/nvidia/t410/ll_cache.json      | 107 +++
>  .../arch/arm64/nvidia/t410/memory.json        |  46 ++
>  .../arch/arm64/nvidia/t410/metrics.json       | 722 ++++++++++++++++++
>  .../arch/arm64/nvidia/t410/misc.json          | 642 ++++++++++++++++
>  .../arch/arm64/nvidia/t410/retired.json       |  94 +++
>  .../arch/arm64/nvidia/t410/spe.json           |  42 +
>  .../arm64/nvidia/t410/spec_operation.json     | 230 ++++++
>  .../arch/arm64/nvidia/t410/stall.json         | 145 ++++
>  .../arch/arm64/nvidia/t410/tlb.json           | 158 ++++
>  20 files changed, 2896 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
>  create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> index 468cb085d879..144325d87be4 100644
> --- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> +++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> @@ -1512,11 +1512,26 @@
>          "EventName": "L2D_CACHE_REFILL_PRFM",
>          "BriefDescription": "Level 2 data cache refill, software preload"
>      },
> +    {
> +        "EventCode": "0x8150",
> +        "EventName": "L3D_CACHE_RW",
> +        "BriefDescription": "Level 3 data cache demand access."
> +    },
> +    {
> +        "EventCode": "0x8151",
> +        "EventName": "L3D_CACHE_PRFM",
> +        "BriefDescription": "Level 3 data cache software prefetch"
> +    },
>      {
>          "EventCode": "0x8152",
>          "EventName": "L3D_CACHE_MISS",
>          "BriefDescription": "Level 3 data cache demand access miss"
>      },
> +    {
> +        "EventCode": "0x8153",
> +        "EventName": "L3D_CACHE_REFILL_PRFM",
> +        "BriefDescription": "Level 3 data cache refill, software prefetch."
> +    },
>      {
>          "EventCode": "0x8154",
>          "EventName": "L1D_CACHE_HWPRF",
> @@ -1527,6 +1542,11 @@
>          "EventName": "L2D_CACHE_HWPRF",
>          "BriefDescription": "Level 2 data cache hardware prefetch."
>      },
> +    {
> +        "EventCode": "0x8156",
> +        "EventName": "L3D_CACHE_HWPRF",
> +        "BriefDescription": "Level 3 data cache hardware prefetch."
> +    },
>      {
>          "EventCode": "0x8158",
>          "EventName": "STALL_FRONTEND_MEMBOUND",
> @@ -1682,6 +1702,11 @@
>          "EventName": "L2D_CACHE_REFILL_HWPRF",
>          "BriefDescription": "Level 2 data cache refill, hardware prefetch."
>      },
> +    {
> +        "EventCode": "0x81BE",
> +        "EventName": "L3D_CACHE_REFILL_HWPRF",
> +        "BriefDescription": "Level 3 data cache refill, hardware prefetch."
> +    },
>      {
>          "EventCode": "0x81C0",
>          "EventName": "L1I_CACHE_HIT_RD",
> @@ -1712,11 +1737,31 @@
>          "EventName": "L1I_CACHE_HIT_RD_FPRFM",
>          "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by software preload"
>      },
> +    {
> +        "EventCode": "0x81DC",
> +        "EventName": "L1D_CACHE_HIT_RW_FPRFM",
> +        "BriefDescription": "Level 1 data cache demand access first hit, fetched by software prefetch."
> +    },
>      {
>          "EventCode": "0x81E0",
>          "EventName": "L1I_CACHE_HIT_RD_FHWPRF",
>          "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by hardware prefetcher"
>      },
> +    {
> +        "EventCode": "0x81EC",
> +        "EventName": "L1D_CACHE_HIT_RW_FHWPRF",
> +        "BriefDescription": "Level 1 data cache demand access first hit, fetched by hardware prefetcher."
> +    },
> +    {
> +        "EventCode": "0x81F0",
> +        "EventName": "L1I_CACHE_HIT_RD_FPRF",
> +        "BriefDescription": "Level 1 instruction cache demand fetch first hit, fetched by prefetch."
> +    },
> +    {
> +        "EventCode": "0x81FC",
> +        "EventName": "L1D_CACHE_HIT_RW_FPRF",
> +        "BriefDescription": "Level 1 data cache demand access first hit, fetched by prefetch."
> +    },
>      {
>          "EventCode": "0x8200",
>          "EventName": "L1I_CACHE_HIT",
> @@ -1767,11 +1812,26 @@
>          "EventName": "L1I_LFB_HIT_RD_FPRFM",
>          "BriefDescription": "Level 1 instruction cache demand fetch line-fill buffer first hit, recently fetched by software preload"
>      },
> +    {
> +        "EventCode": "0x825C",
> +        "EventName": "L1D_LFB_HIT_RW_FPRFM",
> +        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by software prefetch."
> +    },
>      {
>          "EventCode": "0x8260",
>          "EventName": "L1I_LFB_HIT_RD_FHWPRF",
>          "BriefDescription": "Level 1 instruction cache demand fetch line-fill buffer first hit, recently fetched by hardware prefetcher"
>      },
> +    {
> +        "EventCode": "0x826C",
> +        "EventName": "L1D_LFB_HIT_RW_FHWPRF",
> +        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by hardware prefetcher."
> +    },
> +    {
> +        "EventCode": "0x827C",
> +        "EventName": "L1D_LFB_HIT_RW_FPRF",
> +        "BriefDescription": "Level 1 data cache demand access line-fill buffer first hit, recently fetched by prefetch."
> +    },
>      {
>          "EventCode": "0x8280",
>          "EventName": "L1I_CACHE_PRF",
> @@ -1807,6 +1867,11 @@
>          "EventName": "LL_CACHE_REFILL",
>          "BriefDescription": "Last level cache refill"
>      },
> +    {
> +        "EventCode": "0x828E",
> +        "EventName": "L3D_CACHE_REFILL_PRF",
> +        "BriefDescription": "Level 3 data cache refill, prefetch."
> +    },
>      {
>          "EventCode": "0x8320",
>          "EventName": "L1D_CACHE_REFILL_PERCYC",
> @@ -1872,6 +1937,16 @@
>          "EventName": "FP_FP8_MIN_SPEC",
>          "BriefDescription": "Floating-point operation speculatively_executed, smallest type is 8-bit floating-point."
>      },
> +    {
> +        "EventCode": "0x8480",
> +        "EventName": "FP_SP_FIXED_MIN_OPS_SPEC",
> +        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is single-precision floating-point."
> +    },
> +    {
> +        "EventCode": "0x8482",
> +        "EventName": "FP_HP_FIXED_MIN_OPS_SPEC",
> +        "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is half-precision floating-point."
> +    },
>      {
>          "EventCode": "0x8483",
>          "EventName": "FP_BF16_FIXED_MIN_OPS_SPEC",
> @@ -1882,6 +1957,16 @@
>          "EventName": "FP_FP8_FIXED_MIN_OPS_SPEC",
>          "BriefDescription": "Non-scalable element arithmetic operations speculatively executed, smallest type is 8-bit floating-point."
>      },
> +    {
> +        "EventCode": "0x8488",
> +        "EventName": "FP_SP_SCALE_MIN_OPS_SPEC",
> +        "BriefDescription": "Scalable element arithmetic operations speculatively executed, smallest type is single-precision floating-point."
> +    },
> +    {
> +        "EventCode": "0x848A",
> +        "EventName": "FP_HP_SCALE_MIN_OPS_SPEC",
> +        "BriefDescription": "Scalable element arithmetic operations speculatively executed, smallest type is half-precision floating-point."
> +    },
>      {
>          "EventCode": "0x848B",
>          "EventName": "FP_BF16_SCALE_MIN_OPS_SPEC",
> diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> index bb3fa8a33496..7f0eaa702048 100644
> --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
> +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> @@ -46,3 +46,4 @@
>  0x00000000500f0000,v1,ampere/emag,core
>  0x00000000c00fac30,v1,ampere/ampereone,core
>  0x00000000c00fac40,v1,ampere/ampereonex,core
> +0x000000004e0f0100,v1,nvidia/t410,core
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> new file mode 100644
> index 000000000000..ef4effc00ec3
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> @@ -0,0 +1,45 @@
> +[
> +    {
> +        "ArchStdEvent": "BR_MIS_PRED",
> +        "PublicDescription": "This event counts branches which are speculatively executed and mispredicted."
> +    },
> +    {
> +        "ArchStdEvent": "BR_PRED",
> +        "PublicDescription": "This event counts all speculatively executed branches."
> +    },
> +    {
> +        "EventCode": "0x017e",
> +        "EventName": "BR_PRED_BTB_CTX_UPDATE",
> +        "PublicDescription": "Branch context table update."
> +    },
> +    {
> +        "EventCode": "0x0188",
> +        "EventName": "BR_MIS_PRED_DIR_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions due to direction misprediction."
> +    },
> +    {
> +        "EventCode": "0x0189",
> +        "EventName": "BR_MIS_PRED_DIR_UNCOND_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions due to direction misprediction for unconditional branches."
> +    },
> +    {
> +        "EventCode": "0x018a",
> +        "EventName": "BR_MIS_PRED_DIR_UNCOND_DIRECT_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions due to direction misprediction for unconditional direct branches."
> +    },
> +    {
> +        "EventCode": "0x018b",
> +        "EventName": "BR_PRED_MULTI_RESOLVED",
> +        "PublicDescription": "Number of resolved branches which were predicted by the polymorphic indirect predictor."
> +    },
> +    {
> +        "EventCode": "0x018c",
> +        "EventName": "BR_MIS_PRED_MULTI_RESOLVED",
> +        "PublicDescription": "Number of branch mispredictions for branches which were predicted by the polymorphic indirect predictor."
> +    },
> +    {
> +        "EventCode": "0x01e4",
> +        "EventName": "BR_RGN_RECLAIM",
> +        "PublicDescription": "This event counts the indirect predictor entries flushed by region reclamation."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
> new file mode 100644
> index 000000000000..9c315b2d7046
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
> @@ -0,0 +1,6 @@
> +[
> +    {
> +        "ArchStdEvent": "BRB_FILTRATE",
> +        "PublicDescription": "This event counts each valid branch record captured in the branch record buffer. Branch records that are not captured because they are removed by filtering are not counted."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
> new file mode 100644
> index 000000000000..5bb8de617c68
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
> @@ -0,0 +1,48 @@
> +[
> +    {
> +        "ArchStdEvent": "BUS_ACCESS",
> +        "PublicDescription": "This event counts the number of data-beat accesses between the CPU and the external bus. This count includes accesses due to read, write, and snoop. Each beat of data is counted individually."
> +    },
> +    {
> +        "ArchStdEvent": "BUS_CYCLES",
> +        "PublicDescription": "This event counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event increments at the rate of the CPU clock. Regardless of the WFE/WFI state of the PE, this event increments on each processor clock."
> +    },
> +    {
> +        "ArchStdEvent": "BUS_ACCESS_RD",
> +        "PublicDescription": "This event counts memory Read transactions seen on the external bus. Each beat of data is counted individually."
> +    },
> +    {
> +        "ArchStdEvent": "BUS_ACCESS_WR",
> +        "PublicDescription": "This event counts memory Write transactions seen on the external bus. Each beat of data is counted individually."
> +    },
> +    {
> +        "EventCode": "0x0154",
> +        "EventName": "BUS_REQUEST_REQ",
> +        "PublicDescription": "Bus request, request."
> +    },
> +    {
> +        "EventCode": "0x0155",
> +        "EventName": "BUS_REQUEST_RETRY",
> +        "PublicDescription": "Bus request, retry."
> +    },
> +    {
> +        "EventCode": "0x0198",
> +        "EventName": "L2_CHI_CBUSY0",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 0."
> +    },
> +    {
> +        "EventCode": "0x0199",
> +        "EventName": "L2_CHI_CBUSY1",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 1."
> +    },
> +    {
> +        "EventCode": "0x019a",
> +        "EventName": "L2_CHI_CBUSY2",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 2."
> +    },
> +    {
> +        "EventCode": "0x019b",
> +        "EventName": "L2_CHI_CBUSY3",
> +        "PublicDescription": "Number of RXDAT or RXRSP responses received with CBusy of 3."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
> new file mode 100644
> index 000000000000..ecd996c3610b
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
> @@ -0,0 +1,62 @@
> +[
> +    {
> +        "ArchStdEvent": "EXC_TAKEN",
> +        "PublicDescription": "This event counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_RETURN",
> +        "PublicDescription": "This event counts any architecturally executed exception return instructions. For example: AArch64: ERET."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_UNDEF",
> +        "PublicDescription": "This event counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED.\nAttempting to execute instruction bit patterns that have not been allocated.\nAttempting to execute instructions when they are disabled.\nAttempting to execute instructions at an inappropriate Exception level.\nAttempting to execute an instruction when the value of PSTATE.IL is 1."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_SVC",
> +        "PublicDescription": "This event counts SVC exceptions taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_PABORT",
> +        "PublicDescription": "This event counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_DABORT",
> +        "PublicDescription": "This event counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_IRQ",
> +        "PublicDescription": "This event counts IRQ exceptions including the virtual IRQs that are taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_FIQ",
> +        "PublicDescription": "This event counts FIQ exceptions including the virtual FIQs that are taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_SMC",
> +        "PublicDescription": "This event counts SMC exceptions taken to EL3."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_HVC",
> +        "PublicDescription": "This event counts HVC exceptions taken to EL2."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_PABORT",
> +        "PublicDescription": "This event counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_DABORT",
> +        "PublicDescription": "This event counts exceptions which are traps not taken locally and are caused by Data Aborts or SError Interrupts. Conditions that could cause those exceptions are:\n* Attempting to read or write memory where the MMU generates a fault,\n* Attempting to read or write memory with a misaligned address,\n* Interrupts from the SEI input,\n* Internally generated SErrors."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_OTHER",
> +        "PublicDescription": "This event counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, Data Aborts, Instruction Aborts, or Interrupts."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_IRQ",
> +        "PublicDescription": "This event counts IRQ exceptions including the virtual IRQs that are not taken locally."
> +    },
> +    {
> +        "ArchStdEvent": "EXC_TRAP_FIQ",
> +        "PublicDescription": "This event counts FIQs which are not taken locally but taken from EL0, EL1, or EL2 to EL3 (which would be the normal behavior for FIQs when not executing in EL3)."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
> new file mode 100644
> index 000000000000..3588e130781d
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
> @@ -0,0 +1,78 @@
> +[
> +    {
> +        "ArchStdEvent": "FP_HP_SPEC",
> +        "PublicDescription": "This event counts speculatively executed half precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_SPEC",
> +        "PublicDescription": "This event counts speculatively executed single precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_DP_SPEC",
> +        "PublicDescription": "This event counts speculatively executed double precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event counts speculatively executed scalable single precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event counts speculatively executed non-scalable single precision floating point operations."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the counter to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_DP_SCALE_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the largest type was double-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_DP_FIXED_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the largest type was double-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_BF16_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was BFloat16 floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_FP8_FIXED_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an instruction where the smallest type was 8-bit floating-point, where v is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_SP_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_HP_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_BF16_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was BFloat16 floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    },
> +    {
> +        "ArchStdEvent": "FP_FP8_SCALE_MIN_OPS_SPEC",
> +        "PublicDescription": "This event increments by v for each speculatively executed scalable element arithmetic operation, due to an instruction where the smallest type was 8-bit floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or instruction which causes the event to increment.\nThis event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
> new file mode 100644
> index 000000000000..bd9c248387aa
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
> @@ -0,0 +1,15 @@
> +[
> +    {
> +        "ArchStdEvent": "CPU_CYCLES",
> +        "PublicDescription": "This event counts CPU clock cycles when the PE is not in WFE/WFI. The clock measured by this event is defined as the physical clock driving the CPU logic."
> +    },
> +    {
> +        "ArchStdEvent": "CNT_CYCLES",
> +        "PublicDescription": "This event increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0.\nThis event does not increment when the PE is in WFE/WFI."
> +    },
> +    {
> +        "EventCode": "0x01e1",
> +        "EventName": "CPU_SLOT",
> +        "PublicDescription": "Entitled CPU slots.\nThis event counts the number of slots. When in ST mode, this event shall increment by PMMIR_EL1.SLOTS quantities, and when in SMT partitioned resource mode (regardless of whether in WFI state or otherwise), this event is incremented by PMMIR_EL1.SLOTS/2 quantities."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
> new file mode 100644
> index 000000000000..ed6f764eff24
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
> @@ -0,0 +1,122 @@
> +[
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL",
> +        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed load or store operations, preload instructions, or hardware cache prefetching that missed in the L1 D-cache. This event only counts one event per cache line.\nSince the caches are Write-back only for this processor, there are no Write-through cache accesses."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE",
> +        "PublicDescription": "This event counts L1 D-cache accesses from any load/store operations, software preload, or hardware prefetch operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and a read access. Each access to a cache line is counted, including the multiple accesses caused by single instructions such as LDM or STM. Each access to other L1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted.\nThis event counts the sum of the following events:\nL1D_CACHE_RD,\nL1D_CACHE_WR,\nL1D_CACHE_PRFM, and\nL1D_CACHE_HWPRF."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WB",
> +        "PublicDescription": "This event counts write-backs of dirty data from the L1 D-cache to the L2 cache. This occurs when either a dirty cache line is evicted from the L1 D-cache and allocated in the L2 cache, or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n* Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n* Full line writes which write to L2 without writing L1, such as write streaming mode.\nThis event is the sum of the following events:\nL1D_CACHE_WB_CLEAN and\nL1D_CACHE_WB_VICTIM."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
> +        "PublicDescription": "This event counts cache line refills into the L1 D-cache from any memory Read operations that incurred additional latency.\nCounts the same as L1D_CACHE_REFILL_RD on this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_RD",
> +        "PublicDescription": "This event counts L1 D-cache accesses from any Load operation. Atomic Load operations that resolve in the CPU's caches count as both a write access and a read access."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WR",
> +        "PublicDescription": "This event counts L1 D-cache accesses generated by Store operations. This event also counts accesses caused by a DC ZVA (D-cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as a write access and a read access.\nThis event is a subset of the L1D_CACHE event, except this event only counts memory Write operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_RD",
> +        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed Load instructions where the memory Read operation misses in the L1 D-cache. This event only counts one event per cache line.\nThis event is a subset of the L1D_CACHE_REFILL event, but only counts memory Read operations. This event does not count reads caused by cache maintenance operations or preload instructions."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_WR",
> +        "PublicDescription": "This event counts L1 D-cache refills caused by speculatively executed Store instructions where the memory Write operation misses in the L1 D-cache. This event only counts one event per cache line.\nThis event is a subset of the L1D_CACHE_REFILL event, but only counts memory Write operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
> +        "PublicDescription": "This event counts L1 D-cache refills (L1D_CACHE_REFILL) where the cache line data came from caches inside the immediate Cluster of the Core (L2 cache)."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
> +        "PublicDescription": "This event counts L1 D-cache refills (L1D_CACHE_REFILL) for which the cache line data came from outside the immediate Cluster of the Core, such as an SLC in the system interconnect, DRAM, or a remote socket."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
> +        "PublicDescription": "This event counts dirty cache line evictions from the L1 D-cache caused by a new cache line allocation. This event does not count evictions caused by cache maintenance operations.\nThis event is a subset of the L1D_CACHE_WB event, but only counts write-backs that are a result of the line being allocated for an access made by the CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
> +        "PublicDescription": "This event counts write-backs from the L1 D-cache that are a result of a coherency operation made by another CPU. Event counts include cache maintenance operations.\nThis event is a subset of the L1D_CACHE_WB event."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_INVAL",
> +        "PublicDescription": "This event counts each explicit invalidation of a cache line in the L1 D-cache caused by:\n* Cache Maintenance Operations (CMO) that operate by a virtual address.\n* Broadcast cache coherency operations from another CPU in the system.\nThis event does not count in the following cases:\n* A cache refill invalidates a cache line.\n* A CMO which is executed on that CPU and invalidates a cache line specified by Set/Way.\nNote that CMOs that operate by Set/Way cannot be broadcast from one CPU to another."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_RW",
> +        "PublicDescription": "This event counts L1 data demand cache accesses from any Load or Store operation. Near atomic operations that resolve in the CPU's caches count as both a write access and a read access.\nThis event is implemented as L1D_CACHE_RD + L1D_CACHE_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_PRFM",
> +        "PublicDescription": "This event counts L1 D-cache accesses from software preload or prefetch instructions."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_MISS",
> +        "PublicDescription": "This event counts each demand access counted by L1D_CACHE_RW that misses in the L1 Data or unified cache, causing an access to outside of the L1 caches of this PE."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_PRFM",
> +        "PublicDescription": "This event counts L1 D-cache refills where the cache line access was generated by software preload or prefetch instructions."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HWPRF",
> +        "PublicDescription": "This event counts L1 D-cache accesses from any Load/Store operations generated by the hardware prefetcher."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access counted by L1D_CACHE_HWPRF that causes a refill of the L1 D-cache from outside of the L1 D-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRFM",
> +        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched in response to a prefetch instruction. That is, the L1D_CACHE_REFILL_PRFM event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HIT_RW_FHWPRF",
> +        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched by a hardware prefetcher. That is, the L1D_CACHE_REFILL_HWPRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRF",
> +        "PublicDescription": "This event counts each demand access first hit counted by L1D_CACHE_HIT_RW where the cache line was fetched in response to a prefetch instruction or by a hardware prefetcher. That is, the L1D_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_LFB_HIT_RW_FPRFM",
> +        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW_FPRF where the cache line was fetched in response to a prefetch instruction. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_PRFM event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_LFB_HIT_RW_FHWPRF",
> +        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW_FPRF, where the cache line was fetched by a hardware prefetcher. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_HWPRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_LFB_HIT_RW_FPRF",
> +        "PublicDescription": "This event counts each demand access line-fill buffer first hit counted by L1D_LFB_HIT_RW where the cache line was fetched in response to a prefetch instruction or by a hardware prefetcher. That is, the access hits a cache line that is in the process of being loaded into the L1 D-cache, and so does not generate a new refill, but has to wait for the previous refill to complete, and the L1D_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x01f5",
> +        "EventName": "L1D_CACHE_REFILL_RW",
> +        "PublicDescription": "L1 D-cache refill, demand Read and Write. This event counts demand Read and Write accesses that cause a refill of the L1 D-cache of this PE, from outside of this cache."
> +    },
> +    {
> +        "EventCode": "0x0204",
> +        "EventName": "L1D_CACHE_REFILL_OUTER_LLC",
> +        "PublicDescription": "This event counts L1D_CACHE_REFILL from the L3 D-cache."
> +    },
> +    {
> +        "EventCode": "0x0205",
> +        "EventName": "L1D_CACHE_REFILL_OUTER_DRAM",
> +        "PublicDescription": "This event counts L1D_CACHE_REFILL from local memory."
> +    },
> +    {
> +        "EventCode": "0x0206",
> +        "EventName": "L1D_CACHE_REFILL_OUTER_REMOTE",
> +        "PublicDescription": "This event counts L1D_CACHE_REFILL from remote memory."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
> new file mode 100644
> index 000000000000..952454004d98
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
> @@ -0,0 +1,114 @@
> +[
> +    {
> +        "ArchStdEvent": "L1I_CACHE_REFILL",
> +        "PublicDescription": "This event counts cache line refills in the L1 I-cache caused by a missed instruction fetch (demand, hardware prefetch, and software preload accesses). Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE",
> +        "PublicDescription": "This event counts instruction fetches (demand, hardware prefetch, and software preload accesses) which access the L1 Instruction Cache. Instruction Cache accesses caused by cache maintenance operations are not counted."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_LMISS",
> +        "PublicDescription": "This event counts cache line refills into the L1 I-cache that incurred additional latency.\nCounts the same as L1I_CACHE_REFILL in this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_RD",
> +        "PublicDescription": "This event counts demand instruction fetches which access the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_PRFM",
> +        "PublicDescription": "This event counts instruction fetches generated by software preload or prefetch instructions which access the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HWPRF",
> +        "PublicDescription": "This event counts instruction fetches which access the L1 I-cache generated by the hardware prefetcher."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_REFILL_PRFM",
> +        "PublicDescription": "This event counts cache line refills in the L1 I-cache caused by a missed instruction fetch generated by software preload or prefetch instructions. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_REFILL_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access counted by L1I_CACHE_HWPRF that causes a refill of the L1 I-cache from outside of the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT_RD",
> +        "PublicDescription": "This event counts demand instruction fetches that access the L1 I-cache and hit in the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT_RD_FPRF",
> +        "PublicDescription": "This event counts each demand fetch first hit counted by L1I_CACHE_HIT_RD where the cache line was fetched in response to a software preload or by a hardware prefetcher. That is, the L1I_CACHE_REFILL_PRF event was generated when the cache line was fetched into the cache.\nOnly the first hit by a demand access is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT",
> +        "PublicDescription": "This event counts instruction fetches that access the L1 I-cache (demand, hardware prefetch, and software preload accesses) and hit in the L1 I-cache. I-cache accesses caused by cache maintenance operations are not counted."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_CACHE_HIT_PRFM",
> +        "PublicDescription": "This event counts instruction fetches generated by software preload or prefetch instructions that access the L1 I-cache and hit in the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_LFB_HIT_RD",
> +        "PublicDescription": "This event counts demand instruction fetches that access the L1 I-cache and hit in a line that is in the process of being loaded into the L1 I-cache."
> +    },
> +    {
> +        "EventCode": "0x0174",
> +        "EventName": "L1I_HWPRF_REQ_DROP",
> +        "PublicDescription": "L1 I-cache hardware prefetch dropped."
> +    },
> +    {
> +        "EventCode": "0x01e3",
> +        "EventName": "L1I_CACHE_REFILL_RD",
> +        "PublicDescription": "L1 I-cache refill, Read.\nThis event counts each demand instruction fetch that causes a refill of the L1 I-cache of this PE, from outside of this cache."
> +    },
> +    {
> +        "EventCode": "0x01ea",
> +        "EventName": "L1I_CFC_ENTRIES",
> +        "PublicDescription": "This event counts the CFC (Cache Fill Control) entries.\nThe CFC is the fill buffer for the I-cache."
> +    },
> +    {
> +        "EventCode": "0x01ef",
> +        "EventName": "L1I_CACHE_INVAL",
> +        "PublicDescription": "L1 I-cache invalidate.\nThis event counts each explicit invalidation of a cache line in the L1 I-cache caused by:\n* Broadcast cache coherency operations from another CPU in the system.\n* Invalidations due to capacity eviction in the L2 D-cache.\nThis event does not count in the following cases:\n* A cache refill invalidates a cache line.\n* A CMO which is executed on that CPU Core and invalidates a cache line specified by Set/Way.\n* Cache Maintenance Operations (CMO) that operate by a virtual address.\nNote that\n* CMOs that operate by Set/Way cannot be broadcast from one CPU Core to another.\n* The CMO is treated as a no-op for the purposes of L1 I-cache line invalidation, as this Core implements a fully coherent I-cache."
> +    },
> +    {
> +        "EventCode": "0x0212",
> +        "EventName": "L1I_CACHE_HIT_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access that hits in the L1 I-cache."
> +    },
> +    {
> +        "EventCode": "0x0215",
> +        "EventName": "L1I_LFB_HIT",
> +        "PublicDescription": "L1 line fill buffer hit.\nThis event counts each demand, software preload, or hardware prefetch induced instruction fetch that hits an L1 I-cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
> +    },
> +    {
> +        "EventCode": "0x0216",
> +        "EventName": "L1I_LFB_HIT_PRFM",
> +        "PublicDescription": "This event counts each software prefetch access that hits a cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
> +    },
> +    {
> +        "EventCode": "0x0219",
> +        "EventName": "L1I_LFB_HIT_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access that hits a cache line that is in the process of being loaded into the L1 instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
> +    },
> +    {
> +        "EventCode": "0x0221",
> +        "EventName": "L1I_PRFM_REQ",
> +        "PublicDescription": "L1 I-cache software prefetch requests."
> +    },
> +    {
> +        "EventCode": "0x0222",
> +        "EventName": "L1I_HWPRF_REQ",
> +        "PublicDescription": "L1 I-cache hardware prefetch requests."
> +    },
> +    {
> +        "EventCode": "0x0228",
> +        "EventName": "L1I_CACHE_HIT_PRFM_FPRF",
> +        "PublicDescription": "L1 I-cache software prefetch access first hit, fetched by hardware or software prefetch.\nThis event counts each software preload access first hit where the cache line was fetched in response to a hardware prefetcher or software preload instruction.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x022a",
> +        "EventName": "L1I_CACHE_HIT_HWPRF_FPRF",
> +        "PublicDescription": "L1 I-cache hardware prefetch access first hit, fetched by hardware or software prefetch.\nThis event counts each hardware prefetch access first hit where the cache line was fetched in response to a hardware prefetch or a prefetch instruction.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
> new file mode 100644
> index 000000000000..66f21a94381e
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
> @@ -0,0 +1,134 @@
> +[
> +    {
> +        "ArchStdEvent": "L2D_CACHE",
> +        "PublicDescription": "This event counts accesses to the L2 cache due to data accesses. The L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This event also counts write-backs of dirty data from the L1 D-cache to the L2 cache.\nI-cache accesses are included in this event. This event is the sum of the following events:\nL2D_CACHE_RD,\nL2D_CACHE_WR,\nL2D_CACHE_PRFM, and\nL2D_CACHE_HWPRF."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL",
> +        "PublicDescription": "This event counts cache line refills into the L2 cache. The L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache refills are included in this event. This event is the sum of the following events:\nL2D_CACHE_REFILL_RD,\nL2D_CACHE_REFILL_WR,\nL2D_CACHE_REFILL_HWPRF, and\nL2D_CACHE_REFILL_PRFM."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WB",
> +        "PublicDescription": "This event counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line.\nThis event is the sum of the following events:\nL2D_CACHE_WB_VICTIM and\nL2D_CACHE_WB_CLEAN."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_RD",
> +        "PublicDescription": "This event counts L2 D-cache accesses due to memory Read operations. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache accesses are included in this event. This event is a subset of the L2D_CACHE event, but this event only counts memory Read operations."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WR",
> +        "PublicDescription": "This event counts L2 cache accesses due to memory Write operations. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis event is a subset of the L2D_CACHE event, but this event only counts memory Write operations."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
> +        "PublicDescription": "This event counts refills for memory accesses due to memory Read operations counted by L2D_CACHE_RD. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis CPU includes I-cache refills in this counter, as an L2I equivalent event was not implemented. This event is a subset of the L2D_CACHE_REFILL event. This event does not count L2 refills caused by stashes into L2.\nThis count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
> +        "PublicDescription": "This event counts refills for memory accesses due to memory Write operations counted by L2D_CACHE_WR. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
> +        "PublicDescription": "This event counts evictions from the L2 cache because of a line being allocated into the L2 cache.\nThis event is a subset of the L2D_CACHE_WB event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
> +        "PublicDescription": "This event counts write-backs from the L2 cache that are a result of any of the following:\n* Cache maintenance operations,\n* Snoop responses, or\n* Direct cache transfers to another CPU due to a forwarding snoop request.\nThis event is a subset of the L2D_CACHE_WB event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_INVAL",
> +        "PublicDescription": "This event counts each explicit invalidation of a cache line in the L2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n* A cache refill invalidates a cache line, or\n* A Cache Maintenance Operation (CMO), which invalidates a cache line specified by Set/Way,\nis executed on that CPU.\nCMOs that operate by Set/Way cannot be broadcast from one CPU to another."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
> +        "PublicDescription": "This event counts cache line refills into the L2 unified cache from any memory Read operations that incurred additional latency.\nCounts the same as L2D_CACHE_REFILL_RD in this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_RW",
> +        "PublicDescription": "This event counts L2 cache demand accesses from any Load/Store operations. The L2 cache is a unified cache for data and instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nI-cache accesses are included in this event.\nThis event is the sum of the following events:\nL2D_CACHE_RD and\nL2D_CACHE_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_PRFM",
> +        "PublicDescription": "This event counts L2 D-cache accesses generated by software preload or prefetch instructions with target = L1/L2/L3 cache.\nNote that a software preload or prefetch instruction with (target = L1/L2/L3) that hits in L1D will not result in an L2 D-cache access. Therefore, such a software preload or prefetch instruction will not be counted by this event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_MISS",
> +        "PublicDescription": "This event counts cache line misses in the L2 cache. The L2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses.\nThis event counts the same as L2D_CACHE_REFILL_RD in this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_PRFM",
> +        "PublicDescription": "This event counts refills due to accesses generated as a result of software preload or prefetch instructions, as counted by L2D_CACHE_PRFM. I-cache refills are included in this event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_HWPRF",
> +        "PublicDescription": "This event counts the L2 D-cache accesses caused by the L1 or L2 hardware prefetcher."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF",
> +        "PublicDescription": "This event counts each hardware prefetch access counted by L2D_CACHE_HWPRF that causes a refill of the L2 cache, or any L1 Data or Instruction cache of this PE, from outside of those caches.\nThis does not include prefetch requests that are pending waiting for a refill in the LFB when a new demand request to the same cache line hits the LFB entry. All such refills are counted as L2D_LFB_HIT_RWL1PRF_FHWPRF."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
> +        "PublicDescription": "This event counts each access to the L2 cache due to a prefetch instruction or hardware prefetch that causes a refill of the L2 or any Level 1 cache, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x0108",
> +        "EventName": "L2D_CACHE_IF_REFILL",
> +        "PublicDescription": "L2 D-cache refill, instruction fetch.\nThis event counts each demand instruction fetch that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x0109",
> +        "EventName": "L2D_CACHE_TBW_REFILL",
> +        "PublicDescription": "L2 D-cache refill, page table walk.\nThis event counts each demand translation table walk that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x010a",
> +        "EventName": "L2D_CACHE_PF_REFILL",
> +        "PublicDescription": "L2 D-cache refill, prefetch.\nThis event counts L1 or L2 hardware or software prefetch accesses that cause a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x010b",
> +        "EventName": "L2D_LFB_HIT_RWL1PRF_FHWPRF",
> +        "PublicDescription": "L2 line fill buffer demand Read, demand Write or L1 prefetch first hit, fetched by hardware prefetch.\nThis event counts each of the following accesses that hit the line-fill buffer when the same cache line is already being fetched due to an L2 hardware prefetcher:\n* Demand Read or Write\n* L1I-HWPRF\n* L1D-HWPRF\n* L1I PRFM\n* L1D PRFM\nThese accesses hit a cache line that is currently being loaded into the L2 cache as a result of a hardware prefetch to the same line. Consequently, this access does not initiate a new refill but waits for the completion of the previous refill.\nOnly the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x0179",
> +        "EventName": "L2D_CACHE_HIT_RWL1PRF_FHWPRF",
> +        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by hardware prefetch. This event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 hardware prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> +    },
> +    {
> +        "EventCode": "0x01b8",
> +        "EventName": "L2D_CACHE_L1PRF",
> +        "PublicDescription": "L2 D-cache access, L1 hardware or software prefetch. This event counts L1 hardware or software prefetch accesses to the L2 D-cache."
> +    },
> +    {
> +        "EventCode": "0x01b9",
> +        "EventName": "L2D_CACHE_REFILL_L1PRF",
> +        "PublicDescription": "L2 D-cache refill, L1 hardware or software prefetch.\nThis event counts each access counted by L2D_CACHE_L1PRF that causes a refill of the L2 cache or any L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x0201",
> +        "EventName": "L2D_CACHE_BACKSNOOP_L1D_VIRT_ALIASING",
> +        "PublicDescription": "This event counts when the L2 D-cache sends an invalidating back-snoop to the L1 D-cache for an access initiated by the L1 D-cache, where the corresponding line is already present in the L1 D-cache.\nThe L2 D-cache line tags the PE that refilled the line. It also retains specific bits of the VA to identify virtually aliased addresses.\nThe L1 D request requiring a back-snoop can originate either from the same PE that refilled the L2 D line or from a different PE. In either case, this event only counts those back-snoops where the requested VA mismatches the VA stored in the L2 D tag.\nThis event is counted only by the PE that initiated the original request necessitating a back-snoop.\nNote: the L1 D-cache is VIPT, so it identifies this access as a miss. Conversely, as the L2 is PIPT, it identifies this as a hit. The L2 D-cache utilizes the back-snoop mechanism to refill the L1 D-cache with the snooped data."
> +    },
> +    {
> +        "EventCode": "0x0208",
> +        "EventName": "L2D_CACHE_RWL1PRF",
> +        "PublicDescription": "L2 D-cache access, demand Read, demand Write or L1 hardware or software prefetch.\nThis event counts each access to the L2 D-cache due to the following:\n* Demand Read or Write.\n* L1 hardware or software prefetch."
> +    },
> +    {
> +        "EventCode": "0x020a",
> +        "EventName": "L2D_CACHE_REFILL_RWL1PRF",
> +        "PublicDescription": "L2 D-cache refill, demand Read, demand Write or L1 hardware or software prefetch.\nThis event counts each access counted by L2D_CACHE_RWL1PRF that causes a refill of the L2 cache, or any L1 cache of this PE, from outside of those caches."
> +    },
> +    {
> +        "EventCode": "0x020c",
> +        "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRFM",
> +        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by software prefetch.\nThis event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> + }, > + { > + "EventCode": "0x020e", > + "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRF", > + "PublicDescription": "L2 D-cache demand Read, demand Write and L= 1 prefetch hit, fetched by software or hardware prefetch.\nThis event count= s each demand Read, demand Write and L1 hardware or software prefetch reque= st that hit an L2 D-cache line that was refilled into L2 D-cache in respons= e to an L2 hardware prefetch or software prefetch. Only the first hit is co= unted. After this event is generated for a cache line, the event is not gen= erated again for the same cache line while it remains in the cache." > + } > +] > diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json b= /tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json > new file mode 100644 > index 000000000000..851d0a70de9c > --- /dev/null > +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json > @@ -0,0 +1,107 @@ > +[ > + { > + "ArchStdEvent": "L3D_CACHE_ALLOCATE", > + "PublicDescription": "This event counts each memory Write operat= ion that writes an entire line into the L3 data without fetching data from = outside the L3 Data. These are allocations of cache lines in the L3 Data th= at are not refills counted by\nL3D_CACHE_REFILL. For example:\nA Write-back= of an entire cache line from an L2 cache to the L3 D-cache.\n* A Write of = an entire cache line from a coalescing Write buffer.\n* An operation such a= s DC ZVA.\nThis counter does not count writes that write an entire line to = beyond level 3. Thus this counter does not count the streaming writes to be= yond L3 cache." > + }, > + { > + "ArchStdEvent": "L3D_CACHE_REFILL", > + "PublicDescription": "This event counts each access counted by L= 3D_CACHE that causes a refill of the L3 Data, or any L1 Data, instruction o= r L2 cache of this PE, from outside of those caches. 
This includes refills due to hardware prefetch and software prefetch accesses.\nThis event is the sum of the L3D_CACHE_MISS, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL_HWPRF events.\nA refill includes any access that causes data to be fetched from outside of the L1 to L3 caches, even if the data is ultimately not allocated into the L3 D-cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE",
> + "PublicDescription": "This event counts each memory Read operation or memory Write operation that causes a cache access to Level 3.\nThis event is the sum of the following events:\n* L3D_CACHE_RD (0x00a0)\n* L3D_CACHE_ALLOCATE (0x0029)\n* L3D_CACHE_PRFM (0x8151)\n* L3D_CACHE_HWPRF (0x8156)\n* L2D_CACHE_WB (0x0018)"
> + },
> + {
> + "ArchStdEvent": "LL_CACHE_RD",
> + "PublicDescription": "This is an alias of the event L3D_CACHE_RD (0x00a0)."
> + },
> + {
> + "ArchStdEvent": "LL_CACHE_MISS_RD",
> + "PublicDescription": "This is an alias of the event L3D_CACHE_REFILL_RD (0x00a2)."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_RD",
> + "PublicDescription": "This event counts each memory Read operation to the L3 D-cache from instruction fetch, Load/Store, and MMU translation table accesses. This does not include hardware prefetcher or PRFM instruction accesses. This includes L1 and L2 prefetcher accesses to the L3 D-cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_RD",
> + "PublicDescription": "This event counts each access counted by both L3D_CACHE_RD and L3D_CACHE_REFILL. That is, every refill of the L3 cache counted by L3D_CACHE_REFILL that is caused by a memory Read operation.\nL3D_CACHE_MISS (0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD (0x400b) count the same event in the hardware."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_LMISS_RD",
> + "PublicDescription": "This event counts each memory Read operation to the L3 cache counted by L3D_CACHE that incurs additional latency because it returns data from outside of the L1 to L3 caches.\nL3D_CACHE_MISS (0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD (0x400b) count the same event in the hardware."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_RW",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE that is due to a demand memory Read operation or demand memory Write operation.\nThis event is the sum of L3D_CACHE_RD (0x00a0), L3D_CACHE_ALLOCATE (0x0029) and L2D_CACHE_WB (0x0018).\nNote that this counter does not count Writes that write an entire line to beyond level 3. Thus this counter does not count streaming Writes to beyond the L3 cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_PRFM",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE that is due to a prefetch instruction. This includes L3 Data accesses due to an L1, L2, or L3 prefetch instruction."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_MISS",
> + "PublicDescription": "This event counts each demand Read access counted by L3D_CACHE_RD that misses in the L1 to L3 Data, causing an access to outside of the L3 cache.\nL3D_CACHE_MISS (0x8152), L3D_CACHE_REFILL_RD (0x00a2) and L3D_CACHE_LMISS_RD (0x400b) count the same event in the hardware."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_PRFM",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE_PRFM that causes a refill of the L3 cache, or any L1 or L2 Data cache, from outside of those caches."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_HWPRF",
> + "PublicDescription": "This event counts each access to the L3 cache that is due to a hardware prefetcher. This includes L3D accesses due to the Level-1, Level-2 or Level-3 hardware prefetcher."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_HWPRF",
> + "PublicDescription": "This event counts each hardware prefetch counted by L3D_CACHE_HWPRF that causes a refill of the L3 Data or unified cache, or any L1 or L2 Data, Instruction, or unified cache of this PE, from outside of those caches."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_PRF",
> + "PublicDescription": "This event counts each access to the L3 cache due to a prefetch instruction or hardware prefetch that causes a refill of the L3 Data, or any L1 or L2 Data cache, from outside of those caches."
> + },
> + {
> + "EventCode": "0x01e8",
> + "EventName": "L3D_CACHE_RWL1PRFL2PRF",
> + "PublicDescription": "L3 cache access, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch.\nThis event counts each access to the L3 D-cache due to the following:\n* Demand Read or Write.\n* L1 hardware or software prefetch.\n* L2 hardware or software prefetch."
> + },
> + {
> + "EventCode": "0x01e9",
> + "EventName": "L3D_CACHE_REFILL_RWL1PRFL2PRF",
> + "PublicDescription": "L3 cache refill, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch.\nThis event counts each access counted by L3D_CACHE_RWL1PRFL2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x01f6",
> + "EventName": "L3D_CACHE_REFILL_L2PRF",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE_L2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x01f7",
> + "EventName": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF",
> + "PublicDescription": "L3 cache demand Read, demand Write, L1 prefetch or L2 prefetch first hit, fetched by software or hardware prefetch.\nThis event counts each demand Read, demand Write, L1 hardware or software prefetch request and L2 hardware or software prefetch that hits an L3 D-cache line that was refilled into the L3 D-cache in response to an L3 hardware prefetch or software prefetch. Only the first hit is counted. After this event is generated for a cache line, the event is not generated again for the same cache line while it remains in the cache."
> + },
> + {
> + "EventCode": "0x0225",
> + "EventName": "L3D_CACHE_REFILL_IF",
> + "PublicDescription": "L3 cache refill, instruction fetch.\nThis event counts each demand instruction fetch that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x0226",
> + "EventName": "L3D_CACHE_REFILL_MM",
> + "PublicDescription": "L3 cache refill, translation table walk access.\nThis event counts each demand translation table access that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x0227",
> + "EventName": "L3D_CACHE_REFILL_L1PRF",
> + "PublicDescription": "This event counts each access counted by L3D_CACHE_L1PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
> + },
> + {
> + "EventCode": "0x022c",
> + "EventName": "L3D_CACHE_L1PRF",
> + "PublicDescription": "This event counts each L3 D-cache access due to an L1 hardware prefetch or software prefetch request.\nThe L1 hardware prefetch or software prefetch requests that miss the L1I, L1D and L2 D-cache are counted by this counter."
> + },
> + {
> + "EventCode": "0x022d",
> + "EventName": "L3D_CACHE_L2PRF",
> + "PublicDescription": "This event counts each L3 D-cache access due to an L2 hardware prefetch or software prefetch request.\nThe L2 hardware prefetch or software prefetch requests that miss the L2 D-cache are counted by this counter."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
> new file mode 100644
> index 000000000000..becd2d90bf39
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
> @@ -0,0 +1,46 @@
> +[
> + {
> + "ArchStdEvent": "MEM_ACCESS",
> + "PublicDescription": "This event counts memory accesses issued by the CPU load/store unit, where those accesses are issued due to load or store operations. This event counts memory accesses regardless of whether the data is received from any level of the cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than were specified in the load or store instructions, then the event counts those smaller memory transactions.\nMemory accesses generated by the following instructions or activity are not counted: instruction fetches, cache maintenance instructions, translation table walks or prefetches, memory prefetch operations. This event counts the sum of the following events:\nMEM_ACCESS_RD and\nMEM_ACCESS_WR."
> + },
> + {
> + "ArchStdEvent": "MEMORY_ERROR",
> + "PublicDescription": "This event counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs.
On the Core, this event counts errors in the caches (including data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted.\nNote that errors are only detected when the actual protected memory is accessed by an operation."
> + },
> + {
> + "ArchStdEvent": "REMOTE_ACCESS",
> + "PublicDescription": "This event counts each external bus Read access that causes an access to a remote device, that is, a socket that does not contain the PE."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD",
> + "PublicDescription": "This event counts memory accesses issued by the CPU due to Load operations. This event counts any memory Load access, no matter whether the data is received from any level of the cache hierarchy or external memory. This event also counts atomic Load operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions.\nThe following instructions are not counted:\n1) Instruction fetches,\n2) Cache maintenance instructions,\n3) Translation table walks or prefetches,\n4) Memory prefetch operations.\nThis event is a subset of the MEM_ACCESS event but only counts memory Read operations."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_WR",
> + "PublicDescription": "This event counts memory accesses issued by the CPU due to Store operations. This event counts any memory Store access, no matter whether the data is located in any level of cache or external memory. This event also counts atomic Load and Store operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
> + },
> + {
> + "ArchStdEvent": "LDST_ALIGN_LAT",
> + "PublicDescription": "This event counts the number of memory Read and Write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in an access crossing a single cache line.\nThis event is implemented as the sum of the following events on this CPU:\nLD_ALIGN_LAT and\nST_ALIGN_LAT."
> + },
> + {
> + "ArchStdEvent": "LD_ALIGN_LAT",
> + "PublicDescription": "This event counts the number of memory Read accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed, which results in a load crossing a single cache line."
> + },
> + {
> + "ArchStdEvent": "ST_ALIGN_LAT",
> + "PublicDescription": "This event counts the number of memory Write accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH_PERCYC",
> + "PublicDescription": "This event counts the number of instruction fetches outstanding per cycle, which provides an average latency of instruction fetch."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
> + "PublicDescription": "This event counts the number of outstanding Loads or memory Read accesses per cycle."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH",
> + "PublicDescription": "This event counts instruction memory accesses that the PE makes."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> new file mode 100644
> index 000000000000..b825ede03f54
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> @@ -0,0 +1,722 @@
> +[
> + {
> + "MetricName": "backend_bound",
> + "MetricExpr": "100 * (STALL_SLOT_BACKEND / CPU_SLOT)",
> + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the backend of the processor.",
> + "ScaleUnit": "1percent of slots",
> + "MetricGroup": "TopdownL1"
> + },
> + {
> + "MetricName": "backend_busy_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_BUSY / STALL_BACKEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being full and unable to accept operations for execution.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_cache_l1d_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by L1 D-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_cache_l2d_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by L2 D-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_core_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_CPUBOUND / STALL_BACKEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to
backend Core resource constraints not related to memory access latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_core_rename_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_RENAME / STALL_BACKEND_CPUBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend because rename unit registers are unavailable.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_MEMBOUND / STALL_BACKEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend Core resource constraints related to memory access latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_cache_bound",
> + "MetricExpr": "100 * ((STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by D-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_store_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to pending memory Writes caused by Stores stalled in the pre-commit stage.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_mem_tlb_bound",
> + "MetricExpr": "100 * (STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency
issues caused by Data TLB misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Backend"
> + },
> + {
> + "MetricName": "backend_stalled_cycles",
> + "MetricExpr": "100 * (STALL_BACKEND / CPU_CYCLES)",
> + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Cycle_Accounting"
> + },
> + {
> + "MetricName": "bad_speculation",
> + "MetricExpr": "100 - (frontend_bound + retiring + backend_bound)",
> + "BriefDescription": "This metric is the percentage of total slots that executed operations which did not retire due to a pipeline flush. This indicates cycles that were utilized, but inefficiently.",
> + "ScaleUnit": "1percent of slots",
> + "MetricGroup": "TopdownL1"
> + },
> + {
> + "MetricName": "barrier_percentage",
> + "MetricExpr": "100 * ((ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC)",
> + "BriefDescription": "This metric measures instruction and data barrier operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "branch_direct_ratio",
> + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_indirect_ratio",
> + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_misprediction_ratio",
> + "MetricExpr": "BR_MIS_PRED_RETIRED /
BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Miss_Ratio;Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_mpki",
> + "MetricExpr": "1000 * (BR_MIS_PRED_RETIRED / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;Branch_Effectiveness"
> + },
> + {
> + "MetricName": "branch_percentage",
> + "MetricExpr": "100 * ((BR_IMMED_SPEC + BR_INDIRECT_SPEC) / INST_SPEC)",
> + "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "branch_return_ratio",
> + "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of retired branches that are function returns to the total number of branches architecturally executed.",
> + "ScaleUnit": "1per branch",
> + "MetricGroup": "Branch_Effectiveness"
> + },
> + {
> + "MetricName": "bus_bandwidth",
> + "MetricExpr": "BUS_ACCESS * 32 / duration_time",
> + "BriefDescription": "This metric measures the bus bandwidth of the data transferred between this PE's L2 and the unCore in the system.",
> + "ScaleUnit": "1Bytes/sec"
> + },
> + {
> + "MetricName": "cpu_cycles_fraction_in_st_mode",
> + "MetricExpr": "((CPU_SLOT / CPU_CYCLES) - 5) / 5",
> + "BriefDescription": "This metric counts the fraction of CPU cycles spent in ST mode during program execution.",
> + "ScaleUnit": "1fraction of cycles",
> + "MetricGroup": "SMT"
> + },
> + {
> + "MetricName": "cpu_cycles_in_smt_mode",
> + "MetricExpr": "(1 - cpu_cycles_fraction_in_st_mode) * CPU_CYCLES",
> + "BriefDescription": "This metric counts CPU cycles in SMT mode during program execution.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "SMT"
> + },
> + {
> + "MetricName": "cpu_cycles_in_st_mode",
> + "MetricExpr": "cpu_cycles_fraction_in_st_mode * CPU_CYCLES",
> + "BriefDescription": "This metric counts CPU cycles in ST mode during program execution.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "SMT"
> + },
> + {
> + "MetricName": "crypto_percentage",
> + "MetricExpr": "100 * (CRYPTO_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "dtlb_mpki",
> + "MetricExpr": "1000 * (DTLB_WALK / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of Data TLB walks per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "dtlb_walk_average_latency",
> + "MetricExpr": "DTLB_WALK_PERCYC / DTLB_WALK",
> + "BriefDescription": "This metric measures the average latency of Data TLB walks in CPU cycles.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "Average_Latency"
> + },
> + {
> + "MetricName": "dtlb_walk_ratio",
> + "MetricExpr": "DTLB_WALK / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of Data TLB walks to the total number of Data TLB accesses.
This gives an indication of the effectiveness of the Data TLB accesses.",
> + "ScaleUnit": "1per TLB access",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "fp16_percentage",
> + "MetricExpr": "100 * (FP_HP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "FP_Precision_Mix"
> + },
> + {
> + "MetricName": "fp32_percentage",
> + "MetricExpr": "100 * (FP_SP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "FP_Precision_Mix"
> + },
> + {
> + "MetricName": "fp64_percentage",
> + "MetricExpr": "100 * (FP_DP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "FP_Precision_Mix"
> + },
> + {
> + "MetricName": "fp_ops_per_cycle",
> + "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES",
> + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any instruction.
Operations are counted by computation and by vector lanes; fused computations such as multiply-add, for example, count as two operations per vector lane.",
> + "ScaleUnit": "1operations per cycle",
> + "MetricGroup": "FP_Arithmetic_Intensity"
> + },
> + {
> + "MetricName": "frontend_bound",
> + "MetricExpr": "100 * (STALL_SLOT_FRONTEND_WITHOUT_MISPRED / CPU_SLOT)",
> + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
> + "ScaleUnit": "1percent of slots",
> + "MetricGroup": "TopdownL1"
> + },
> + {
> + "MetricName": "frontend_cache_l1i_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L1 I-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_cache_l2i_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L2 I-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_core_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_CPUBOUND / STALL_FRONTEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints not related to instruction fetch latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_core_flow_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_FLOW / STALL_FRONTEND_CPUBOUND)",
> + "BriefDescription": "This metric is the percentage of
total cycles stalled in the frontend as the decode unit is awaiting input from the branch prediction unit.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_core_flush_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_mem_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_MEMBOUND / STALL_FRONTEND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints related to instruction fetch latency issues caused by memory access components.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_mem_cache_bound",
> + "MetricExpr": "100 * ((STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by I-cache misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_mem_tlb_bound",
> + "MetricExpr": "100 * (STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND)",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by Instruction TLB misses.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Topdown_Frontend"
> + },
> + {
> + "MetricName": "frontend_stalled_cycles",
> + "MetricExpr": "100 * (STALL_FRONTEND / CPU_CYCLES)",
> + "BriefDescription": "This metric is the percentage
of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
> + "ScaleUnit": "1percent of cycles",
> + "MetricGroup": "Cycle_Accounting"
> + },
> + {
> + "MetricName": "instruction_fetch_average_latency",
> + "MetricExpr": "INST_FETCH_PERCYC / INST_FETCH",
> + "BriefDescription": "This metric measures the average latency of instruction fetches in CPU cycles.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "Average_Latency"
> + },
> + {
> + "MetricName": "integer_dp_percentage",
> + "MetricExpr": "100 * (DP_SPEC / INST_SPEC)",
> + "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
> + "ScaleUnit": "1percent of operations",
> + "MetricGroup": "Operation_Mix"
> + },
> + {
> + "MetricName": "ipc",
> + "MetricExpr": "INST_RETIRED / CPU_CYCLES",
> + "BriefDescription": "This metric measures the number of instructions retired per cycle.",
> + "ScaleUnit": "1per cycle",
> + "MetricGroup": "General"
> + },
> + {
> + "MetricName": "itlb_mpki",
> + "MetricExpr": "1000 * (ITLB_WALK / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of Instruction TLB walks per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;ITLB_Effectiveness"
> + },
> + {
> + "MetricName": "itlb_walk_average_latency",
> + "MetricExpr": "ITLB_WALK_PERCYC / ITLB_WALK",
> + "BriefDescription": "This metric measures the average latency of Instruction TLB walks in CPU cycles.",
> + "ScaleUnit": "1CPU cycles",
> + "MetricGroup": "Average_Latency"
> + },
> + {
> + "MetricName": "itlb_walk_ratio",
> + "MetricExpr": "ITLB_WALK / L1I_TLB",
> + "BriefDescription": "This metric measures the ratio of Instruction TLB walks to the total number of Instruction TLB accesses.
This gives an indication of the effectiveness of the Instruction TLB accesses.",
> + "ScaleUnit": "1per TLB access",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_cache_miss_ratio",
> + "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
> + "BriefDescription": "This metric measures the ratio of L1 D-cache accesses missed to the total number of L1 D-cache accesses. This gives an indication of the effectiveness of the L1 D-cache.",
> + "ScaleUnit": "1per cache access",
> + "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_cache_mpki",
> + "MetricExpr": "1000 * (L1D_CACHE_REFILL / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of L1 D-cache accesses missed per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;L1D_Cache_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_cache_rw_miss_ratio",
> + "MetricExpr": "l1d_demand_misses / l1d_demand_accesses",
> + "BriefDescription": "This metric measures the ratio of L1 D-cache demand accesses missed to the total number of L1 D-cache demand accesses.
This gives an indication of the effectiveness of the L1 D-cache for demand Load or Store traffic.",
> + "ScaleUnit": "1per cache access",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_demand_accesses",
> + "MetricExpr": "L1D_CACHE_RW",
> + "BriefDescription": "This metric measures the count of L1 D-cache accesses incurred on a Load or Store by the instruction stream of the program.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_demand_misses",
> + "MetricExpr": "L1D_CACHE_REFILL_RW",
> + "BriefDescription": "This metric measures the count of L1 D-cache misses incurred on a Load or Store by the instruction stream of the program.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_prf_accuracy",
> + "MetricExpr": "100 * (l1d_useful_prf / l1d_refilled_prf)",
> + "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the instruction stream.",
> + "ScaleUnit": "1percent of prefetch",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_prf_coverage",
> + "MetricExpr": "100 * (l1d_useful_prf / (l1d_demand_misses + l1d_refilled_prf))",
> + "BriefDescription": "This metric measures the fraction of baseline demand cache misses that the prefetcher brings into the cache.",
> + "ScaleUnit": "1percent of cache access",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_refilled_prf",
> + "MetricExpr": "L1D_CACHE_REFILL_HWPRF + L1D_CACHE_REFILL_PRFM + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
> + "BriefDescription": "This metric measures the count of cache lines refilled by the L1 data prefetcher (hardware prefetches or software preload) into the L1 D-cache.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_tlb_miss_ratio",
> + "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of L1 Data TLB accesses missed to the total number of L1 Data TLB accesses. This gives an indication of the effectiveness of the L1 Data TLB.",
> + "ScaleUnit": "1per TLB access",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_tlb_mpki",
> + "MetricExpr": "1000 * (L1D_TLB_REFILL / INST_RETIRED)",
> + "BriefDescription": "This metric measures the number of L1 Data TLB accesses missed per thousand instructions executed.",
> + "ScaleUnit": "1MPKI",
> + "MetricGroup": "MPKI;DTLB_Effectiveness"
> + },
> + {
> + "MetricName": "l1d_useful_prf",
> + "MetricExpr": "L1D_CACHE_HIT_RW_FPRF + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
> + "BriefDescription": "This metric measures the count of cache lines refilled by the L1 data prefetcher (hardware prefetches or software preload) into the L1 D-cache which are subsequently used by a Load or Store from the instruction stream of the program.",
> + "ScaleUnit": "1count",
> + "MetricGroup": "L1D_Prefetcher_Effectiveness"
> + },
> + {
> + "MetricName": "l1i_cache_miss_ratio",
> + "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
> + "BriefDescription": "This metric measures the ratio of L1 I-cache accesses missed to the total number of L1 I-cache accesses.
This gives an= indication of the effectiveness of the L1 I-cache.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness" > + }, > + { > + "MetricName": "l1i_cache_mpki", > + "MetricExpr": "1000 * (L1I_CACHE_REFILL / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L1 I-cac= he accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;L1I_Cache_Effectiveness" > + }, > + { > + "MetricName": "l1i_cache_rd_miss_ratio", > + "MetricExpr": "l1i_demand_misses / l1i_demand_accesses", > + "BriefDescription": "This metric measures the ratio of L1 I-cach= e Read accesses missed to the total number of L1 I-cache accesses. This giv= es an indication of the effectiveness of the L1 I-cache for demand instruct= ion fetch traffic. Note that cache accesses in this cache are demand instru= ction fetch.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_demand_accesses", > + "MetricExpr": "L1I_CACHE_RD", > + "BriefDescription": "This metric measures the count of L1 I-cach= e accesses caused by an instruction fetch by the instruction stream of the = program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_demand_misses", > + "MetricExpr": "L1I_CACHE_REFILL_RD", > + "BriefDescription": "This metric measures the count of L1 I-cach= e misses caused by an instruction fetch by the instruction stream of the pr= ogram.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_prf_accuracy", > + "MetricExpr": "100 * (l1i_useful_prf / l1i_refilled_prf)", > + "BriefDescription": "This metric measures the fraction of prefet= ched memory addresses that are used by the instruction stream.", > + "ScaleUnit": "1percent of prefetch", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + 
{ > + "MetricName": "l1i_prf_coverage", > + "MetricExpr": "100 * (l1i_useful_prf / (l1i_demand_misses + l1i_= refilled_prf))", > + "BriefDescription": "This metric measures the baseline demand ca= che misses which the prefetcher brings into the cache.", > + "ScaleUnit": "1percent of cache access", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_refilled_prf", > + "MetricExpr": "L1I_CACHE_REFILL_HWPRF + L1I_CACHE_REFILL_PRFM", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L1 instruction prefetcher (hardware prefetches or software p= reload) into L1 I-cache.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l1i_tlb_miss_ratio", > + "MetricExpr": "L1I_TLB_REFILL / L1I_TLB", > + "BriefDescription": "This metric measures the ratio of L1 Instru= ction TLB accesses missed to the total number of L1 Instruction TLB accesse= s. This gives an indication of the effectiveness of the L1 Instruction TLB.= ", > + "ScaleUnit": "1per TLB access", > + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness" > + }, > + { > + "MetricName": "l1i_tlb_mpki", > + "MetricExpr": "1000 * (L1I_TLB_REFILL / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L1 Instr= uction TLB accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;ITLB_Effectiveness" > + }, > + { > + "MetricName": "l1i_useful_prf", > + "MetricExpr": "L1I_CACHE_HIT_RD_FPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L1 instruction prefetcher (hardware prefetches or software p= reload) into L1 I-cache which are further used by instruction stream of the= program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L1D_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2_cache_miss_ratio", > + "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE", > + "BriefDescription": "This metric measures 
the ratio of L2 cache = accesses missed to the total number of L2 cache accesses. This gives an ind= ication of the effectiveness of the L2 cache, which is a unified cache that= stores both data and instruction.\nNote that cache accesses in this cache = are either data memory access or instruction fetch as this is a unified cac= he.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness" > + }, > + { > + "MetricName": "l2_cache_mpki", > + "MetricExpr": "1000 * (l2d_demand_misses / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L2 unifi= ed cache accesses missed per thousand instructions executed.\nNote that cac= he accesses in this cache are either data memory access or instruction fetc= h as this is a unified cache.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;L2_Cache_Effectiveness" > + }, > + { > + "MetricName": "l2_tlb_miss_ratio", > + "MetricExpr": "L2D_TLB_REFILL / L2D_TLB", > + "BriefDescription": "This metric measures the ratio of L2 unifie= d TLB accesses missed to the total number of L2 unified TLB accesses.\nThis= gives an indication of the effectiveness of the L2 TLB.", > + "ScaleUnit": "1per TLB access", > + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness" > + }, > + { > + "MetricName": "l2_tlb_mpki", > + "MetricExpr": "1000 * (L2D_TLB_REFILL / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of L2 unifi= ed TLB accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness" > + }, > + { > + "MetricName": "l2d_cache_rwl1prf_miss_ratio", > + "MetricExpr": "l2d_demand_misses / l2d_demand_accesses", > + "BriefDescription": "This metric measures the ratio of L2 D-cach= e Read accesses missed to the total number of L2 D-cache accesses.\nThis gi= ves an indication of the effectiveness of the L2 D-cache for demand instruc= tion fetch, Load, Store, or L1 prefetcher 
accesses traffic.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_demand_accesses", > + "MetricExpr": "L2D_CACHE_RD + L2D_CACHE_WR + L2D_CACHE_L1PRF", > + "BriefDescription": "This metric measures the count of L2 D-cach= e accesses incurred on an instruction fetch, Load, Store, or L1 prefetcher = accesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_demand_misses", > + "MetricExpr": "L2D_CACHE_REFILL_RD + L2D_CACHE_REFILL_WR + L2D_C= ACHE_REFILL_L1PRF", > + "BriefDescription": "This metric measures the count of L2 D-cach= e misses incurred on an instruction fetch, Load, Store, or L1 prefetcher ac= cesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_prf_accuracy", > + "MetricExpr": "100 * (l2d_useful_prf / l2d_refilled_prf)", > + "BriefDescription": "This metric measures the fraction of prefet= ched memory addresses that are used by the instruction stream.", > + "ScaleUnit": "1percent of prefetch", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_prf_coverage", > + "MetricExpr": "100 * (l2d_useful_prf / (l2d_demand_misses + l2d_= refilled_prf))", > + "BriefDescription": "This metric measures the baseline demand ca= che misses which the prefetcher brings into the cache.", > + "ScaleUnit": "1percent of cache access", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_refilled_prf", > + "MetricExpr": "(L2D_CACHE_REFILL_PRF - L2D_CACHE_REFILL_L1PRF) += L2D_LFB_HIT_RWL1PRF_FHWPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L2 data prefetcher (hardware prefetches or software preload)= into L2 D-cache.", > + "ScaleUnit": "1count", > + "MetricGroup": 
"L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l2d_useful_prf", > + "MetricExpr": "L2D_CACHE_HIT_RWL1PRF_FPRF + L2D_LFB_HIT_RWL1PRF_= FHWPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L2 data prefetcher (hardware prefetches or software preload)= into L2 D-cache which are further used by instruction fetch, Load, Store, = or L1 prefetcher accesses from the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L2_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_cache_rwl1prfl2prf_miss_ratio", > + "MetricExpr": "l3d_demand_misses / l3d_demand_accesses", > + "BriefDescription": "This metric measures the ratio of L3 D-cach= e Read accesses missed to the total number of L3 D-cache accesses. This giv= es an indication of the effectiveness of the L2 D-cache for demand instruct= ion fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses traffic.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_demand_accesses", > + "MetricExpr": "L3D_CACHE_RWL1PRFL2PRF", > + "BriefDescription": "This metric measures the count of L3 D-cach= e accesses incurred on an instruction fetch, Load, Store, L1 prefetcher, or= L2 prefetcher accesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_demand_misses", > + "MetricExpr": "L3D_CACHE_REFILL_RWL1PRFL2PRF", > + "BriefDescription": "This metric measures the count of L3 D-cach= e misses incurred on an instruction fetch, Load, Store, L1 prefetcher, or L= 2 prefetcher accesses by the instruction stream of the program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_prf_accuracy", > + "MetricExpr": "100 * (l3d_useful_prf / l3d_refilled_prf)", > + "BriefDescription": "This metric measures the fraction 
of prefet= ched memory addresses that are used by the instruction stream.", > + "ScaleUnit": "1percent of prefetch", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_prf_coverage", > + "MetricExpr": "100 * (l3d_useful_prf / (l3d_demand_misses + l3d_= refilled_prf))", > + "BriefDescription": "This metric measures the baseline demand ca= che misses which the prefetcher brings into the cache.", > + "ScaleUnit": "1percent of cache access", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_refilled_prf", > + "MetricExpr": "L3D_CACHE_REFILL_HWPRF + L3D_CACHE_REFILL_PRFM - = L3D_CACHE_REFILL_L1PRF - L3D_CACHE_REFILL_L2PRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L3 data prefetcher (hardware prefetches or software preload)= into L3 D-cache.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "l3d_useful_prf", > + "MetricExpr": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF", > + "BriefDescription": "This metric measures the count of cache lin= es refilled by L3 data prefetcher (hardware prefetches or software preload)= into L3 D-cache which are further used by instruction fetch, Load, Store, = L1 prefetcher, or L2 prefetcher accesses from the instruction stream of the= program.", > + "ScaleUnit": "1count", > + "MetricGroup": "L3_Prefetcher_Effectiveness" > + }, > + { > + "MetricName": "ll_cache_read_hit_ratio", > + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD", > + "BriefDescription": "This metric measures the ratio of last leve= l cache Read accesses hit in the cache to the total number of last level ca= che accesses. This gives an indication of the effectiveness of the last lev= el cache for Read traffic. 
Note that cache accesses in this cache are eithe= r data memory access or instruction fetch as this is a system level cache.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "LL_Cache_Effectiveness" > + }, > + { > + "MetricName": "ll_cache_read_miss_ratio", > + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD", > + "BriefDescription": "This metric measures the ratio of last leve= l cache Read accesses missed to the total number of last level cache access= es. This gives an indication of the effectiveness of the last level cache f= or Read traffic. Note that cache accesses in this cache are either data mem= ory access or instruction fetch as this is a system level cache.", > + "ScaleUnit": "1per cache access", > + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness" > + }, > + { > + "MetricName": "ll_cache_read_mpki", > + "MetricExpr": "1000 * (LL_CACHE_MISS_RD / INST_RETIRED)", > + "BriefDescription": "This metric measures the number of last lev= el cache Read accesses missed per thousand instructions executed.", > + "ScaleUnit": "1MPKI", > + "MetricGroup": "MPKI;LL_Cache_Effectiveness" > + }, > + { > + "MetricName": "load_average_latency", > + "MetricExpr": "MEM_ACCESS_RD_PERCYC / MEM_ACCESS", > + "BriefDescription": "This metric measures the average latency of= Load operations in CPU cycles.", > + "ScaleUnit": "1CPU cycles", > + "MetricGroup": "Average_Latency" > + }, > + { > + "MetricName": "load_percentage", > + "MetricExpr": "100 * (LD_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures Load operations as a p= ercentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "nonsve_fp_ops_per_cycle", > + "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES", > + "BriefDescription": "This metric measures floating point operati= ons per cycle in any precision performed by an instruction that is not an S= VE instruction. 
Operations are counted by computation and by vector lanes, = fused computations such as multiply-add count as twice per vector lane for = example.", > + "ScaleUnit": "1operations per cycle", > + "MetricGroup": "FP_Arithmetic_Intensity" > + }, > + { > + "MetricName": "retiring", > + "MetricExpr": "100 * ((OP_RETIRED/OP_SPEC) * (1 - (STALL_SLOT/CP= U_SLOT)))", > + "BriefDescription": "This metric is the percentage of total slot= s that retired operations, which indicates cycles that were utilized effici= ently.", > + "ScaleUnit": "1percent of slots", > + "MetricGroup": "TopdownL1" > + }, > + { > + "MetricName": "scalar_fp_percentage", > + "MetricExpr": "100 * (VFP_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures scalar floating point = operations as a percentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "simd_percentage", > + "MetricExpr": "100 * (ASE_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures advanced SIMD operatio= ns as a percentage of total operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "store_percentage", > + "MetricExpr": "100 * (ST_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures Store operations as a = percentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "sve_all_percentage", > + "MetricExpr": "100 * (SVE_INST_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions, including Loads and Stores, as a percentage of operations speculative= ly executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "Operation_Mix" > + }, > + { > + "MetricName": "sve_fp_ops_per_cycle", > + "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES", > + "BriefDescription": 
"This metric measures floating point operati= ons per cycle in any precision performed by SVE instructions. Operations ar= e counted by computation and by vector lanes, fused computations such as mu= ltiply-add count as twice per vector lane for example.", > + "ScaleUnit": "1operations per cycle", > + "MetricGroup": "FP_Arithmetic_Intensity" > + }, > + { > + "MetricName": "sve_predicate_empty_percentage", > + "MetricExpr": "100 * (SVE_PRED_EMPTY_SPEC / SVE_PRED_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with no active predicates as a percentage of SVE predicated operations= speculatively executed.", > + "ScaleUnit": "1percent of SVE predicated operations", > + "MetricGroup": "SVE_Effectiveness" > + }, > + { > + "MetricName": "sve_predicate_full_percentage", > + "MetricExpr": "100 * (SVE_PRED_FULL_SPEC / SVE_PRED_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with all active predicates as a percentage of SVE predicated operation= s speculatively executed.", > + "ScaleUnit": "1percent of SVE predicated operations", > + "MetricGroup": "SVE_Effectiveness" > + }, > + { > + "MetricName": "sve_predicate_partial_percentage", > + "MetricExpr": "100 * (SVE_PRED_PARTIAL_SPEC / SVE_PRED_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with at least one active predicates as a percentage of SVE predicated = operations speculatively executed.", > + "ScaleUnit": "1percent of SVE predicated operations", > + "MetricGroup": "SVE_Effectiveness" > + }, > + { > + "MetricName": "sve_predicate_percentage", > + "MetricExpr": "100 * (SVE_PRED_SPEC / INST_SPEC)", > + "BriefDescription": "This metric measures scalable vector operat= ions with predicates as a percentage of operations speculatively executed.", > + "ScaleUnit": "1percent of operations", > + "MetricGroup": "SVE_Effectiveness" > + } > +] > diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json b/too= 
ls/perf/pmu-events/arch/arm64/nvidia/t410/misc.json > new file mode 100644 > index 000000000000..8ff87d844e52 > --- /dev/null > +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json > @@ -0,0 +1,642 @@ > +[ > + { > + "ArchStdEvent": "SW_INCR", > + "PublicDescription": "This event counts software writes to the P= MSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a= manually updated counter for use by application software.\nThis event coul= d be used to measure any user program event, such as accesses to a particul= ar data structure (by writing to the PMSWINC_EL0 register each time the dat= a structure is accessed).\nTo use the PMSWINC_EL0 register and event, devel= opers must insert instructions that write to the PMSWINC_EL0 register into = the source code.\nSince the SW_INCR event records writes to the PMSWINC_EL0= register, there is no need to do a Read/Increment/Write sequence to the PM= SWINC_EL0 register." > + }, > + { > + "ArchStdEvent": "TRB_WRAP", > + "PublicDescription": "This event is generated each time the trac= e buffer current Write pointer is wrapped to the trace buffer base pointer." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT0", > + "PublicDescription": "Trace unit external output 0." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT1", > + "PublicDescription": "Trace unit external output 1." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT2", > + "PublicDescription": "Trace unit external output 2." > + }, > + { > + "ArchStdEvent": "TRCEXTOUT3", > + "PublicDescription": "Trace unit external output 3." > + }, > + { > + "ArchStdEvent": "CTI_TRIGOUT4", > + "PublicDescription": "Cross-trigger Interface output trigger 4." > + }, > + { > + "ArchStdEvent": "CTI_TRIGOUT5", > + "PublicDescription": "Cross-trigger Interface output trigger 5." > + }, > + { > + "ArchStdEvent": "CTI_TRIGOUT6", > + "PublicDescription": "Cross-trigger Interface output trigger 6." 
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT7",
> + "PublicDescription": "Cross-trigger Interface output trigger 7."
> + },
> + {
> + "EventCode": "0x00e1",
> + "EventName": "L1I_PRFM_REQ_DROP",
> + "PublicDescription": "L1 I-cache software prefetch dropped."
> + },
> + {
> + "EventCode": "0x0100",
> + "EventName": "L1_PF_REFILL",
> + "PublicDescription": "L1 prefetch requests, refilled to L1 cache."
> + },
> + {
> + "EventCode": "0x0120",
> + "EventName": "FLUSH",
> + "PublicDescription": "This event counts both the CT flushes and the BX flushes. The BR_MIS_PRED event counts the BX flushes, so FLUSH - BR_MIS_PRED gives the CT flushes."
> + },
> + {
> + "EventCode": "0x0121",
> + "EventName": "FLUSH_MEM",
> + "PublicDescription": "Flushes due to memory hazards. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0122",
> + "EventName": "FLUSH_BAD_BRANCH",
> + "PublicDescription": "Flushes due to a badly predicted branch. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0123",
> + "EventName": "FLUSH_STDBYPASS",
> + "PublicDescription": "Flushes due to bad predecode. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0124",
> + "EventName": "FLUSH_ISB",
> + "PublicDescription": "Flushes due to ISB or similar side-effects. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0125",
> + "EventName": "FLUSH_OTHER",
> + "PublicDescription": "Flushes due to other hazards. This only includes CT flushes."
> + },
> + {
> + "EventCode": "0x0126",
> + "EventName": "STORE_STREAM",
> + "PublicDescription": "Stored lines in streaming no-Write-allocate mode."
> + },
> + {
> + "EventCode": "0x0127",
> + "EventName": "NUKE_RAR",
> + "PublicDescription": "Load/Store nuke due to Read-after-Read ordering hazard."
> + },
> + {
> + "EventCode": "0x0128",
> + "EventName": "NUKE_RAW",
> + "PublicDescription": "Load/Store nuke due to Read-after-Write ordering hazard."
> + },
> + {
> + "EventCode": "0x0129",
> + "EventName": "L1_PF_GEN_PAGE",
> + "PublicDescription": "Load/Store prefetch to L1 generated, Page mode."
> + },
> + {
> + "EventCode": "0x012a",
> + "EventName": "L1_PF_GEN_STRIDE",
> + "PublicDescription": "Load/Store prefetch to L1 generated, stride mode."
> + },
> + {
> + "EventCode": "0x012b",
> + "EventName": "L2_PF_GEN_LD",
> + "PublicDescription": "Load prefetch to L2 generated."
> + },
> + {
> + "EventCode": "0x012d",
> + "EventName": "LS_PF_TRAIN_TABLE_ALLOC",
> + "PublicDescription": "LS prefetch train table entry allocated."
> + },
> + {
> + "EventCode": "0x0130",
> + "EventName": "LS_PF_GEN_TABLE_ALLOC",
> + "PublicDescription": "This event counts the number of cycles with at least one table allocation, for L2 hardware prefetches (including the software PRFM instructions that are converted into hardware prefetches due to D-TLB miss).\nLS prefetch gen table allocation (for L2 prefetches)."
> + },
> + {
> + "EventCode": "0x0131",
> + "EventName": "LS_PF_GEN_TABLE_ALLOC_PF_PEND",
> + "PublicDescription": "This event counts the number of cycles in which at least one hardware prefetch is dropped due to the inability to identify a victim when the generation table is full. The hardware prefetches considered here include the software PRFM instructions that are converted into hardware prefetches due to D-TLB miss."
> + },
> + {
> + "EventCode": "0x0132",
> + "EventName": "TBW",
> + "PublicDescription": "Tablewalks."
> + },
> + {
> + "EventCode": "0x0134",
> + "EventName": "S1L2_HIT",
> + "PublicDescription": "Translation cache hit on S1L2 walk cache entry."
> + },
> + {
> + "EventCode": "0x0135",
> + "EventName": "S1L1_HIT",
> + "PublicDescription": "Translation cache hit on S1L1 walk cache entry."
> + },
> + {
> + "EventCode": "0x0136",
> + "EventName": "S1L0_HIT",
> + "PublicDescription": "Translation cache hit on S1L0 walk cache entry."
> + },
> + {
> + "EventCode": "0x0137",
> + "EventName": "S2L2_HIT",
> + "PublicDescription": "Translation cache hit for S2L2 IPA walk cache entry."
> + },
> + {
> + "EventCode": "0x0138",
> + "EventName": "IPA_REQ",
> + "PublicDescription": "Translation cache lookups for IPA to PA entries."
> + },
> + {
> + "EventCode": "0x0139",
> + "EventName": "IPA_REFILL",
> + "PublicDescription": "Translation cache refills for IPA to PA entries."
> + },
> + {
> + "EventCode": "0x013a",
> + "EventName": "S1_FLT",
> + "PublicDescription": "Stage1 tablewalk fault."
> + },
> + {
> + "EventCode": "0x013b",
> + "EventName": "S2_FLT",
> + "PublicDescription": "Stage2 tablewalk fault."
> + },
> + {
> + "EventCode": "0x013c",
> + "EventName": "COLT_REFILL",
> + "PublicDescription": "Aggregated page refill."
> + },
> + {
> + "EventCode": "0x0145",
> + "EventName": "L1_PF_HIT",
> + "PublicDescription": "L1 prefetch requests, hitting in L1 cache."
> + },
> + {
> + "EventCode": "0x0146",
> + "EventName": "L1_PF",
> + "PublicDescription": "L1 prefetch requests."
> + },
> + {
> + "EventCode": "0x0147",
> + "EventName": "CACHE_LS_REFILL",
> + "PublicDescription": "L2 D-cache refill, Load/Store."
> + },
> + {
> + "EventCode": "0x0148",
> + "EventName": "CACHE_PF",
> + "PublicDescription": "L2 prefetch requests."
> + },
> + {
> + "EventCode": "0x0149",
> + "EventName": "CACHE_PF_HIT",
> + "PublicDescription": "L2 prefetch requests, hitting in L2 cache."
> + },
> + {
> + "EventCode": "0x0150",
> + "EventName": "UNUSED_PF",
> + "PublicDescription": "L2 unused prefetch."
> + },
> + {
> + "EventCode": "0x0151",
> + "EventName": "PFT_SENT",
> + "PublicDescription": "L2 prefetch TGT sent.\nNote that PFT_SENT != PFT_USEFUL + PFT_DROP. There may be PFT_SENT for which the accesses resulted in a SLC hit."
> + },
> + {
> + "EventCode": "0x0152",
> + "EventName": "PFT_USEFUL",
> + "PublicDescription": "L2 prefetch TGT useful."
> + },
> + {
> + "EventCode": "0x0153",
> + "EventName": "PFT_DROP",
> + "PublicDescription": "L2 prefetch TGT dropped."
> + },
> + {
> + "EventCode": "0x0162",
> + "EventName": "LRQ_FULL",
> + "PublicDescription": "This event counts the number of cycles the LRQ is full."
> + },
> + {
> + "EventCode": "0x0163",
> + "EventName": "FETCH_FQ_EMPTY",
> + "PublicDescription": "Fetch Queue empty cycles."
> + },
> + {
> + "EventCode": "0x0164",
> + "EventName": "FPG2",
> + "PublicDescription": "Forward progress guarantee. Medium range livelock triggered."
> + },
> + {
> + "EventCode": "0x0165",
> + "EventName": "FPG",
> + "PublicDescription": "Forward progress guarantee. Tofu global livelock buster is triggered."
> + },
> + {
> + "EventCode": "0x0172",
> + "EventName": "DEADBLOCK",
> + "PublicDescription": "Write-back evictions converted to dataless EVICT.\nThe victim line is deemed a deadblock if the likeliness of a reuse is low. The Core uses a dataless evict to evict a deadblock, and it uses an evict with data to evict an L2 line that is not a deadblock."
> + },
> + {
> + "EventCode": "0x0173",
> + "EventName": "PF_PRQ_ALLOC_PF_PEND",
> + "PublicDescription": "L1 prefetch PRQ allocation (replacing pending)."
> + },
> + {
> + "EventCode": "0x0178",
> + "EventName": "FETCH_ICACHE_INSTR",
> + "PublicDescription": "Instructions fetched from I-cache."
> + },
> + {
> + "EventCode": "0x017b",
> + "EventName": "NEAR_CAS",
> + "PublicDescription": "Near atomics: compare and swap."
> + },
> + {
> + "EventCode": "0x017c",
> + "EventName": "NEAR_CAS_PASS",
> + "PublicDescription": "Near atomics: compare and swap pass."
> + },
> + {
> + "EventCode": "0x017d",
> + "EventName": "FAR_CAS",
> + "PublicDescription": "Far atomics: compare and swap."
> + },
> + {
> + "EventCode": "0x0186",
> + "EventName": "L2_BTB_RELOAD_MAIN_BTB",
> + "PublicDescription": "Number of completed L1 BTB updates initiated by an L2 BTB hit, which swap branch information between the L1 BTB and the L2 BTB."
> + },
> + {
> + "EventCode": "0x018f",
> + "EventName": "L1_PF_GEN_MCMC",
> + "PublicDescription": "Load/Store prefetch to L1 generated, MCMC."
> + },
> + {
> + "EventCode": "0x0190",
> + "EventName": "PF_MODE_0_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the most aggressive mode."
> + },
> + {
> + "EventCode": "0x0191",
> + "EventName": "PF_MODE_1_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the more aggressive mode."
> + },
> + {
> + "EventCode": "0x0192",
> + "EventName": "PF_MODE_2_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the less aggressive mode."
> + },
> + {
> + "EventCode": "0x0193",
> + "EventName": "PF_MODE_3_CYCLES",
> + "PublicDescription": "Number of cycles in which the hardware prefetcher is in the most conservative mode."
> + },
> + {
> + "EventCode": "0x0194",
> + "EventName": "TXREQ_LIMIT_MAX_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is the L2_TQ_SIZE."
> + },
> + {
> + "EventCode": "0x0195",
> + "EventName": "TXREQ_LIMIT_3QUARTER_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 3/4 of the L2_TQ_SIZE and the L2_TQ_SIZE-1."
> + },
> + {
> + "EventCode": "0x0196",
> + "EventName": "TXREQ_LIMIT_HALF_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 1/2 of the L2_TQ_SIZE and 3/4 of the L2_TQ_SIZE."
> + },
> + {
> + "EventCode": "0x0197",
> + "EventName": "TXREQ_LIMIT_1QUARTER_CYCLES",
> + "PublicDescription": "Number of cycles in which the dynamic TXREQ limit is between 1/4 of the L2_TQ_SIZE and 1/2 of the L2_TQ_SIZE."
> + },
> + {
> + "EventCode": "0x019d",
> + "EventName": "PREFETCH_LATE_CMC",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by CMC prefetch request."
> + },
> + {
> + "EventCode": "0x019e",
> + "EventName": "PREFETCH_LATE_BO",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by BO prefetch request."
> + },
> + {
> + "EventCode": "0x019f",
> + "EventName": "PREFETCH_LATE_STRIDE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by STRIDE prefetch request."
> + },
> + {
> + "EventCode": "0x01a0",
> + "EventName": "PREFETCH_LATE_SPATIAL",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by SPATIAL prefetch request."
> + },
> + {
> + "EventCode": "0x01a2",
> + "EventName": "PREFETCH_LATE_TBW",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by TBW prefetch request."
> + },
> + {
> + "EventCode": "0x01a3",
> + "EventName": "PREFETCH_LATE_PAGE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by PAGE prefetch request."
> + },
> + {
> + "EventCode": "0x01a4",
> + "EventName": "PREFETCH_LATE_GSMS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by GSMS prefetch request."
> + },
> + {
> + "EventCode": "0x01a5",
> + "EventName": "PREFETCH_LATE_SIP_CONS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit on TQ entry allocated by SIP_CONS prefetch request."
> + },
> + {
> + "EventCode": "0x01a6",
> + "EventName": "PREFETCH_REFILL_CMC",
> + "PublicDescription": "PF/prefetch or PF/readclean request from CMC pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01a7",
> + "EventName": "PREFETCH_REFILL_BO",
> + "PublicDescription": "PF/prefetch or PF/readclean request from BO pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01a8",
> + "EventName": "PREFETCH_REFILL_STRIDE",
> + "PublicDescription": "PF/prefetch or PF/readclean request from STRIDE pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01a9",
> + "EventName": "PREFETCH_REFILL_SPATIAL",
> + "PublicDescription": "PF/prefetch or PF/readclean request from SPATIAL pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ab",
> + "EventName": "PREFETCH_REFILL_TBW",
> + "PublicDescription": "PF/prefetch or PF/readclean request from TBW pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ac",
> + "EventName": "PREFETCH_REFILL_PAGE",
> + "PublicDescription": "PF/prefetch or PF/readclean request from PAGE pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ad",
> + "EventName": "PREFETCH_REFILL_GSMS",
> + "PublicDescription": "PF/prefetch or PF/readclean request from GSMS pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01ae",
> + "EventName": "PREFETCH_REFILL_SIP_CONS",
> + "PublicDescription": "PF/prefetch or PF/readclean request from SIP_CONS pf engine filled the L2 cache."
> + },
> + {
> + "EventCode": "0x01af",
> + "EventName": "CACHE_HIT_LINE_PF_CMC",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by CMC prefetch request."
> + },
> + {
> + "EventCode": "0x01b0",
> + "EventName": "CACHE_HIT_LINE_PF_BO",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by BO prefetch request."
> + },
> + {
> + "EventCode": "0x01b1",
> + "EventName": "CACHE_HIT_LINE_PF_STRIDE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by STRIDE prefetch request."
> + },
> + {
> + "EventCode": "0x01b2",
> + "EventName": "CACHE_HIT_LINE_PF_SPATIAL",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by SPATIAL prefetch request."
> + },
> + {
> + "EventCode": "0x01b4",
> + "EventName": "CACHE_HIT_LINE_PF_TBW",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by TBW prefetch request."
> + },
> + {
> + "EventCode": "0x01b5",
> + "EventName": "CACHE_HIT_LINE_PF_PAGE",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by PAGE prefetch request."
> + },
> + {
> + "EventCode": "0x01b6",
> + "EventName": "CACHE_HIT_LINE_PF_GSMS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by GSMS prefetch request."
> + },
> + {
> + "EventCode": "0x01b7",
> + "EventName": "CACHE_HIT_LINE_PF_SIP_CONS",
> + "PublicDescription": "LS/readclean or LS/readunique lookup hit in L2 cache on line filled by SIP_CONS prefetch request."
> + },
> + {
> + "EventCode": "0x01ba",
> + "EventName": "PREFETCH_LATE_STORE_ISSUE",
> + "PublicDescription": "This event counts the number of demand requests that matches a Store-issue prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01bb",
> + "EventName": "PREFETCH_LATE_STORE_STRIDE",
> + "PublicDescription": "This event counts the number of demand requests that matches a Store-stride prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01bc",
> + "EventName": "PREFETCH_LATE_PC_OFFSET",
> + "PublicDescription": "This event counts the number of demand requests that matches a PC-offset prefetcher's pending refill request. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01bd",
> + "EventName": "PREFETCH_LATE_IFUPF",
> + "PublicDescription": "This event counts the number of demand requests that matches a IFU prefetcher's pending refill request.
These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements."
> + },
> + {
> + "EventCode": "0x01be",
> + "EventName": "PREFETCH_REFILL_STORE_ISSUE",
> + "PublicDescription": "This event counts the number of cache refills due to Store-Issue prefetcher."
> + },
> + {
> + "EventCode": "0x01bf",
> + "EventName": "PREFETCH_REFILL_STORE_STRIDE",
> + "PublicDescription": "This event counts the number of cache refills due to Store-stride prefetcher."
> + },
> + {
> + "EventCode": "0x01c0",
> + "EventName": "PREFETCH_REFILL_PC_OFFSET",
> + "PublicDescription": "This event counts the number of cache refills due to PC-offset prefetcher."
> + },
> + {
> + "EventCode": "0x01c1",
> + "EventName": "PREFETCH_REFILL_IFUPF",
> + "PublicDescription": "This event counts the number of cache refills due to IFU prefetcher."
> + },
> + {
> + "EventCode": "0x01c2",
> + "EventName": "CACHE_HIT_LINE_PF_STORE_ISSUE",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by Store-issue prefetcher."
> + },
> + {
> + "EventCode": "0x01c3",
> + "EventName": "CACHE_HIT_LINE_PF_STORE_STRIDE",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by Store-stride prefetcher."
> + },
> + {
> + "EventCode": "0x01c4",
> + "EventName": "CACHE_HIT_LINE_PF_PC_OFFSET",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by PC-offset prefetcher."
> + },
> + {
> + "EventCode": "0x01c5",
> + "EventName": "CACHE_HIT_LINE_PF_IFUPF",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by IFU prefetcher."
> + },
> + {
> + "EventCode": "0x01c6",
> + "EventName": "L2_PF_GEN_ST_ISSUE",
> + "PublicDescription": "Store-issue prefetch to L2 generated."
> + },
> + {
> + "EventCode": "0x01c7",
> + "EventName": "L2_PF_GEN_ST_STRIDE",
> + "PublicDescription": "Store-stride prefetch to L2 generated"
> + },
> + {
> + "EventCode": "0x01cb",
> + "EventName": "L2_TQ_OUTSTANDING",
> + "PublicDescription": "Outstanding tracker count, per cycle.\nThis event increments by the number of valid entries pertaining to this thread in the L2TQ, in each cycle.\nThis event can be used to calculate the occupancy of L2TQ by dividing this by the CPU_CYCLES event. The L2TQ queue tracks the outstanding Read, Write and Snoop transactions. The Read transaction and the Write transaction entries are attributable to PE, whereas the Snoop transactions are not always attributable to PE."
> + },
> + {
> + "EventCode": "0x01cc",
> + "EventName": "TXREQ_LIMIT_COUNT_CYCLES",
> + "PublicDescription": "This event increments by the dynamic TXREQ value, in each cycle.\nThis is a companion event of TXREQ_LIMIT_MAX_CYCLES, TXREQ_LIMIT_3QUARTER_CYCLES, TXREQ_LIMIT_HALF_CYCLES, and TXREQ_LIMIT_1QUARTER_CYCLES."
> + },
> + {
> + "EventCode": "0x01ce",
> + "EventName": "L3DPRFM_TO_L2PRQ_CONVERTED",
> + "PublicDescription": "This event counts the number of Converted-L3D-PRFMs. These are indeed L3D PRFM and activities around these PRFM are counted by the L3D_CACHE_PRFM, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL Events."
> + },
> + {
> + "EventCode": "0x01d2",
> + "EventName": "DVM_TLBI_RCVD",
> + "PublicDescription": "This event counts the number of TLBI DVM message received over CHI interface, for *this* Core."
> + },
> + {
> + "EventCode": "0x01d6",
> + "EventName": "DSB_COMMITING_LOCAL_TLBI",
> + "PublicDescription": "This event counts the number of DSB that are retired and committed at least one local TLBI instruction. This event increments no more than once (in a cycle) even if the DSB commits multiple local TLBI instruction."
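
As a sanity check of my reading of the L2_TQ_OUTSTANDING description above: the occupancy calculation it spells out works out as follows (a sketch only; the counter values are made up, not from real hardware):

```python
# Sketch of the L2TQ occupancy metric described by L2_TQ_OUTSTANDING:
# the event accumulates the number of valid per-thread L2TQ entries in
# each cycle, so dividing the accumulated count by CPU_CYCLES gives the
# average number of outstanding entries per cycle.
def l2tq_occupancy(l2_tq_outstanding: int, cpu_cycles: int) -> float:
    """Average outstanding L2TQ entries per cycle."""
    return l2_tq_outstanding / cpu_cycles

# Hypothetical counter readings: 1,500,000 entry-cycles over
# 1,000,000 cycles -> average occupancy of 1.5 entries per cycle.
print(l2tq_occupancy(1_500_000, 1_000_000))  # -> 1.5
```

If that is the intended usage, a derived metric entry for this might be worth adding in a follow-up.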
> + },
> + {
> + "EventCode": "0x01d7",
> + "EventName": "DSB_COMMITING_BROADCAST_TLBI",
> + "PublicDescription": "This event counts the number of DSB that are retired and committed at least one broadcast TLBI instruction. This event increments no more than once (in a cycle) even if the DSB commits multiple broadcast TLBI instruction."
> + },
> + {
> + "EventCode": "0x01eb",
> + "EventName": "L1DPRFM_L2DPRFM_TO_L2PRQ_CONVERTED",
> + "PublicDescription": "This event counts the number of Converted-L1D-PRFMs and Converted-L2D-PRFM.\nActivities involving the Converted-L1D-PRFM are counted by the L1D_CACHE_PRFM. However they are *not* counted by the L1D_CACHE_REFILL_PRFM, and L1D_CACHE_REFILL, as these Converted-L1D-PRFM are treated as L2 D hardware prefetches. Activities around the Converted-L1D-PRFMs and Converted-L2D-PRFMs are counted by the L2D_CACHE_PRFM, L2D_CACHE_REFILL_PRFM and L2D_CACHE_REFILL Events."
> + },
> + {
> + "EventCode": "0x01ec",
> + "EventName": "PREFETCH_LATE_CONVERTED_PRFM",
> + "PublicDescription": "This event counts the number of demand requests that matches a Converted-L1D-PRFM or Converted-L2D-PRFM pending refill request at L2 D-cache. These are called late prefetch requests and are still counted as useful prefetcher requests for the sake of accuracy and coverage measurements.\nNote that this event is not counted by the L2D_CACHE_HIT_RWL1PRF_LATE_HWPRF, though the Converted-L1D-PRFM or Converted-L2D-PRFM are replayed by the L2PRQ."
> + },
> + {
> + "EventCode": "0x01ed",
> + "EventName": "PREFETCH_REFILL_CONVERTED_PRFM",
> + "PublicDescription": "This event counts the number of L2 D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM.\nNote : L2D_CACHE_REFILL_PRFM is inclusive of PREFETCH_REFILL_PRFM_CONVERTED, where both the PREFETCH_REFILL_PRFM_CONVERTED and the L2D_CACHE_REFILL_PRFM increment when L2 D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM."
> + },
> + {
> + "EventCode": "0x01ee",
> + "EventName": "CACHE_HIT_LINE_PF_CONVERTED_PRFM",
> + "PublicDescription": "This event counts the number of first hit to a cache line filled by Converted-L1D-PRFM or Converted-L2D-PRFM.\nNote that L2D_CACHE_HIT_RWL1PRF_FPRFM is inclusive of CACHE_HIT_LINE_PF_CONVERTED_PRFM, where both the CACHE_HIT_LINE_PF_CONVERTED_PRFM and the L2D_CACHE_HIT_RWL1PRF_FPRFM increment on a first hit to L2 D-cache filled by Converted-L1D-PRFM or Converted-L2D-PRFM."
> + },
> + {
> + "EventCode": "0x01f0",
> + "EventName": "TMS_ST_TO_SMT_LATENCY",
> + "PublicDescription": "This event counts the number of CPU cycles spent on TMS for ST-to-SMT switch.\nThis event is counted by both the threads - This event in both threads increment during TMS for ST-to-SMT switch."
> + },
> + {
> + "EventCode": "0x01f1",
> + "EventName": "TMS_SMT_TO_ST_LATENCY",
> + "PublicDescription": "This event counts the number of CPU cycles spent on TMS for SMT-to-ST switch. The count also includes the CPU cycles spend due to an aborted SMT-to-ST TMS attempt.\nThis event is counted only by the thread that is not in WFI."
> + },
> + {
> + "EventCode": "0x01f2",
> + "EventName": "TMS_ST_TO_SMT_COUNT",
> + "PublicDescription": "This event counts the number of completed TMS from ST-to-SMT.\nThis event is counted only by the active thread (the one that is not in WFI).\nNote: When an active thread enters the Debug state in ST-Full resource mode, it is switched to SMT mode. This is because the inactive thread cannot wake up while the other thread remains in the Debug state. To prEvent this issue, threads operating in ST-Full resource mode are transitioned to SMT mode upon entering Debug state.
This event count will also reflect such switches from ST to SMT mode.\n(Also see the (NV_CPUACTLR14_EL1.chka_prEvent_st_tx_to_smt_when_tx_in_debug_state bit to disable this behavior.)"
> + },
> + {
> + "EventCode": "0x01f3",
> + "EventName": "TMS_SMT_TO_ST_COUNT",
> + "PublicDescription": "This event counts the number of completed TMS from SMT-to-ST.\nThis event is counted only by the thread that is not in WFI."
> + },
> + {
> + "EventCode": "0x01f4",
> + "EventName": "TMS_SMT_TO_ST_COUNT_ABRT",
> + "PublicDescription": "This event counts the number of aborted TMS from SMT-to-ST.\nThis event is counted only by the thread that is not in WFI."
> + },
> + {
> + "EventCode": "0x0202",
> + "EventName": "L0I_CACHE_RD",
> + "PublicDescription": "This event counts the number of predict blocks serviced out of L0 I-cache.\nNote: The L0 I-cache performs at most 4 L0 I look-up in a cycle. Two of which are to service PB from L0 I. And the other two to refill L0 I-cache from L1 I. This event count only the L0 I-cache lookup pertaining to servicing the PB from L0 I."
> + },
> + {
> + "EventCode": "0x0203",
> + "EventName": "L0I_CACHE_REFILL",
> + "PublicDescription": "This event counts the number of L0I cache refill from L1 I-cache."
> + },
> + {
> + "EventCode": "0x0207",
> + "EventName": "INTR_LATENCY",
> + "PublicDescription": "This event counts the number of cycles elapsed between when an Interrupt is recognized (after masking) to when a uop associated with the first instruction in the destination exception level is allocated. If there is some other flush condition that pre-empts the Interrupt, then the cycles counted terminates early at the first instruction executed after that flush.
In the event of dropped Interrupts (when an Interrupt is deasserted before it is taken), this counter measures the number of cycles that elapse from the moment an Interrupt is recognized (post-masking) until the Interrupt is dropped or deasserted.\nNote that\n* IESB(Implicit Error Synchronization Barrier) is an internal mop, so the latency of an implicit IESB mop executed before the Interrupt taken is included in the Interrupt latency count.\n* Nukes or TMS sequence within the window are also counted by the Interrupt latency Event.\n* A SMT to ST TMS will be aborted on detecting the wake condition for the WFI thread. The Interrupt latency count includes any additional penalty for an aborted TMS."
> + },
> + {
> + "EventCode": "0x021c",
> + "EventName": "CWT_ALLOC_ENTRY",
> + "PublicDescription": "Cache Way Tracker Allocate entry."
> + },
> + {
> + "EventCode": "0x021d",
> + "EventName": "CWT_ALLOC_LINE",
> + "PublicDescription": "Cache Way Tracker Allocate line."
> + },
> + {
> + "EventCode": "0x021e",
> + "EventName": "CWT_HIT",
> + "PublicDescription": "Cache Way Tracker hit."
> + },
> + {
> + "EventCode": "0x021f",
> + "EventName": "CWT_HIT_TAG",
> + "PublicDescription": "Cache Way Tracker hit when ITAG lookup suppressed."
> + },
> + {
> + "EventCode": "0x0220",
> + "EventName": "CWT_REPLAY_TAG",
> + "PublicDescription": "Cache Way Tracker causes ITAG replay due to miss when ITAG lookup suppressed."
> + },
> + {
> + "EventCode": "0x0250",
> + "EventName": "GPT_REQ",
> + "PublicDescription": "GPT lookup."
> + },
> + {
> + "EventCode": "0x0251",
> + "EventName": "GPT_WC_HIT",
> + "PublicDescription": "GPT lookup hit in Walk cache."
> + },
> + {
> + "EventCode": "0x0252",
> + "EventName": "GPT_PG_HIT",
> + "PublicDescription": "GPT lookup hit in TLB."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
> new file mode 100644
> index 000000000000..34c7eefa66b0
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
> @@ -0,0 +1,94 @@
> +[
> + {
> + "ArchStdEvent": "INST_RETIRED",
> + "PublicDescription": "This event counts instructions that have been architecturally executed."
> + },
> + {
> + "ArchStdEvent": "CID_WRITE_RETIRED",
> + "PublicDescription": "This event counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed direct branches."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_RETIRED",
> + "PublicDescription": "This event counts architecturally executed procedure returns."
> + },
> + {
> + "ArchStdEvent": "TTBR_WRITE_RETIRED",
> + "PublicDescription": "This event counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
> + },
> + {
> + "ArchStdEvent": "BR_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions, and context synchronization instructions are not counted."
> + },
> + {
> + "ArchStdEvent": "BR_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "OP_RETIRED",
> + "PublicDescription": "This event counts micro-operations that are architecturally executed. This is a count of number of micro-operations retired from the commit queue in a single cycle."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were taken."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed direct branches that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_IND_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_TAKEN_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were taken and were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_TAKEN_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were taken and were mispredicted causing a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_SKIP_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were not taken and were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_SKIP_MIS_PRED_RETIRED",
> + "PublicDescription": "This event counts architecturally executed branches that were not taken and were mispredicted causing a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_PRED_RETIRED",
> + "PublicDescription": "This event counts branch instructions counted by BR_RETIRED which were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_RETIRED",
> + "PublicDescription": "This event counts architecturally executed indirect branches including procedure returns."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
> new file mode 100644
> index 000000000000..00d0c5051a48
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
> @@ -0,0 +1,42 @@
> +[
> + {
> + "ArchStdEvent": "SAMPLE_POP",
> + "PublicDescription": "This event counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED",
> + "PublicDescription": "This event counts statistical profiling samples taken for sampling."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FILTRATE",
> + "PublicDescription": "This event counts statistical profiling samples taken which are not removed by filtering."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_COLLISION",
> + "PublicDescription": "This event counts statistical profiling samples that have collided with a previous sample and so therefore not taken."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_BR",
> + "PublicDescription": "This event counts statistical profiling samples taken which are branches."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_LD",
> + "PublicDescription": "This event counts statistical profiling samples taken which are Loads or Load atomic operations."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_ST",
> + "PublicDescription": "This event counts statistical profiling samples taken which are Stores or Store atomic operations."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_OP",
> + "PublicDescription": "This event counts statistical profiling samples taken which are matching any operation type filters supported."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_EVENT",
> + "PublicDescription": "This event counts statistical profiling samples taken which are matching event packet filter constraints."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_LAT",
> + "PublicDescription": "This event counts statistical profiling samples taken which are exceeding minimum latency set by operation latency filter constraints."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
> new file mode 100644
> index 000000000000..8bc802f5f350
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
> @@ -0,0 +1,230 @@
> +[
> + {
> + "ArchStdEvent": "INST_SPEC",
> + "PublicDescription": "This event counts operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "OP_SPEC",
> + "PublicDescription": "This event counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
> + },
> + {
> + "ArchStdEvent": "UNALIGNED_LD_SPEC",
> + "PublicDescription": "This event counts unaligned memory Read operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event does not count preload operations (PLD, PLI).\nThis event is a subset of the UNALIGNED_LDST_SPEC event."
> + },
> + {
> + "ArchStdEvent": "UNALIGNED_ST_SPEC",
> + "PublicDescription": "This event counts unaligned memory Write operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event is a subset of the UNALIGNED_LDST_SPEC event."
> + },
> + {
> + "ArchStdEvent": "UNALIGNED_LDST_SPEC",
> + "PublicDescription": "This event counts unaligned memory operations issued by the CPU. This event counts unaligned accesses (as defined by the actual instruction), even if they are subsequently issued as multiple aligned accesses.\nThis event is the sum of the following events:\nUNALIGNED_ST_SPEC and\nUNALIGNED_LD_SPEC."
> + },
> + {
> + "ArchStdEvent": "LDREX_SPEC",
> + "PublicDescription": "This event counts Load-Exclusive operations that have been speculatively executed. For example: LDREX, LDX"
> + },
> + {
> + "ArchStdEvent": "STREX_PASS_SPEC",
> + "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed and have successfully completed the Store operation."
> + },
> + {
> + "ArchStdEvent": "STREX_FAIL_SPEC",
> + "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed and have not successfully completed the Store operation."
> + },
> + {
> + "ArchStdEvent": "STREX_SPEC",
> + "PublicDescription": "This event counts Store-exclusive operations that have been speculatively executed.\nThis event is the sum of the following events:\nSTREX_PASS_SPEC and\nSTREX_FAIL_SPEC."
> + },
> + {
> + "ArchStdEvent": "LD_SPEC",
> + "PublicDescription": "This event counts speculatively executed Load operations including Single Instruction Multiple Data (SIMD) Load operations."
> + },
> + {
> + "ArchStdEvent": "ST_SPEC",
> + "PublicDescription": "This event counts speculatively executed Store operations including Single Instruction Multiple Data (SIMD) Store operations."
> + },
> + {
> + "ArchStdEvent": "LDST_SPEC",
> + "PublicDescription": "This event counts Load and Store operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "DP_SPEC",
> + "PublicDescription": "This event counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
> + },
> + {
> + "ArchStdEvent": "ASE_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD operations excluding Load, Store, and Move micro-operations that move data to or from SIMD (vector) registers."
> + },
> + {
> + "ArchStdEvent": "VFP_SPEC",
> + "PublicDescription": "This event counts speculatively executed floating point operations.
This event does not count operations that move data to or from floating point (vector) registers."
> + },
> + {
> + "ArchStdEvent": "PC_WRITE_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
> + },
> + {
> + "ArchStdEvent": "CRYPTO_SPEC",
> + "PublicDescription": "This event counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_SPEC",
> + "PublicDescription": "This event counts direct branch operations which are speculatively executed."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_SPEC",
> + "PublicDescription": "This event counts procedure return operations (RET, RETAA and RETAB) which are speculatively executed."
> + },
> + {
> + "ArchStdEvent": "BR_INDIRECT_SPEC",
> + "PublicDescription": "This event counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations and direct branch instructions. Some examples of the instructions counted by this event include BR Xn, RET, etc."
> + },
> + {
> + "ArchStdEvent": "ISB_SPEC",
> + "PublicDescription": "This event counts ISB operations that are executed."
> + },
> + {
> + "ArchStdEvent": "DSB_SPEC",
> + "PublicDescription": "This event counts DSB operations that are speculatively issued to Load/Store unit in the CPU."
> + },
> + {
> + "ArchStdEvent": "DMB_SPEC",
> + "PublicDescription": "This event counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from Load-acquire/Store-release operations."
> + },
> + {
> + "ArchStdEvent": "CSDB_SPEC",
> + "PublicDescription": "This event counts CSDB operations that are speculatively issued to the Load/Store unit in the CPU.
This event does not count implied barriers from Load-acquire/Store-release operations."
> + },
> + {
> + "ArchStdEvent": "RC_LD_SPEC",
> + "PublicDescription": "This event counts any Load acquire operations that are speculatively executed. For example: LDAR, LDARH, LDARB"
> + },
> + {
> + "ArchStdEvent": "RC_ST_SPEC",
> + "PublicDescription": "This event counts any Store release operations that are speculatively executed. For example: STLR, STLRH, STLRB"
> + },
> + {
> + "ArchStdEvent": "SIMD_INST_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations that are SIMD or SVE vector operations or Advanced SIMD non-scalar operations."
> + },
> + {
> + "ArchStdEvent": "ASE_INST_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD operations."
> + },
> + {
> + "ArchStdEvent": "SVE_INST_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations that are SVE operations."
> + },
> + {
> + "ArchStdEvent": "INT_SPEC",
> + "PublicDescription": "This event counts speculatively executed integer arithmetic operations."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with no active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_FULL_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with all predicate elements active.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with at least one but not all active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
> + "PublicDescription": "This event counts speculatively executed predicated SVE operations with at least one non active predicate elements.\nThis counter also counts SVE operation due to instruction with Governing predicate operand that determines the Active elements that do not write to any SVE Z vector destination register using either zeroing or merging predicate. Thus, the operations due to instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT, are counted by the SVE_PRED_* events."
> + },
> + {
> + "ArchStdEvent": "PRF_SPEC",
> + "PublicDescription": "This event counts speculatively executed operations that prefetch memory. For example, Scalar: PRFM, SVE: PRFB, PRFD, PRFH, or PRFW."
> + },
> + {
> + "ArchStdEvent": "SVE_LDFF_SPEC",
> + "PublicDescription": "This event counts speculatively executed SVE first fault or non-fault Load operations."
> + },
> + {
> + "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
> + "PublicDescription": "This event counts speculatively executed SVE first fault or non-fault Load operations that clear at least one bit in the FFR."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT8_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type being an 8-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT16_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT32_SPEC",
> + "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
> +    },
> +    {
> +        "ArchStdEvent": "ASE_SVE_INT64_SPEC",
> +        "PublicDescription": "This event counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
> +    },
> +    {
> +        "EventCode": "0x011d",
> +        "EventName": "SPEC_RET_STACK_FULL",
> +        "PublicDescription": "This event counts predict pipe stalls due to speculative return address predictor full."
> +    },
> +    {
> +        "EventCode": "0x011f",
> +        "EventName": "MOPS_SPEC",
> +        "PublicDescription": "Macro-ops speculatively decoded."
> +    },
> +    {
> +        "EventCode": "0x0180",
> +        "EventName": "BR_SPEC_PRED_TAKEN",
> +        "PublicDescription": "Number of predicted taken from branch predictor."
> +    },
> +    {
> +        "EventCode": "0x0181",
> +        "EventName": "BR_SPEC_PRED_TAKEN_FROM_L2BTB",
> +        "PublicDescription": "Number of predicted taken branch from L2 BTB."
> +    },
> +    {
> +        "EventCode": "0x0182",
> +        "EventName": "BR_SPEC_PRED_TAKEN_MULTI",
> +        "PublicDescription": "Number of predicted taken for polymorphic branch."
> +    },
> +    {
> +        "EventCode": "0x0185",
> +        "EventName": "BR_SPEC_PRED_STATIC",
> +        "PublicDescription": "Number of post fetch prediction."
> +    },
> +    {
> +        "EventCode": "0x01d0",
> +        "EventName": "TLBI_LOCAL_SPEC",
> +        "PublicDescription": "A non-broadcast TLBI instruction executed (Speculatively or otherwise) on *this* PE."
> +    },
> +    {
> +        "EventCode": "0x01d1",
> +        "EventName": "TLBI_BROADCAST_SPEC",
> +        "PublicDescription": "A broadcast TLBI instruction executed (Speculatively or otherwise) on *this* PE."
> +    },
> +    {
> +        "EventCode": "0x01e7",
> +        "EventName": "BR_SPEC_PRED_ALN_REDIR",
> +        "PublicDescription": "BPU predict pipe align redirect (either AL-APQ hit/miss)."
> +    },
> +    {
> +        "EventCode": "0x0200",
> +        "EventName": "SIMD_CRYPTO_INST_SPEC",
> +        "PublicDescription": "SIMD, SVE, and CRYPTO instructions speculatively decoded."
> +    },
> +    {
> +        "EventCode": "0x022e",
> +        "EventName": "VPRED_LD_SPEC",
> +        "PublicDescription": "This event counts the number of Speculatively-executed-Load operations with addresses produced by the value-prediction mechanism. The loaded data might be discarded if the predicted address differs from the actual address."
> +    },
> +    {
> +        "EventCode": "0x022f",
> +        "EventName": "VPRED_LD_SPEC_MISMATCH",
> +        "PublicDescription": "This event counts a subset of VPRED_LD_SPEC where the predicted Load address and the actual address mismatched."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
> new file mode 100644
> index 000000000000..92d9e0866c24
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
> @@ -0,0 +1,145 @@
> +[
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND",
> +        "PublicDescription": "This event counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_FRONTEND_SLOTS counts SLOTS during the cycle when this event counts. STALL_SLOT_FRONTEND will count SLOTS when this event is counted on this CPU."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND",
> +        "PublicDescription": "This event counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
> +    },
> +    {
> +        "ArchStdEvent": "STALL",
> +        "PublicDescription": "This event counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of the following events:\nSTALL_FRONTEND and\nSTALL_BACKEND."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_SLOT_BACKEND",
> +        "PublicDescription": "This event counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND is SLOTS."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_SLOT_FRONTEND",
> +        "PublicDescription": "This event counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints. STALL_FRONTEND counts during the cycle when STALL_SLOT_FRONTEND is SLOTS."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_SLOT",
> +        "PublicDescription": "This event counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall).\nSTALL_SLOT is the sum of the following events:\nSTALL_SLOT_FRONTEND and\nSTALL_SLOT_BACKEND."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_MEM",
> +        "PublicDescription": "This event counts cycles when the backend is stalled because there is a pending demand Load request in progress in the last level Core cache.\nLast level cache in this CPU is Level 2, hence this event counts same as STALL_BACKEND_L2D."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_L1I",
> +        "PublicDescription": "This event counts cycles when the frontend is stalled because there is an instruction fetch request pending in the L1 I-cache."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_MEM",
> +        "PublicDescription": "This event counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level Core cache.\nLast level cache in this CPU is Level 2, hence this event counts rather than STALL_FRONTEND_L2I."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_TLB",
> +        "PublicDescription": "This event counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_FLOW",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the branch prediction unit."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_FRONTEND_FLUSH",
> +        "PublicDescription": "This event counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, microarchitectural flush etc."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_L1D",
> +        "PublicDescription": "This event counts cycles when the backend is stalled because there is a pending demand Load request in progress in the L1 D-cache."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_TLB",
> +        "PublicDescription": "This event counts cycles when the backend is stalled on any demand TLB misses being handled."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_ST",
> +        "PublicDescription": "This event counts cycles when the backend is stalled and there is a Store that has not reached the pre-commit stage."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_CPUBOUND",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to any resource constraints in the CPU excluding memory resources."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_BUSY",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations because the issue queues are full to take any operations for execution."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_ILOCK",
> +        "PublicDescription": "This event counts cycles when the backend could not accept any micro-operations due to resource constraints imposed by input dependency."
> +    },
> +    {
> +        "ArchStdEvent": "STALL_BACKEND_RENAME",
> +        "PublicDescription": "This event counts cycles when backend is stalled even when operations are available from the frontend but at least one is not ready to be sent to the backend because no rename register is available."
> +    },
> +    {
> +        "EventCode": "0x0158",
> +        "EventName": "FLAG_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to FRF (Flag register file) full."
> +    },
> +    {
> +        "EventCode": "0x0159",
> +        "EventName": "GEN_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to GRF (General-purpose register file) full."
> +    },
> +    {
> +        "EventCode": "0x015a",
> +        "EventName": "VEC_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to VRF (Vector register file) full."
> +    },
> +    {
> +        "EventCode": "0x015c",
> +        "EventName": "SX_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, SX."
> +    },
> +    {
> +        "EventCode": "0x015d",
> +        "EventName": "MX_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, MX."
> +    },
> +    {
> +        "EventCode": "0x015e",
> +        "EventName": "LS_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, LS."
> +    },
> +    {
> +        "EventCode": "0x015f",
> +        "EventName": "VX_IQ_STALL",
> +        "PublicDescription": "Dispatch stalled due to IQ full, VX."
> +    },
> +    {
> +        "EventCode": "0x0160",
> +        "EventName": "MCQ_FULL_STALL",
> +        "PublicDescription": "Dispatch stalled due to MCQ full."
> +    },
> +    {
> +        "EventCode": "0x01cf",
> +        "EventName": "PRD_DISP_STALL",
> +        "PublicDescription": "Rename stalled due to predicate registers (physical) are full."
> +    },
> +    {
> +        "EventCode": "0x01e0",
> +        "EventName": "CSDB_STALL",
> +        "PublicDescription": "Rename stalled due to CSDB."
> +    },
> +    {
> +        "EventCode": "0x01e2",
> +        "EventName": "STALL_SLOT_FRONTEND_WITHOUT_MISPRED",
> +        "PublicDescription": "Stall slot frontend during non-mispredicted branch.\nThis event counts the STALL_STOT_FRONTEND Events, except for the 4 cycles following a mispredicted branch Event or 4 cycles following a commit flush&restart Event."
> +    }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> new file mode 100644
> index 000000000000..18ec5c348c87
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> @@ -0,0 +1,158 @@
> +[
> +    {
> +        "ArchStdEvent": "L1I_TLB_REFILL",
> +        "PublicDescription": "This event counts L1 Instruction TLB refills from any instruction fetch (demand, hardware prefetch, and software preload accesses). If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_REFILL",
> +        "PublicDescription": "This event counts L1 Data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (Address Translation) instruction.\nThis event counts the sum of the following events:\nL1D_TLB_REFILL_RD and\nL1D_TLB_REFILL_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB",
> +        "PublicDescription": "This event counts L1 Data TLB accesses caused by any memory Load or Store operation.\nNote that Load or Store instructions can be broken up into multiple memory operations.\nThis event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_TLB",
> +        "PublicDescription": "This event counts L1 instruction TLB accesses (caused by demand or hardware prefetch or software preload accesses), whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses.\nThis event is a superset of the L1I_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_REFILL",
> +        "PublicDescription": "This event counts L2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches.\nThis event is the sum of the following events:\nL2D_TLB_REFILL_RD and\nL2D_TLB_REFILL_WR."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB",
> +        "PublicDescription": "This event counts L2 TLB accesses except those caused by TLB maintenance operations.\nThis event is the sum of the following events:\nL2D_TLB_RD and\nL2D_TLB_WR."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK",
> +        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations.\nThis event does not include prefetches."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK",
> +        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations.\nThis event does not include prefetches."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_REFILL_RD",
> +        "PublicDescription": "This event counts L1 Data TLB refills caused by memory Read operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an Address Translation (AT) instruction.\nThis event is a subset of the L1D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_REFILL_WR",
> +        "PublicDescription": "This event counts L1 Data TLB refills caused by data side memory Write operations. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count with an access from an Address Translation (AT) instruction.\nThis event is a subset of the L1D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_RD",
> +        "PublicDescription": "This event counts L1 Data TLB accesses caused by memory Read operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_WR",
> +        "PublicDescription": "This event counts any L1 Data side TLB accesses caused by memory Write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_REFILL_RD",
> +        "PublicDescription": "This event counts L2 TLB refills caused by memory Read operations from both data and instruction fetch except for those caused by TLB maintenance operations or hardware prefetches.\nThis event is a subset of the L2D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_REFILL_WR",
> +        "PublicDescription": "This event counts L2 TLB refills caused by memory Write operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB_REFILL event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_RD",
> +        "PublicDescription": "This event counts L2 TLB accesses caused by memory Read operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB event."
> +    },
> +    {
> +        "ArchStdEvent": "L2D_TLB_WR",
> +        "PublicDescription": "This event counts L2 TLB accesses caused by memory Write operations from both data and instruction fetch except for those caused by TLB maintenance operations.\nThis event is a subset of the L2D_TLB event."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_PERCYC",
> +        "PublicDescription": "This event counts the number of data translation table walks in progress per cycle."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_PERCYC",
> +        "PublicDescription": "This event counts the number of instruction translation table walks in progress per cycle."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_RW",
> +        "PublicDescription": "This event counts L1 Data TLB demand accesses caused by memory Read or Write operations. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_TLB_RD",
> +        "PublicDescription": "This event counts L1 Instruction TLB demand accesses whether the access hits or misses in the TLB."
> +    },
> +    {
> +        "ArchStdEvent": "L1D_TLB_PRFM",
> +        "PublicDescription": "This event counts L1 Data TLB accesses generated by software prefetch or preload memory accesses. Load or Store instructions can be broken into multiple memory operations. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "L1I_TLB_PRFM",
> +        "PublicDescription": "This event counts L1 Instruction TLB accesses generated by software preload or prefetch instructions. This event counts whether the access hits or misses in the TLB. This event does not count TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_HWUPD",
> +        "PublicDescription": "This event counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_HWUPD",
> +        "PublicDescription": "This event counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_STEP",
> +        "PublicDescription": "This event counts number of memory accesses triggered by a demand data translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_STEP",
> +        "PublicDescription": "This event counts number of memory accesses triggered by an instruction translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_LARGE",
> +        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_LARGE",
> +        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_SMALL",
> +        "PublicDescription": "This event counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted.\nAlso note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_SMALL",
> +        "PublicDescription": "This event counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_RW",
> +        "PublicDescription": "This event counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_RD",
> +        "PublicDescription": "This event counts number of demand instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "DTLB_WALK_PRFM",
> +        "PublicDescription": "This event counts number of software prefetches or preloads generated data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "ArchStdEvent": "ITLB_WALK_PRFM",
> +        "PublicDescription": "This event counts number of software prefetches or preloads generated instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD.\nNote that partial translations that cause a translation table walk are also counted.\nAlso note that this event does not count walks triggered by TLB maintenance operations."
> +    },
> +    {
> +        "EventCode": "0x010e",
> +        "EventName": "L1D_TLB_REFILL_RD_PF",
> +        "PublicDescription": "L1 Data TLB refill, Read, prefetch."
> +    },
> +    {
> +        "EventCode": "0x010f",
> +        "EventName": "L2TLB_PF_REFILL",
> +        "PublicDescription": "L2 Data TLB refill, Read, prefetch.\nThis event counts MMU refills due to internal PFStream requests."
> +    },
> +    {
> +        "EventCode": "0x0223",
> +        "EventName": "L1I_TLB_REFILL_RD",
> +        "PublicDescription": "L1 Instruction TLB refills due to Demand miss."
> +    },
> +    {
> +        "EventCode": "0x0224",
> +        "EventName": "L1I_TLB_REFILL_PRFM",
> +        "PublicDescription": "L1 Instruction TLB refills due to Software prefetch miss."
> +    }
> +]
> --
> 2.43.0
>