public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support
@ 2026-01-26 18:11 Besar Wicaksono
  2026-01-26 18:11 ` [PATCH 1/8] perf/arm_cspmu: nvidia: Rename doc to Tegra241 Besar Wicaksono
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

This series adds driver support for the following uncore PMUs in the
NVIDIA Tegra410 SoC:
  - Unified Coherence Fabric (UCF)
  - PCIE
  - PCIE-TGT
  - CPU Memory (CMEM) Latency
  - NVLink-C2C
  - NV-CLink
  - NV-DLink

Thanks,
Besar

Besar Wicaksono (8):
  perf/arm_cspmu: nvidia: Rename doc to Tegra241
  perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
  perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
  perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
  perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
  perf: add NVIDIA Tegra410 CPU Memory Latency PMU
  perf: add NVIDIA Tegra410 C2C PMU
  arm64: defconfig: Enable NVIDIA TEGRA410 PMU

 Documentation/admin-guide/perf/index.rst      |    3 +-
 ...nvidia-pmu.rst => nvidia-tegra241-pmu.rst} |    8 +-
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  520 ++++++++
 arch/arm64/configs/defconfig                  |    2 +
 drivers/perf/Kconfig                          |   14 +
 drivers/perf/Makefile                         |    2 +
 drivers/perf/arm_cspmu/arm_cspmu.c            |   24 +-
 drivers/perf/arm_cspmu/arm_cspmu.h            |   17 +-
 drivers/perf/arm_cspmu/nvidia_cspmu.c         |  622 +++++++++-
 drivers/perf/nvidia_t410_c2c_pmu.c            | 1061 +++++++++++++++++
 drivers/perf/nvidia_t410_cmem_latency_pmu.c   |  727 +++++++++++
 11 files changed, 2990 insertions(+), 10 deletions(-)
 rename Documentation/admin-guide/perf/{nvidia-pmu.rst => nvidia-tegra241-pmu.rst} (98%)
 create mode 100644 Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
 create mode 100644 drivers/perf/nvidia_t410_c2c_pmu.c
 create mode 100644 drivers/perf/nvidia_t410_cmem_latency_pmu.c

-- 
2.43.0



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/8] perf/arm_cspmu: nvidia: Rename doc to Tegra241
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-26 18:11 ` [PATCH 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU Besar Wicaksono
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

The documentation in nvidia-pmu.rst covers PMUs specific to the NVIDIA
Tegra241 SoC. Rename the file after this specific SoC to better
distinguish it from documentation for other NVIDIA SoCs.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 Documentation/admin-guide/perf/index.rst                  | 2 +-
 .../perf/{nvidia-pmu.rst => nvidia-tegra241-pmu.rst}      | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)
 rename Documentation/admin-guide/perf/{nvidia-pmu.rst => nvidia-tegra241-pmu.rst} (98%)

diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 47d9a3df6329..c407bb44b08e 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -24,7 +24,7 @@ Performance monitor support
    thunderx2-pmu
    alibaba_pmu
    dwc_pcie_pmu
-   nvidia-pmu
+   nvidia-tegra241-pmu
    meson-ddr-pmu
    cxl
    ampere_cspmu
diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra241-pmu.rst
similarity index 98%
rename from Documentation/admin-guide/perf/nvidia-pmu.rst
rename to Documentation/admin-guide/perf/nvidia-tegra241-pmu.rst
index f538ef67e0e8..fad5bc4cee6c 100644
--- a/Documentation/admin-guide/perf/nvidia-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra241-pmu.rst
@@ -1,8 +1,8 @@
-=========================================================
-NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
-=========================================================
+============================================================
+NVIDIA Tegra241 SoC Uncore Performance Monitoring Unit (PMU)
+============================================================
 
-The NVIDIA Tegra SoC includes various system PMUs to measure key performance
+The NVIDIA Tegra241 SoC includes various system PMUs to measure key performance
 metrics like memory bandwidth, latency, and utilization:
 
 * Scalable Coherency Fabric (SCF)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
  2026-01-26 18:11 ` [PATCH 1/8] perf/arm_cspmu: nvidia: Rename doc to Tegra241 Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-29 22:20   ` Ilkka Koskinen
  2026-01-26 18:11 ` [PATCH 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get Besar Wicaksono
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Add Unified Coherence Fabric (UCF) PMU support for the Tegra410 SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 Documentation/admin-guide/perf/index.rst      |   1 +
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  | 106 ++++++++++++++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c         |  90 ++++++++++++++-
 3 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst

diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index c407bb44b08e..aa12708ddb96 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -25,6 +25,7 @@ Performance monitor support
    alibaba_pmu
    dwc_pcie_pmu
    nvidia-tegra241-pmu
+   nvidia-tegra410-pmu
    meson-ddr-pmu
    cxl
    ampere_cspmu
diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
new file mode 100644
index 000000000000..7b7ba5700ca1
--- /dev/null
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -0,0 +1,106 @@
+============================================================
+NVIDIA Tegra410 SoC Uncore Performance Monitoring Unit (PMU)
+============================================================
+
+The NVIDIA Tegra410 SoC includes various system PMUs to measure key performance
+metrics like memory bandwidth, latency, and utilization:
+
+* Unified Coherence Fabric (UCF)
+
+PMU Driver
+----------
+
+The PMU driver describes the available events and the configuration of each PMU
+in sysfs. Please see the sections below for the sysfs path of each PMU. Like
+other uncore PMU drivers, the driver provides a "cpumask" sysfs attribute that
+shows the CPU id used to handle the PMU events. There is also an
+"associated_cpus" sysfs attribute, which contains the list of CPUs associated
+with the PMU instance.
+
+UCF PMU
+-------
+
+The Unified Coherence Fabric (UCF) in the NVIDIA Tegra410 SoC serves as a
+distributed last-level cache for CPU memory and CXL memory, and as a cache
+coherent interconnect that supports hardware coherence across multiple
+coherently caching agents, including:
+
+  * CPU clusters
+  * GPU
+  * PCIe Ordering Controller Unit (OCU)
+  * Other IO-coherent requesters
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_ucf_pmu_<socket-id>.
+
+Some of the events available in this PMU can be used to measure bandwidth and
+utilization:
+
+  * slc_access_rd: count the number of read requests to SLC.
+  * slc_access_wr: count the number of write requests to SLC.
+  * slc_bytes_rd: count the number of bytes transferred by slc_access_rd.
+  * slc_bytes_wr: count the number of bytes transferred by slc_access_wr.
+  * mem_access_rd: count the number of read requests to local or remote memory.
+  * mem_access_wr: count the number of write requests to local or remote memory.
+  * mem_bytes_rd: count the number of bytes transferred by mem_access_rd.
+  * mem_bytes_wr: count the number of bytes transferred by mem_access_wr.
+  * cycles: count the UCF cycles.
+
+The average bandwidth is calculated as::
+
+   AVG_SLC_READ_BANDWIDTH_IN_GBPS = SLC_BYTES_RD / ELAPSED_TIME_IN_NS
+   AVG_SLC_WRITE_BANDWIDTH_IN_GBPS = SLC_BYTES_WR / ELAPSED_TIME_IN_NS
+   AVG_MEM_READ_BANDWIDTH_IN_GBPS = MEM_BYTES_RD / ELAPSED_TIME_IN_NS
+   AVG_MEM_WRITE_BANDWIDTH_IN_GBPS = MEM_BYTES_WR / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+   AVG_SLC_READ_REQUEST_RATE = SLC_ACCESS_RD / CYCLES
+   AVG_SLC_WRITE_REQUEST_RATE = SLC_ACCESS_WR / CYCLES
+   AVG_MEM_READ_REQUEST_RATE = MEM_ACCESS_RD / CYCLES
+   AVG_MEM_WRITE_REQUEST_RATE = MEM_ACCESS_WR / CYCLES
+
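+As a quick sketch (with hypothetical counter values), the derived metrics can
+be computed with awk::
+
+   # 12e9 bytes read from the SLC over a 1-second (1e9 ns) window,
+   # 3e9 SLC read requests over 2e9 UCF cycles
+   awk 'BEGIN { printf "%.1f GB/s\n", 12e9 / 1e9 }'       # 12.0 GB/s
+   awk 'BEGIN { printf "%.2f req/cycle\n", 3e9 / 2e9 }'   # 1.50 req/cycle
+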
+More details about the other available events can be found in the Tegra410 SoC
+Technical Reference Manual.
+
+The events can be filtered based on source or destination. The source filter
+indicates the traffic initiator to the SLC, e.g. local CPU, non-CPU device, or
+remote socket. The destination filter specifies the destination memory type,
+e.g. local system memory (CMEM), local GPU memory (GMEM), or remote memory. The
+local/remote classification of the destination filter is based on the home
+socket of the address, not where the data actually resides. The available
+filters are described in
+/sys/bus/event_source/devices/nvidia_ucf_pmu_<socket-id>/format/.
+
+The list of UCF PMU event filters:
+
+* Source filter:
+
+  * src_loc_cpu: if set, count events from local CPU
+  * src_loc_noncpu: if set, count events from local non-CPU device
+  * src_rem: if set, count events from CPU, GPU, PCIE devices of remote socket
+
+* Destination filter:
+
+  * dst_loc_cmem: if set, count events to local system memory (CMEM) address
+  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) address
+  * dst_loc_other: if set, count events to local CXL memory address
+  * dst_rem: if set, count events to CPU, GPU, and CXL memory address of remote socket
+
+If the source is not specified, the PMU will count events from all sources. If
+the destination is not specified, the PMU will count events to all destinations.
+
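+When no source or destination filter bit is set, the driver selects all of
+them. A sketch of the resulting filter value, using the bit positions from the
+format directory above::
+
+   # all sources (config1 bits 0-2) and all destinations (config1 bits 8-11)
+   printf '0x%03x\n' $(( 0x7 | (0xf << 8) ))   # 0xf07
+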
+Example usage:
+
+* Count event id 0x0 in socket 0 from all sources and to all destinations::
+
+    perf stat -a -e nvidia_ucf_pmu_0/event=0x0/
+
+* Count event id 0x0 in socket 0 with source filter = local CPU and destination
+  filter = local system memory (CMEM)::
+
+    perf stat -a -e nvidia_ucf_pmu_0/event=0x0,src_loc_cpu=0x1,dst_loc_cmem=0x1/
+
+* Count event id 0x0 in socket 1 with source filter = local non-CPU device and
+  destination filter = remote memory::
+
+    perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index e06a06d3407b..c67667097a3c 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  *
  */
 
@@ -21,6 +21,13 @@
 #define NV_CNVL_PORT_COUNT           4ULL
 #define NV_CNVL_FILTER_ID_MASK       GENMASK_ULL(NV_CNVL_PORT_COUNT - 1, 0)
 
+#define NV_UCF_SRC_COUNT             3ULL
+#define NV_UCF_DST_COUNT             4ULL
+#define NV_UCF_FILTER_ID_MASK        GENMASK_ULL(11, 0)
+#define NV_UCF_FILTER_SRC            GENMASK_ULL(2, 0)
+#define NV_UCF_FILTER_DST            GENMASK_ULL(11, 8)
+#define NV_UCF_FILTER_DEFAULT        (NV_UCF_FILTER_SRC | NV_UCF_FILTER_DST)
+
 #define NV_GENERIC_FILTER_ID_MASK    GENMASK_ULL(31, 0)
 
 #define NV_PRODID_MASK	(PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
@@ -124,6 +131,37 @@ static struct attribute *mcf_pmu_event_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ucf_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(bus_cycles,            0x1D),
+
+	ARM_CSPMU_EVENT_ATTR(slc_allocate,          0xF0),
+	ARM_CSPMU_EVENT_ATTR(slc_wb,                0xF3),
+	ARM_CSPMU_EVENT_ATTR(slc_refill_rd,         0x109),
+	ARM_CSPMU_EVENT_ATTR(slc_refill_wr,         0x10A),
+	ARM_CSPMU_EVENT_ATTR(slc_hit_rd,            0x119),
+
+	ARM_CSPMU_EVENT_ATTR(slc_access_dataless,   0x183),
+	ARM_CSPMU_EVENT_ATTR(slc_access_atomic,     0x184),
+
+	ARM_CSPMU_EVENT_ATTR(slc_access,            0xF2),
+	ARM_CSPMU_EVENT_ATTR(slc_access_rd,         0x111),
+	ARM_CSPMU_EVENT_ATTR(slc_access_wr,         0x112),
+	ARM_CSPMU_EVENT_ATTR(slc_bytes_rd,          0x113),
+	ARM_CSPMU_EVENT_ATTR(slc_bytes_wr,          0x114),
+
+	ARM_CSPMU_EVENT_ATTR(mem_access_rd,         0x121),
+	ARM_CSPMU_EVENT_ATTR(mem_access_wr,         0x122),
+	ARM_CSPMU_EVENT_ATTR(mem_bytes_rd,          0x123),
+	ARM_CSPMU_EVENT_ATTR(mem_bytes_wr,          0x124),
+
+	ARM_CSPMU_EVENT_ATTR(local_snoop,           0x180),
+	ARM_CSPMU_EVENT_ATTR(ext_snp_access,        0x181),
+	ARM_CSPMU_EVENT_ATTR(ext_snp_evict,         0x182),
+
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
@@ -152,6 +190,18 @@ static struct attribute *cnvlink_pmu_format_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ucf_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(src_loc_noncpu, "config1:0"),
+	ARM_CSPMU_FORMAT_ATTR(src_loc_cpu, "config1:1"),
+	ARM_CSPMU_FORMAT_ATTR(src_rem, "config1:2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config1:8"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config1:9"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_other, "config1:10"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rem, "config1:11"),
+	NULL,
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
@@ -236,6 +286,27 @@ static void nv_cspmu_set_cc_filter(struct arm_cspmu *cspmu,
 	writel(filter, cspmu->base0 + PMCCFILTR);
 }
 
+static u32 ucf_pmu_event_filter(const struct perf_event *event)
+{
+	u32 ret, filter, src, dst;
+
+	filter = nv_cspmu_event_filter(event);
+
+	/* Monitor all sources if none is selected. */
+	src = FIELD_GET(NV_UCF_FILTER_SRC, filter);
+	if (src == 0)
+		src = GENMASK_ULL(NV_UCF_SRC_COUNT - 1, 0);
+
+	/* Monitor all destinations if none is selected. */
+	dst = FIELD_GET(NV_UCF_FILTER_DST, filter);
+	if (dst == 0)
+		dst = GENMASK_ULL(NV_UCF_DST_COUNT - 1, 0);
+
+	ret = FIELD_PREP(NV_UCF_FILTER_SRC, src);
+	ret |= FIELD_PREP(NV_UCF_FILTER_DST, dst);
+
+	return ret;
+}
 
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
@@ -342,6 +413,23 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 		.init_data = NULL
 	  },
 	},
+	{
+	  .prodid = 0x2CF20000,
+	  .prodid_mask = NV_PRODID_MASK,
+	  .name_pattern = "nvidia_ucf_pmu_%u",
+	  .name_fmt = NAME_FMT_SOCKET,
+	  .template_ctx = {
+		.event_attr = ucf_pmu_event_attrs,
+		.format_attr = ucf_pmu_format_attrs,
+		.filter_mask = NV_UCF_FILTER_ID_MASK,
+		.filter_default_val = NV_UCF_FILTER_DEFAULT,
+		.filter2_mask = 0x0,
+		.filter2_default_val = 0x0,
+		.get_filter = ucf_pmu_event_filter,
+		.get_filter2 = NULL,
+		.init_data = NULL
+	  },
+	},
 	{
 	  .prodid = 0,
 	  .prodid_mask = 0,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
  2026-01-26 18:11 ` [PATCH 1/8] perf/arm_cspmu: nvidia: Rename doc to Tegra241 Besar Wicaksono
  2026-01-26 18:11 ` [PATCH 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-29 22:23   ` Ilkka Koskinen
  2026-01-26 18:11 ` [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU Besar Wicaksono
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Add an interface to get the ACPI device associated with the PMU. This
ACPI device may contain additional properties that are not covered by
the standard ones.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 drivers/perf/arm_cspmu/arm_cspmu.c | 24 +++++++++++++++++++++++-
 drivers/perf/arm_cspmu/arm_cspmu.h | 17 ++++++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index 34430b68f602..dadc9b765d80 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -16,7 +16,7 @@
  * The user should refer to the vendor technical documentation to get details
  * about the supported events.
  *
- * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  *
  */
 
@@ -1132,6 +1132,28 @@ static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
 
 	return 0;
 }
+
+struct acpi_device *arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu)
+{
+	char hid[16];
+	char uid[16];
+	struct acpi_device *adev;
+	const struct acpi_apmt_node *apmt_node;
+
+	apmt_node = arm_cspmu_apmt_node(cspmu->dev);
+	if (!apmt_node || apmt_node->type != ACPI_APMT_NODE_TYPE_ACPI)
+		return NULL;
+
+	memset(hid, 0, sizeof(hid));
+	memset(uid, 0, sizeof(uid));
+
+	memcpy(hid, &apmt_node->inst_primary, sizeof(apmt_node->inst_primary));
+	snprintf(uid, sizeof(uid), "%u", apmt_node->inst_secondary);
+
+	adev = acpi_dev_get_first_match_dev(hid, uid, -1);
+	return adev;
+}
+EXPORT_SYMBOL_GPL(arm_cspmu_acpi_dev_get);
 #else
 static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
 {
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.h b/drivers/perf/arm_cspmu/arm_cspmu.h
index cd65a58dbd88..320096673200 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.h
+++ b/drivers/perf/arm_cspmu/arm_cspmu.h
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0
  *
  * ARM CoreSight Architecture PMU driver.
- * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  *
  */
 
 #ifndef __ARM_CSPMU_H__
 #define __ARM_CSPMU_H__
 
+#include <linux/acpi.h>
 #include <linux/bitfield.h>
 #include <linux/cpumask.h>
 #include <linux/device.h>
@@ -255,4 +256,18 @@ int arm_cspmu_impl_register(const struct arm_cspmu_impl_match *impl_match);
 /* Unregister vendor backend. */
 void arm_cspmu_impl_unregister(const struct arm_cspmu_impl_match *impl_match);
 
+#if defined(CONFIG_ACPI)
+/**
+ * Get ACPI device associated with the PMU.
+ * The caller is responsible for calling acpi_dev_put() on the returned device.
+ */
+struct acpi_device *arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu);
+#else
+static inline struct acpi_device *
+arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu)
+{
+	return NULL;
+}
+#endif
+
 #endif /* __ARM_CSPMU_H__ */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
                   ` (2 preceding siblings ...)
  2026-01-26 18:11 ` [PATCH 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-28 15:56   ` kernel test robot
  2026-01-29 22:34   ` Ilkka Koskinen
  2026-01-26 18:11 ` [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU Besar Wicaksono
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Add PCIE PMU support for the Tegra410 SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  | 162 ++++++++++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c         | 208 +++++++++++++++++-
 2 files changed, 368 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 7b7ba5700ca1..8528685ddb61 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -6,6 +6,7 @@ The NVIDIA Tegra410 SoC includes various system PMUs to measure key performance
 metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
+* PCIE
 
 PMU Driver
 ----------
@@ -104,3 +105,164 @@ Example usage:
   destination filter = remote memory::
 
     perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
+
+PCIE PMU
+--------
+
+This PMU monitors all read/write traffic from the root port(s) or a particular
+BDF in a PCIE root complex (RC) to local or remote memory. There is one PMU per
+PCIE RC in the SoC. Each RC can have up to 16 lanes that can be bifurcated into
+up to 8 root ports. The traffic from each root port can be filtered using the
+RP or BDF filter. For example, specifying "src_rp_mask=0xFF" makes the PMU
+counters capture traffic from all RPs. See below for more details.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>_rc_<pcie-rc-id>.
+
+The events in this PMU can be used to measure bandwidth, utilization, and
+latency:
+
+  * rd_req: count the number of read requests by PCIE device.
+  * wr_req: count the number of write requests by PCIE device.
+  * rd_bytes: count the number of bytes transferred by rd_req.
+  * wr_bytes: count the number of bytes transferred by wr_req.
+  * rd_cum_outs: count outstanding rd_req each cycle.
+  * cycles: count the PCIE cycles.
+
+The average bandwidth is calculated as::
+
+   AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+   AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+   AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+   AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
+The average latency is calculated as::
+
+   FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+   AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+   AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
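+As a quick sketch (with hypothetical counter values), the average read latency
+can be computed with awk::
+
+   # 40e9 cumulative outstanding reads, 2e9 read requests,
+   # 10e9 cycles over a 10-second (1e10 ns) window
+   awk 'BEGIN { freq_ghz = 10e9 / 1e10; lat_cyc = 40e9 / 2e9;
+                printf "%.1f ns\n", lat_cyc / freq_ghz }'   # 20.0 ns
+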
+The PMU events can be filtered based on the traffic source and destination.
+The source filter indicates the PCIE devices that will be monitored. The
+destination filter specifies the destination memory type, e.g. local system
+memory (CMEM), local GPU memory (GMEM), or remote memory. The local/remote
+classification of the destination filter is based on the home socket of the
+address, not where the data actually resides. These filters can be found in
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>_rc_<pcie-rc-id>/format/.
+
+The list of event filters:
+
+* Source filter:
+
+  * src_rp_mask: bitmask of root ports that will be monitored. Each bit in this
+    bitmask represents the RP index in the RC. If the bit is set, all devices under
+    the associated RP will be monitored. E.g. "src_rp_mask=0xF" will monitor
+    devices under root ports 0 to 3.
+  * src_bdf: the BDF that will be monitored. This is a 16-bit value that
+    follows the formula: (bus << 8) + (device << 3) + (function). For example,
+    the value of BDF 27:01.1 is 0x2709.
+  * src_bdf_en: enable the BDF filter. If this is set, the BDF filter value in
+    "src_bdf" is used to filter the traffic.
+
+  Note that the Root-Port and BDF filters are mutually exclusive, and the PMU
+  in each RC has a single BDF filter shared by all counters. If the BDF filter
+  is enabled, its value is applied to all events.
+
+* Destination filter:
+
+  * dst_loc_cmem: if set, count events to local system memory (CMEM) address
+  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) address
+  * dst_loc_pcie_p2p: if set, count events to local PCIE peer address
+  * dst_loc_pcie_cxl: if set, count events to local CXL memory address
+  * dst_rem: if set, count events to remote memory address
+
+If the source filter is not specified, the PMU will count events from all root
+ports. If the destination filter is not specified, the PMU will count events
+to all destinations.
+
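+A sketch of composing the source filter values in shell, following the bitmask
+and BDF formula above (hypothetical devices)::
+
+   # monitor root ports 0, 1 and 3
+   printf '0x%02x\n' $(( (1 << 0) | (1 << 1) | (1 << 3) ))    # 0x0b
+
+   # BDF value of a device at 1a:00.0: (bus << 8) + (device << 3) + (function)
+   printf '0x%04x\n' $(( (0x1a << 8) | (0x00 << 3) | 0x0 ))   # 0x1a00
+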
+Example usage:
+
+* Count event id 0x0 from root port 0 of PCIE RC-0 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_0/event=0x0,src_rp_mask=0x1/
+
+* Count event id 0x1 from root port 0 and 1 of PCIE RC-1 on socket 0 and
+  targeting just local CMEM of socket 0::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_1/event=0x1,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x2 from root port 0 of PCIE RC-2 on socket 1 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_2/event=0x2,src_rp_mask=0x1/
+
+* Count event id 0x3 from root port 0 and 1 of PCIE RC-3 on socket 1 and
+  targeting just local CMEM of socket 1::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_3/event=0x3,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x4 from BDF 01:01.0 of PCIE RC-4 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0108,src_bdf_en=0x1/
+
+Mapping an RC# to an lspci segment number can be non-trivial; hence a new
+NVIDIA Designated Vendor-Specific Capability (DVSEC) register is added to the
+PCIE config space of each RP. This DVSEC has vendor ID "10de" and DVSEC ID
+"0x4". The DVSEC register contains the following information to map PCIE
+devices under an RP back to its RC#:
+
+  - Bus# (byte 0xc): bus number as reported by the lspci output
+  - Segment# (byte 0xd): segment number as reported by the lspci output
+  - RP# (byte 0xe): port number as reported by the LnkCap attribute from lspci
+    for a device with Root Port capability
+  - RC# (byte 0xf): root complex number associated with the RP
+  - Socket# (byte 0x10): socket number associated with the RP
+
+Example script for mapping lspci BDF to RC# and socket#::
+
+  #!/bin/bash
+  while read bdf rest; do
+    dvsec4_reg=$(lspci -vv -s $bdf | awk '
+      /Designated Vendor-Specific: Vendor=10de ID=0004/ {
+        match($0, /\[([0-9a-fA-F]+)/, arr);
+        print "0x" arr[1];
+        exit
+      }
+    ')
+    if [ -n "$dvsec4_reg" ]; then
+      bus=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xc))).b)
+      segment=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xd))).b)
+      rp=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xe))).b)
+      rc=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xf))).b)
+      socket=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0x10))).b)
+      echo "$bdf: Bus=$bus, Segment=$segment, RP=$rp, RC=$rc, Socket=$socket"
+    fi
+  done < <(lspci -d 10de:)
+
+Example output::
+
+  0001:00:00.0: Bus=00, Segment=01, RP=00, RC=00, Socket=00
+  0002:80:00.0: Bus=80, Segment=02, RP=01, RC=01, Socket=00
+  0002:a0:00.0: Bus=a0, Segment=02, RP=02, RC=01, Socket=00
+  0002:c0:00.0: Bus=c0, Segment=02, RP=03, RC=01, Socket=00
+  0002:e0:00.0: Bus=e0, Segment=02, RP=04, RC=01, Socket=00
+  0003:00:00.0: Bus=00, Segment=03, RP=00, RC=02, Socket=00
+  0004:00:00.0: Bus=00, Segment=04, RP=00, RC=03, Socket=00
+  0005:00:00.0: Bus=00, Segment=05, RP=00, RC=04, Socket=00
+  0005:40:00.0: Bus=40, Segment=05, RP=01, RC=04, Socket=00
+  0005:c0:00.0: Bus=c0, Segment=05, RP=02, RC=04, Socket=00
+  0006:00:00.0: Bus=00, Segment=06, RP=00, RC=05, Socket=00
+  0009:00:00.0: Bus=00, Segment=09, RP=00, RC=00, Socket=01
+  000a:80:00.0: Bus=80, Segment=0a, RP=01, RC=01, Socket=01
+  000a:a0:00.0: Bus=a0, Segment=0a, RP=02, RC=01, Socket=01
+  000a:e0:00.0: Bus=e0, Segment=0a, RP=03, RC=01, Socket=01
+  000b:00:00.0: Bus=00, Segment=0b, RP=00, RC=02, Socket=01
+  000c:00:00.0: Bus=00, Segment=0c, RP=00, RC=03, Socket=01
+  000d:00:00.0: Bus=00, Segment=0d, RP=00, RC=04, Socket=01
+  000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
+  000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
+  000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index c67667097a3c..3a5531d1f94c 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -8,6 +8,7 @@
 
 #include <linux/io.h>
 #include <linux/module.h>
+#include <linux/property.h>
 #include <linux/topology.h>
 
 #include "arm_cspmu.h"
@@ -28,6 +29,19 @@
 #define NV_UCF_FILTER_DST            GENMASK_ULL(11, 8)
 #define NV_UCF_FILTER_DEFAULT        (NV_UCF_FILTER_SRC | NV_UCF_FILTER_DST)
 
+#define NV_PCIE_V2_PORT_COUNT        8ULL
+#define NV_PCIE_V2_FILTER_ID_MASK    GENMASK_ULL(24, 0)
+#define NV_PCIE_V2_FILTER_PORT       GENMASK_ULL(NV_PCIE_V2_PORT_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER_BDF_VAL    GENMASK_ULL(23, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_BDF_EN     BIT(24)
+#define NV_PCIE_V2_FILTER_BDF_VAL_EN GENMASK_ULL(24, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_DEFAULT    NV_PCIE_V2_FILTER_PORT
+
+#define NV_PCIE_V2_DST_COUNT         5ULL
+#define NV_PCIE_V2_FILTER2_ID_MASK   GENMASK_ULL(4, 0)
+#define NV_PCIE_V2_FILTER2_DST       GENMASK_ULL(NV_PCIE_V2_DST_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER2_DEFAULT   NV_PCIE_V2_FILTER2_DST
+
 #define NV_GENERIC_FILTER_ID_MASK    GENMASK_ULL(31, 0)
 
 #define NV_PRODID_MASK	(PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
@@ -162,6 +176,16 @@ static struct attribute *ucf_pmu_event_attrs[] = {
 	NULL,
 };
 
+static struct attribute *pcie_v2_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes,		0x0),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes,		0x1),
+	ARM_CSPMU_EVENT_ATTR(rd_req,		0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_req,		0x3),
+	ARM_CSPMU_EVENT_ATTR(rd_cum_outs,	0x4),
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
@@ -202,6 +226,19 @@ static struct attribute *ucf_pmu_format_attrs[] = {
 	NULL,
 };
 
+static struct attribute *pcie_v2_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(src_rp_mask, "config1:0-7"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf, "config1:8-23"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf_en, "config1:24"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config2:0"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config2:1"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_p2p, "config2:2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_cxl, "config2:3"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rem, "config2:4"),
+	NULL,
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
@@ -233,6 +270,32 @@ nv_cspmu_get_name(const struct arm_cspmu *cspmu)
 	return ctx->name;
 }
 
+#if defined(CONFIG_ACPI)
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	struct fwnode_handle *fwnode;
+	struct acpi_device *adev;
+	int ret;
+
+	adev = arm_cspmu_acpi_dev_get(cspmu);
+	if (!adev)
+		return -ENODEV;
+
+	fwnode = acpi_fwnode_handle(adev);
+	ret = fwnode_property_read_u32(fwnode, "instance_id", id);
+	if (ret)
+		dev_err(cspmu->dev, "Failed to get instance ID\n");
+
+	acpi_dev_put(adev);
+	return ret;
+}
+#else
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	return -EINVAL;
+}
+#endif
+
 static u32 nv_cspmu_event_filter(const struct perf_event *event)
 {
 	const struct nv_cspmu_ctx *ctx =
@@ -278,6 +341,20 @@ static void nv_cspmu_set_ev_filter(struct arm_cspmu *cspmu,
 	}
 }
 
+static void nv_cspmu_reset_ev_filter(struct arm_cspmu *cspmu,
+				     const struct perf_event *event)
+{
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+	const u32 offset = 4 * event->hw.idx;
+
+	if (ctx->get_filter)
+		writel(0, cspmu->base0 + PMEVFILTR + offset);
+
+	if (ctx->get_filter2)
+		writel(0, cspmu->base0 + PMEVFILT2R + offset);
+}
+
 static void nv_cspmu_set_cc_filter(struct arm_cspmu *cspmu,
 				   const struct perf_event *event)
 {
@@ -308,9 +385,103 @@ static u32 ucf_pmu_event_filter(const struct perf_event *event)
 	return ret;
 }
 
+static u32 pcie_v2_pmu_bdf_val_en(u32 filter)
+{
+	const u32 bdf_en = FIELD_GET(NV_PCIE_V2_FILTER_BDF_EN, filter);
+
+	/* Returns both BDF value and enable bit if BDF filtering is enabled. */
+	if (bdf_en)
+		return FIELD_GET(NV_PCIE_V2_FILTER_BDF_VAL_EN, filter);
+
+	/* Ignore the BDF value if BDF filter is not enabled. */
+	return 0;
+}
+
+static u32 pcie_v2_pmu_event_filter(const struct perf_event *event)
+{
+	u32 filter, lead_filter, lead_bdf;
+	struct perf_event *leader;
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+
+	filter = event->attr.config1 & ctx->filter_mask;
+	if (filter != 0)
+		return filter;
+
+	leader = event->group_leader;
+
+	/* Use leader's filter value if its BDF filtering is enabled. */
+	if (event != leader) {
+		lead_filter = pcie_v2_pmu_event_filter(leader);
+		lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+		if (lead_bdf != 0)
+			return lead_filter;
+	}
+
+	/* Otherwise, return default filter value. */
+	return ctx->filter_default_val;
+}
+
+static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
+				   struct perf_event *new_ev)
+{
+	/*
+	 * Make sure the events use the same BDF filter, since the PCIE-SRC PMU
+	 * only supports one common BDF filter setting for all of the counters.
+	 */
+
+	int idx;
+	u32 new_filter, new_rp, new_bdf, new_lead_filter, new_lead_bdf;
+	struct perf_event *leader, *new_leader;
+
+	if (cspmu->impl.ops.is_cycle_counter_event(new_ev))
+		return 0;
+
+	new_leader = new_ev->group_leader;
+
+	new_filter = pcie_v2_pmu_event_filter(new_ev);
+	new_lead_filter = pcie_v2_pmu_event_filter(new_leader);
+
+	new_bdf = pcie_v2_pmu_bdf_val_en(new_filter);
+	new_lead_bdf = pcie_v2_pmu_bdf_val_en(new_lead_filter);
+
+	new_rp = FIELD_GET(NV_PCIE_V2_FILTER_PORT, new_filter);
+
+	if (new_rp != 0 && new_bdf != 0) {
+		dev_err(cspmu->dev,
+			"RP and BDF filtering are mutually exclusive\n");
+		return -EINVAL;
+	}
+
+	if (new_bdf != new_lead_bdf) {
+		dev_err(cspmu->dev,
+			"sibling and leader BDF value should be equal\n");
+		return -EINVAL;
+	}
+
+	/* Compare BDF filter on existing events. */
+	idx = find_first_bit(cspmu->hw_events.used_ctrs,
+			     cspmu->cycle_counter_logical_idx);
+
+	if (idx != cspmu->cycle_counter_logical_idx) {
+		leader = cspmu->hw_events.events[idx]->group_leader;
+
+		const u32 lead_filter = pcie_v2_pmu_event_filter(leader);
+		const u32 lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+
+		if (new_lead_bdf != lead_bdf) {
+			dev_err(cspmu->dev, "only one BDF value is supported\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
-	NAME_FMT_SOCKET
+	NAME_FMT_SOCKET,
+	NAME_FMT_SOCKET_INST
 };
 
 struct nv_cspmu_match {
@@ -430,6 +601,27 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 		.init_data = NULL
 	  },
 	},
+	{
+	  .prodid = 0x10301000,
+	  .prodid_mask = NV_PRODID_MASK,
+	  .name_pattern = "nvidia_pcie_pmu_%u_rc_%u",
+	  .name_fmt = NAME_FMT_SOCKET_INST,
+	  .template_ctx = {
+		.event_attr = pcie_v2_pmu_event_attrs,
+		.format_attr = pcie_v2_pmu_format_attrs,
+		.filter_mask = NV_PCIE_V2_FILTER_ID_MASK,
+		.filter_default_val = NV_PCIE_V2_FILTER_DEFAULT,
+		.filter2_mask = NV_PCIE_V2_FILTER2_ID_MASK,
+		.filter2_default_val = NV_PCIE_V2_FILTER2_DEFAULT,
+		.get_filter = pcie_v2_pmu_event_filter,
+		.get_filter2 = nv_cspmu_event_filter2,
+		.init_data = NULL
+	  },
+	  .ops = {
+		.validate_event = pcie_v2_pmu_validate_event,
+		.reset_ev_filter = nv_cspmu_reset_ev_filter,
+	  }
+	},
 	{
 	  .prodid = 0,
 	  .prodid_mask = 0,
@@ -453,7 +645,7 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 				  const struct nv_cspmu_match *match)
 {
-	char *name;
+	char *name = NULL;
 	struct device *dev = cspmu->dev;
 
 	static atomic_t pmu_generic_idx = {0};
@@ -467,6 +659,16 @@ static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 				       socket);
 		break;
 	}
+	case NAME_FMT_SOCKET_INST: {
+		const int cpu = cpumask_first(&cspmu->associated_cpus);
+		const int socket = cpu_to_node(cpu);
+		u32 inst_id;
+
+		if (!nv_cspmu_get_inst_id(cspmu, &inst_id))
+			name = devm_kasprintf(dev, GFP_KERNEL,
+					match->name_pattern, socket, inst_id);
+		break;
+	}
 	case NAME_FMT_GENERIC:
 		name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
 				       atomic_fetch_inc(&pmu_generic_idx));
@@ -514,8 +716,10 @@ static int nv_cspmu_init_ops(struct arm_cspmu *cspmu)
 	cspmu->impl.ctx = ctx;
 
 	/* NVIDIA specific callbacks. */
+	SET_OP(validate_event, impl_ops, match, NULL);
 	SET_OP(set_cc_filter, impl_ops, match, nv_cspmu_set_cc_filter);
 	SET_OP(set_ev_filter, impl_ops, match, nv_cspmu_set_ev_filter);
+	SET_OP(reset_ev_filter, impl_ops, match, NULL);
 	SET_OP(get_event_attrs, impl_ops, match, nv_cspmu_get_event_attrs);
 	SET_OP(get_format_attrs, impl_ops, match, nv_cspmu_get_format_attrs);
 	SET_OP(get_name, impl_ops, match, nv_cspmu_get_name);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
                   ` (3 preceding siblings ...)
  2026-01-26 18:11 ` [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-29 22:40   ` Ilkka Koskinen
  2026-01-26 18:11 ` [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU Besar Wicaksono
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Add PCIE-TGT PMU support to the Tegra410 SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  76 ++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c         | 324 ++++++++++++++++++
 2 files changed, 400 insertions(+)

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 8528685ddb61..07dc447eead7 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -7,6 +7,7 @@ metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
 * PCIE
+* PCIE-TGT
 
 PMU Driver
 ----------
@@ -211,6 +212,11 @@ Example usage:
 
     perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0180,src_bdf_en=0x1/
 
+.. _NVIDIA_T410_PCIE_PMU_RC_Mapping_Section:
+
+Mapping the RC# to lspci segment number
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Mapping the RC# to lspci segment number can be non-trivial; hence a new NVIDIA
 Designated Vendor Specific Capability (DVSEC) register is added into the PCIE config space
 for each RP. This DVSEC has vendor id "10de" and DVSEC id of "0x4". The DVSEC register
@@ -266,3 +272,73 @@ Example output::
   000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
   000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
   000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
+
+PCIE-TGT PMU
+------------
+
+The PCIE-TGT PMU monitors traffic targeting PCIE BAR and CXL HDM ranges.
+There is one PCIE-TGT PMU per PCIE root complex (RC) in the SoC. Each RC in
+the Tegra410 SoC can have up to 16 lanes, which can be bifurcated into up to
+8 root ports (RP). The PMU provides an RP filter to count PCIE BAR traffic to
+each RP and an address filter to count accesses to PCIE BAR or CXL HDM ranges.
+The details of these filters are described in the following sections.
+
+Mapping the RC# to the lspci segment number works the same way as for the
+PCIE PMU. Please see :ref:`NVIDIA_T410_PCIE_PMU_RC_Mapping_Section` for more info.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<pcie-rc-id>.
+
+The events in this PMU can be used to measure bandwidth and utilization:
+
+  * rd_req: count the number of read requests to PCIE.
+  * wr_req: count the number of write requests to PCIE.
+  * rd_bytes: count the number of bytes transferred by rd_req.
+  * wr_bytes: count the number of bytes transferred by wr_req.
+  * cycles: count the PCIE cycles.
+
+The average bandwidth is calculated as::
+
+   AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+   AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+   AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+   AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
+The PMU events can be filtered based on the destination root port or target
+address range. Filtering based on RP is only available for PCIE BAR traffic.
+The address filter works for both PCIE BAR and CXL HDM ranges. These filters
+can be found in sysfs, see
+/sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<pcie-rc-id>/format/.
+
+Destination filter settings:
+
+* dst_rp_mask: bitmask to select the root port(s) to monitor. E.g. "dst_rp_mask=0xFF"
+  corresponds to all root ports (from 0 to 7) in the PCIE RC. Note that this filter is
+  only available for PCIE BAR traffic.
+* dst_addr_base: BAR or CXL HDM filter base address.
+* dst_addr_mask: BAR or CXL HDM filter address mask.
+* dst_addr_en: enable BAR or CXL HDM address range filter. If this is set, the
+  address range specified by "dst_addr_base" and "dst_addr_mask" will be used to filter
+  the PCIE BAR and CXL HDM traffic address. The PMU uses the following comparison
+  to determine if the traffic destination address falls within the filter range::
+
+    (txn's addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask)
+
+  If the comparison succeeds, then the event will be counted.
+
+If the destination filter is not specified, the RP filter will be configured by default
+to count PCIE BAR traffic to all root ports.
+
+Example usage:
+
+* Count event id 0x0 to root port 0 and 1 of PCIE RC-0 on socket 0::
+
+    perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/event=0x0,dst_rp_mask=0x3/
+
+* Count event id 0x1 for accesses to PCIE BAR or CXL HDM address range
+  0x10000 to 0x100FF on socket 0's PCIE RC-1::
+
+    perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index 3a5531d1f94c..095d2f322c6f 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -42,6 +42,24 @@
 #define NV_PCIE_V2_FILTER2_DST       GENMASK_ULL(NV_PCIE_V2_DST_COUNT - 1, 0)
 #define NV_PCIE_V2_FILTER2_DEFAULT   NV_PCIE_V2_FILTER2_DST
 
+#define NV_PCIE_TGT_PORT_COUNT       8ULL
+#define NV_PCIE_TGT_EV_TYPE_CC       0x4
+#define NV_PCIE_TGT_EV_TYPE_COUNT    3ULL
+#define NV_PCIE_TGT_EV_TYPE_MASK     GENMASK_ULL(NV_PCIE_TGT_EV_TYPE_COUNT - 1, 0)
+#define NV_PCIE_TGT_FILTER2_MASK     GENMASK_ULL(NV_PCIE_TGT_PORT_COUNT, 0)
+#define NV_PCIE_TGT_FILTER2_PORT     GENMASK_ULL(NV_PCIE_TGT_PORT_COUNT - 1, 0)
+#define NV_PCIE_TGT_FILTER2_ADDR_EN  BIT(NV_PCIE_TGT_PORT_COUNT)
+#define NV_PCIE_TGT_FILTER2_ADDR     GENMASK_ULL(15, NV_PCIE_TGT_PORT_COUNT)
+#define NV_PCIE_TGT_FILTER2_DEFAULT  NV_PCIE_TGT_FILTER2_PORT
+
+#define NV_PCIE_TGT_ADDR_COUNT       8ULL
+#define NV_PCIE_TGT_ADDR_STRIDE      20
+#define NV_PCIE_TGT_ADDR_CTRL        0xD38
+#define NV_PCIE_TGT_ADDR_BASE_LO     0xD3C
+#define NV_PCIE_TGT_ADDR_BASE_HI     0xD40
+#define NV_PCIE_TGT_ADDR_MASK_LO     0xD44
+#define NV_PCIE_TGT_ADDR_MASK_HI     0xD48
+
 #define NV_GENERIC_FILTER_ID_MASK    GENMASK_ULL(31, 0)
 
 #define NV_PRODID_MASK	(PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
@@ -186,6 +204,15 @@ static struct attribute *pcie_v2_pmu_event_attrs[] = {
 	NULL,
 };
 
+static struct attribute *pcie_tgt_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes,		0x0),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes,		0x1),
+	ARM_CSPMU_EVENT_ATTR(rd_req,		0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_req,		0x3),
+	ARM_CSPMU_EVENT_ATTR(cycles, NV_PCIE_TGT_EV_TYPE_CC),
+	NULL,
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
@@ -239,6 +266,15 @@ static struct attribute *pcie_v2_pmu_format_attrs[] = {
 	NULL,
 };
 
+static struct attribute *pcie_tgt_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_ATTR(event, "config:0-2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rp_mask, "config:3-10"),
+	ARM_CSPMU_FORMAT_ATTR(dst_addr_en, "config:11"),
+	ARM_CSPMU_FORMAT_ATTR(dst_addr_base, "config1:0-63"),
+	ARM_CSPMU_FORMAT_ATTR(dst_addr_mask, "config2:0-63"),
+	NULL,
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
@@ -478,6 +514,268 @@ static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
 	return 0;
 }
 
+struct pcie_tgt_addr_filter {
+	u32 refcount;
+	u64 base;
+	u64 mask;
+};
+
+struct pcie_tgt_data {
+	struct pcie_tgt_addr_filter addr_filter[NV_PCIE_TGT_ADDR_COUNT];
+	void __iomem *addr_filter_reg;
+};
+
+#if defined(CONFIG_ACPI)
+static int pcie_tgt_init_data(struct arm_cspmu *cspmu)
+{
+	int ret;
+	struct acpi_device *adev;
+	struct pcie_tgt_data *data;
+	struct list_head resource_list;
+	struct resource_entry *rentry;
+	struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+	struct device *dev = cspmu->dev;
+
+	data = devm_kzalloc(dev, sizeof(struct pcie_tgt_data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	adev = arm_cspmu_acpi_dev_get(cspmu);
+	if (!adev) {
+		dev_err(dev, "failed to get associated PCIE-TGT device\n");
+		return -ENODEV;
+	}
+
+	INIT_LIST_HEAD(&resource_list);
+	ret = acpi_dev_get_memory_resources(adev, &resource_list);
+	if (ret < 0) {
+		dev_err(dev, "failed to get PCIE-TGT device memory resources\n");
+		acpi_dev_put(adev);
+		return ret;
+	}
+
+	rentry = list_first_entry_or_null(&resource_list, struct resource_entry, node);
+	if (rentry)
+		data->addr_filter_reg = devm_ioremap_resource(dev, rentry->res);
+	else
+		data->addr_filter_reg = ERR_PTR(-ENODEV);
+
+	ret = 0;
+	if (IS_ERR(data->addr_filter_reg)) {
+		dev_err(dev, "failed to get address filter resource\n");
+		ret = PTR_ERR(data->addr_filter_reg);
+	}
+
+	acpi_dev_free_resource_list(&resource_list);
+	acpi_dev_put(adev);
+
+	ctx->data = data;
+
+	return ret;
+}
+#else
+static int pcie_tgt_init_data(struct arm_cspmu *cspmu)
+{
+	return -ENODEV;
+}
+#endif
+
+static struct pcie_tgt_data *pcie_tgt_get_data(struct arm_cspmu *cspmu)
+{
+	struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+
+	return ctx->data;
+}
+
+/* Find the first available address filter slot. */
+static int pcie_tgt_find_addr_idx(struct arm_cspmu *cspmu, u64 base, u64 mask,
+	bool is_reset)
+{
+	int i;
+	struct pcie_tgt_data *data = pcie_tgt_get_data(cspmu);
+
+	for (i = 0; i < NV_PCIE_TGT_ADDR_COUNT; i++) {
+		if (!is_reset && data->addr_filter[i].refcount == 0)
+			return i;
+
+		if (data->addr_filter[i].base == base &&
+			data->addr_filter[i].mask == mask)
+			return i;
+	}
+
+	return -ENODEV;
+}
+
+static u32 pcie_tgt_pmu_event_filter(const struct perf_event *event)
+{
+	u32 filter;
+
+	filter = (event->attr.config >> NV_PCIE_TGT_EV_TYPE_COUNT) &
+		NV_PCIE_TGT_FILTER2_MASK;
+
+	return filter;
+}
+
+static bool pcie_tgt_pmu_addr_en(const struct perf_event *event)
+{
+	u32 filter = pcie_tgt_pmu_event_filter(event);
+
+	return FIELD_GET(NV_PCIE_TGT_FILTER2_ADDR_EN, filter) != 0;
+}
+
+static u32 pcie_tgt_pmu_port_filter(const struct perf_event *event)
+{
+	u32 filter = pcie_tgt_pmu_event_filter(event);
+
+	return FIELD_GET(NV_PCIE_TGT_FILTER2_PORT, filter);
+}
+
+static u64 pcie_tgt_pmu_dst_addr_base(const struct perf_event *event)
+{
+	return event->attr.config1;
+}
+
+static u64 pcie_tgt_pmu_dst_addr_mask(const struct perf_event *event)
+{
+	return event->attr.config2;
+}
+
+static int pcie_tgt_pmu_validate_event(struct arm_cspmu *cspmu,
+				   struct perf_event *new_ev)
+{
+	u64 base, mask;
+	int idx;
+
+	if (!pcie_tgt_pmu_addr_en(new_ev))
+		return 0;
+
+	/* Make sure there is a slot available for the address filter. */
+	base = pcie_tgt_pmu_dst_addr_base(new_ev);
+	mask = pcie_tgt_pmu_dst_addr_mask(new_ev);
+	idx = pcie_tgt_find_addr_idx(cspmu, base, mask, false);
+	if (idx < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void pcie_tgt_pmu_config_addr_filter(struct arm_cspmu *cspmu,
+	bool en, u64 base, u64 mask, int idx)
+{
+	struct pcie_tgt_data *data;
+	struct pcie_tgt_addr_filter *filter;
+	void __iomem *filter_reg;
+
+	data = pcie_tgt_get_data(cspmu);
+	filter = &data->addr_filter[idx];
+	filter_reg = data->addr_filter_reg + (idx * NV_PCIE_TGT_ADDR_STRIDE);
+
+	if (en) {
+		filter->refcount++;
+		if (filter->refcount == 1) {
+			filter->base = base;
+			filter->mask = mask;
+
+			writel(lower_32_bits(base), filter_reg + NV_PCIE_TGT_ADDR_BASE_LO);
+			writel(upper_32_bits(base), filter_reg + NV_PCIE_TGT_ADDR_BASE_HI);
+			writel(lower_32_bits(mask), filter_reg + NV_PCIE_TGT_ADDR_MASK_LO);
+			writel(upper_32_bits(mask), filter_reg + NV_PCIE_TGT_ADDR_MASK_HI);
+			writel(1, filter_reg + NV_PCIE_TGT_ADDR_CTRL);
+		}
+	} else {
+		filter->refcount--;
+		if (filter->refcount == 0) {
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_CTRL);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_BASE_LO);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_BASE_HI);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_MASK_LO);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_MASK_HI);
+
+			filter->base = 0;
+			filter->mask = 0;
+		}
+	}
+}
+
+static void pcie_tgt_pmu_set_ev_filter(struct arm_cspmu *cspmu,
+				const struct perf_event *event)
+{
+	bool addr_filter_en;
+	int idx;
+	u32 filter2_val, filter2_offset, port_filter;
+	u64 base, mask;
+
+	filter2_val = 0;
+	filter2_offset = PMEVFILT2R + (4 * event->hw.idx);
+
+	addr_filter_en = pcie_tgt_pmu_addr_en(event);
+	if (addr_filter_en) {
+		base = pcie_tgt_pmu_dst_addr_base(event);
+		mask = pcie_tgt_pmu_dst_addr_mask(event);
+		idx = pcie_tgt_find_addr_idx(cspmu, base, mask, false);
+
+		if (idx < 0) {
+			dev_err(cspmu->dev,
+				"Unable to find a slot for address filtering\n");
+			writel(0, cspmu->base0 + filter2_offset);
+			return;
+		}
+
+		/* Configure the address range filter registers. */
+		pcie_tgt_pmu_config_addr_filter(cspmu, true, base, mask, idx);
+
+		/* Configure the counter to use the selected address filter slot. */
+		filter2_val |= FIELD_PREP(NV_PCIE_TGT_FILTER2_ADDR, 1U << idx);
+	}
+
+	port_filter = pcie_tgt_pmu_port_filter(event);
+
+	/* Monitor all ports if no filter is selected. */
+	if (!addr_filter_en && port_filter == 0)
+		port_filter = NV_PCIE_TGT_FILTER2_PORT;
+
+	filter2_val |= FIELD_PREP(NV_PCIE_TGT_FILTER2_PORT, port_filter);
+
+	writel(filter2_val, cspmu->base0 + filter2_offset);
+}
+
+static void pcie_tgt_pmu_reset_ev_filter(struct arm_cspmu *cspmu,
+				     const struct perf_event *event)
+{
+	bool addr_filter_en;
+	u64 base, mask;
+	int idx;
+
+	addr_filter_en = pcie_tgt_pmu_addr_en(event);
+	if (!addr_filter_en)
+		return;
+
+	base = pcie_tgt_pmu_dst_addr_base(event);
+	mask = pcie_tgt_pmu_dst_addr_mask(event);
+	idx = pcie_tgt_find_addr_idx(cspmu, base, mask, true);
+
+	if (idx < 0) {
+		dev_err(cspmu->dev,
+			"Unable to find the address filter slot to reset\n");
+		return;
+	}
+
+	pcie_tgt_pmu_config_addr_filter(
+			cspmu, false, base, mask, idx);
+}
+
+static u32 pcie_tgt_pmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & NV_PCIE_TGT_EV_TYPE_MASK;
+}
+
+static bool pcie_tgt_pmu_is_cycle_counter_event(const struct perf_event *event)
+{
+	u32 event_type = pcie_tgt_pmu_event_type(event);
+
+	return event_type == NV_PCIE_TGT_EV_TYPE_CC;
+}
+
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
 	NAME_FMT_SOCKET,
@@ -622,6 +920,30 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 		.reset_ev_filter = nv_cspmu_reset_ev_filter,
 	  }
 	},
+	{
+	  .prodid = 0x10700000,
+	  .prodid_mask = NV_PRODID_MASK,
+	  .name_pattern = "nvidia_pcie_tgt_pmu_%u_rc_%u",
+	  .name_fmt = NAME_FMT_SOCKET_INST,
+	  .template_ctx = {
+		.event_attr = pcie_tgt_pmu_event_attrs,
+		.format_attr = pcie_tgt_pmu_format_attrs,
+		.filter_mask = 0x0,
+		.filter_default_val = 0x0,
+		.filter2_mask = NV_PCIE_TGT_FILTER2_MASK,
+		.filter2_default_val = NV_PCIE_TGT_FILTER2_DEFAULT,
+		.get_filter = NULL,
+		.get_filter2 = NULL,
+		.init_data = pcie_tgt_init_data
+	  },
+	  .ops = {
+		.is_cycle_counter_event = pcie_tgt_pmu_is_cycle_counter_event,
+		.event_type = pcie_tgt_pmu_event_type,
+		.validate_event = pcie_tgt_pmu_validate_event,
+		.set_ev_filter = pcie_tgt_pmu_set_ev_filter,
+		.reset_ev_filter = pcie_tgt_pmu_reset_ev_filter,
+	  }
+	},
 	{
 	  .prodid = 0,
 	  .prodid_mask = 0,
@@ -717,6 +1039,8 @@ static int nv_cspmu_init_ops(struct arm_cspmu *cspmu)
 
 	/* NVIDIA specific callbacks. */
 	SET_OP(validate_event, impl_ops, match, NULL);
+	SET_OP(event_type, impl_ops, match, NULL);
+	SET_OP(is_cycle_counter_event, impl_ops, match, NULL);
 	SET_OP(set_cc_filter, impl_ops, match, nv_cspmu_set_cc_filter);
 	SET_OP(set_ev_filter, impl_ops, match, nv_cspmu_set_ev_filter);
 	SET_OP(reset_ev_filter, impl_ops, match, NULL);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
                   ` (4 preceding siblings ...)
  2026-01-26 18:11 ` [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-28  0:40   ` kernel test robot
  2026-01-30  2:09   ` Ilkka Koskinen
  2026-01-26 18:11 ` [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU Besar Wicaksono
  2026-01-26 18:11 ` [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU Besar Wicaksono
  7 siblings, 2 replies; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Add CPU Memory (CMEM) Latency PMU support to the Tegra410 SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  25 +
 drivers/perf/Kconfig                          |   7 +
 drivers/perf/Makefile                         |   1 +
 drivers/perf/nvidia_t410_cmem_latency_pmu.c   | 727 ++++++++++++++++++
 4 files changed, 760 insertions(+)
 create mode 100644 drivers/perf/nvidia_t410_cmem_latency_pmu.c

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 07dc447eead7..11fc1c88346a 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -8,6 +8,7 @@ metrics like memory bandwidth, latency, and utilization:
 * Unified Coherence Fabric (UCF)
 * PCIE
 * PCIE-TGT
+* CPU Memory (CMEM) Latency
 
 PMU Driver
 ----------
@@ -342,3 +343,27 @@ Example usage:
   0x10000 to 0x100FF on socket 0's PCIE RC-1::
 
     perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
+
+CPU Memory (CMEM) Latency PMU
+-----------------------------
+
+This PMU monitors the latency of memory read requests to the local
+CPU DRAM:
+
+  * RD_REQ counters: count read requests (32B per request).
+  * RD_CUM_OUTS counters: accumulated outstanding-request counters, which
+    track how many cycles the read requests are in flight.
+  * CYCLES counter: counts the number of elapsed cycles.
+
+The average latency is calculated as::
+
+   FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+   AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+   AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_<socket-id>.
+
+Example usage::
+
+  perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 638321fc9800..9fed3c41d5ea 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -311,4 +311,11 @@ config MARVELL_PEM_PMU
 	  Enable support for PCIe Interface performance monitoring
 	  on Marvell platform.
 
+config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
+	tristate "NVIDIA Tegra410 CPU Memory Latency PMU"
+	depends on ARM64
+	help
+	  Enable perf support for CPU memory latency counters monitoring on
+	  NVIDIA Tegra410 SoC.
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index ea52711a87e3..4aa6aad393c2 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
 obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
 obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
 obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
+obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
diff --git a/drivers/perf/nvidia_t410_cmem_latency_pmu.c b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
new file mode 100644
index 000000000000..9b466581c8fc
--- /dev/null
+++ b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
@@ -0,0 +1,727 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVIDIA Tegra410 CPU Memory (CMEM) Latency PMU driver.
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+
+#define NUM_INSTANCES    14
+#define BCAST(pmu) pmu->base[NUM_INSTANCES]
+
+/* Register offsets. */
+#define CG_CTRL         0x800
+#define CTRL            0x808
+#define STATUS          0x810
+#define CYCLE_CNTR      0x818
+#define MC0_REQ_CNTR    0x820
+#define MC0_AOR_CNTR    0x830
+#define MC1_REQ_CNTR    0x838
+#define MC1_AOR_CNTR    0x848
+#define MC2_REQ_CNTR    0x850
+#define MC2_AOR_CNTR    0x860
+
+/* CTRL values. */
+#define CTRL_DISABLE    0x0ULL
+#define CTRL_ENABLE     0x1ULL
+#define CTRL_CLR        0x2ULL
+
+/* CG_CTRL values. */
+#define CG_CTRL_DISABLE    0x0ULL
+#define CG_CTRL_ENABLE     0x1ULL
+
+/* STATUS register field. */
+#define STATUS_CYCLE_OVF      BIT(0)
+#define STATUS_MC0_AOR_OVF    BIT(1)
+#define STATUS_MC0_REQ_OVF    BIT(3)
+#define STATUS_MC1_AOR_OVF    BIT(4)
+#define STATUS_MC1_REQ_OVF    BIT(6)
+#define STATUS_MC2_AOR_OVF    BIT(7)
+#define STATUS_MC2_REQ_OVF    BIT(9)
+
+/* Events. */
+#define EVENT_CYCLES    0x0
+#define EVENT_REQ       0x1
+#define EVENT_AOR       0x2
+
+#define NUM_EVENTS           0x3
+#define MASK_EVENT           0x3
+#define MAX_ACTIVE_EVENTS    32
+
+#define ACTIVE_CPU_MASK        0x0
+#define ASSOCIATED_CPU_MASK    0x1
+
+static unsigned long cmem_lat_pmu_cpuhp_state;
+
+struct cmem_lat_pmu_hw_events {
+	struct perf_event *events[MAX_ACTIVE_EVENTS];
+	DECLARE_BITMAP(used_ctrs, MAX_ACTIVE_EVENTS);
+};
+
+struct cmem_lat_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	const char *name;
+	const char *identifier;
+	void __iomem *base[NUM_INSTANCES + 1];
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+	struct hlist_node node;
+	struct cmem_lat_pmu_hw_events hw_events;
+};
+
+#define to_cmem_lat_pmu(p) \
+	container_of(p, struct cmem_lat_pmu, pmu)
+
+
+/* Get event type from perf_event. */
+static inline u32 get_event_type(struct perf_event *event)
+{
+	return (event->attr.config) & MASK_EVENT;
+}
+
+/* PMU operations. */
+static int cmem_lat_pmu_get_event_idx(struct cmem_lat_pmu_hw_events *hw_events,
+				struct perf_event *event)
+{
+	unsigned int idx;
+
+	idx = find_first_zero_bit(hw_events->used_ctrs, MAX_ACTIVE_EVENTS);
+	if (idx >= MAX_ACTIVE_EVENTS)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool cmem_lat_pmu_validate_event(struct pmu *pmu,
+				 struct cmem_lat_pmu_hw_events *hw_events,
+				 struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return (cmem_lat_pmu_get_event_idx(hw_events, event) >= 0);
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool cmem_lat_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct cmem_lat_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events,
+						sibling))
+			return false;
+	}
+
+	return cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int cmem_lat_pmu_event_init(struct perf_event *event)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_type = get_event_type(event);
+
+	if (event->attr.type != event->pmu->type ||
+	    event_type >= NUM_EVENTS)
+		return -ENOENT;
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attach to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(cmem_lat_pmu->pmu.dev,
+			"Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(cmem_lat_pmu->pmu.dev,
+			"Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &cmem_lat_pmu->associated_cpus)) {
+		dev_dbg(cmem_lat_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&cmem_lat_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!cmem_lat_pmu_validate_group(event))
+		return -EINVAL;
+
+	hwc->idx = -1;
+	hwc->config = event_type;
+
+	return 0;
+}
+
+static u64 cmem_lat_pmu_read_status(struct cmem_lat_pmu *cmem_lat_pmu,
+				   unsigned int inst)
+{
+	return readq(cmem_lat_pmu->base[inst] + STATUS);
+}
+
+static u64 cmem_lat_pmu_read_cycle_counter(struct perf_event *event)
+{
+	const unsigned int instance = 0;
+	u64 status;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct device *dev = cmem_lat_pmu->dev;
+
+	/*
+	 * Use the reading from the first instance since all instances are
+	 * identical.
+	 */
+	status = cmem_lat_pmu_read_status(cmem_lat_pmu, instance);
+	if (status & STATUS_CYCLE_OVF)
+		dev_warn(dev, "Cycle counter overflow\n");
+
+	return readq(cmem_lat_pmu->base[instance] + CYCLE_CNTR);
+}
+
+static u64 cmem_lat_pmu_read_req_counter(struct perf_event *event)
+{
+	unsigned int i;
+	u64 status, val = 0;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct device *dev = cmem_lat_pmu->dev;
+
+	/* Sum up the counts from all instances. */
+	for (i = 0; i < NUM_INSTANCES; i++) {
+		status = cmem_lat_pmu_read_status(cmem_lat_pmu, i);
+		if (status & STATUS_MC0_REQ_OVF)
+			dev_warn(dev, "MC0 request counter overflow\n");
+		if (status & STATUS_MC1_REQ_OVF)
+			dev_warn(dev, "MC1 request counter overflow\n");
+		if (status & STATUS_MC2_REQ_OVF)
+			dev_warn(dev, "MC2 request counter overflow\n");
+
+		val += readq(cmem_lat_pmu->base[i] + MC0_REQ_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + MC1_REQ_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + MC2_REQ_CNTR);
+	}
+
+	return val;
+}
+
+static u64 cmem_lat_pmu_read_aor_counter(struct perf_event *event)
+{
+	unsigned int i;
+	u64 status, val = 0;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct device *dev = cmem_lat_pmu->dev;
+
+	/* Sum up the counts from all instances. */
+	for (i = 0; i < NUM_INSTANCES; i++) {
+		status = cmem_lat_pmu_read_status(cmem_lat_pmu, i);
+		if (status & STATUS_MC0_AOR_OVF)
+			dev_warn(dev, "MC0 AOR counter overflow\n");
+		if (status & STATUS_MC1_AOR_OVF)
+			dev_warn(dev, "MC1 AOR counter overflow\n");
+		if (status & STATUS_MC2_AOR_OVF)
+			dev_warn(dev, "MC2 AOR counter overflow\n");
+
+		val += readq(cmem_lat_pmu->base[i] + MC0_AOR_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + MC1_AOR_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + MC2_AOR_CNTR);
+	}
+
+	return val;
+}
+
+static u64 (*read_counter_fn[NUM_EVENTS])(struct perf_event *) = {
+	[EVENT_CYCLES] = cmem_lat_pmu_read_cycle_counter,
+	[EVENT_REQ] = cmem_lat_pmu_read_req_counter,
+	[EVENT_AOR] = cmem_lat_pmu_read_aor_counter,
+};
+
+static void cmem_lat_pmu_event_update(struct perf_event *event)
+{
+	u32 event_type;
+	u64 prev, now;
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->state & PERF_HES_STOPPED)
+		return;
+
+	event_type = hwc->config;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = read_counter_fn[event_type](event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	local64_add(now - prev, &event->count);
+
+	hwc->state |= PERF_HES_UPTODATE;
+}
+
+static void cmem_lat_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state = 0;
+}
+
+static void cmem_lat_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state |= PERF_HES_STOPPED;
+}
+
+static int cmem_lat_pmu_add(struct perf_event *event, int flags)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct cmem_lat_pmu_hw_events *hw_events = &cmem_lat_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &cmem_lat_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = cmem_lat_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		cmem_lat_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void cmem_lat_pmu_del(struct perf_event *event, int flags)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct cmem_lat_pmu_hw_events *hw_events = &cmem_lat_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	cmem_lat_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void cmem_lat_pmu_read(struct perf_event *event)
+{
+	cmem_lat_pmu_event_update(event);
+}
+
+static inline void cmem_lat_pmu_cg_ctrl(struct cmem_lat_pmu *cmem_lat_pmu, u64 val)
+{
+	writeq(val, BCAST(cmem_lat_pmu) + CG_CTRL);
+}
+
+static inline void cmem_lat_pmu_ctrl(struct cmem_lat_pmu *cmem_lat_pmu, u64 val)
+{
+	writeq(val, BCAST(cmem_lat_pmu) + CTRL);
+}
+
+static void cmem_lat_pmu_enable(struct pmu *pmu)
+{
+	bool disabled;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
+
+	disabled = bitmap_empty(
+		cmem_lat_pmu->hw_events.used_ctrs, MAX_ACTIVE_EVENTS);
+
+	if (disabled)
+		return;
+
+	/* Enable all the counters. */
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_ENABLE);
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_ENABLE);
+}
+
+static void cmem_lat_pmu_disable(struct pmu *pmu)
+{
+	int idx;
+	struct perf_event *event;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
+
+	/* Disable all the counters. */
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_DISABLE);
+
+	/*
+	 * The counters will start from 0 again on restart.
+	 * Update the events immediately to avoid losing the counts.
+	 */
+	for_each_set_bit(
+		idx, cmem_lat_pmu->hw_events.used_ctrs, MAX_ACTIVE_EVENTS) {
+		event = cmem_lat_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		cmem_lat_pmu_event_update(event);
+
+		local64_set(&event->hw.prev_count, 0ULL);
+	}
+
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_CLR);
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_DISABLE);
+}
+
+/* PMU identifier attribute. */
+
+static ssize_t cmem_lat_pmu_identifier_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *page)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", cmem_lat_pmu->identifier);
+}
+
+static struct device_attribute cmem_lat_pmu_identifier_attr =
+	__ATTR(identifier, 0444, cmem_lat_pmu_identifier_show, NULL);
+
+static struct attribute *cmem_lat_pmu_identifier_attrs[] = {
+	&cmem_lat_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group cmem_lat_pmu_identifier_attr_group = {
+	.attrs = cmem_lat_pmu_identifier_attrs,
+};
+
+/* Format attributes. */
+
+#define NV_PMU_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+static struct attribute *cmem_lat_pmu_formats[] = {
+	NV_PMU_EXT_ATTR(event, device_show_string, "config:0-1"),
+	NULL,
+};
+
+static const struct attribute_group cmem_lat_pmu_format_group = {
+	.name = "format",
+	.attrs = cmem_lat_pmu_formats,
+};
+
+/* Event attributes. */
+
+static ssize_t cmem_lat_pmu_sysfs_event_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
+}
+
+#define NV_PMU_EVENT_ATTR(_name, _config)	\
+	PMU_EVENT_ATTR_ID(_name, cmem_lat_pmu_sysfs_event_show, _config)
+
+static struct attribute *cmem_lat_pmu_events[] = {
+	NV_PMU_EVENT_ATTR(cycles, EVENT_CYCLES),
+	NV_PMU_EVENT_ATTR(rd_req, EVENT_REQ),
+	NV_PMU_EVENT_ATTR(rd_cum_outs, EVENT_AOR),
+	NULL
+};
+
+static const struct attribute_group cmem_lat_pmu_events_group = {
+	.name = "events",
+	.attrs = cmem_lat_pmu_events,
+};
+
+/* Cpumask attributes. */
+
+static ssize_t cmem_lat_pmu_cpumask_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case ACTIVE_CPU_MASK:
+		cpumask = &cmem_lat_pmu->active_cpu;
+		break;
+	case ASSOCIATED_CPU_MASK:
+		cpumask = &cmem_lat_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+#define NV_PMU_CPUMASK_ATTR(_name, _config)			\
+	NV_PMU_EXT_ATTR(_name, cmem_lat_pmu_cpumask_show,	\
+				(unsigned long)_config)
+
+static struct attribute *cmem_lat_pmu_cpumask_attrs[] = {
+	NV_PMU_CPUMASK_ATTR(cpumask, ACTIVE_CPU_MASK),
+	NV_PMU_CPUMASK_ATTR(associated_cpus, ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group cmem_lat_pmu_cpumask_attr_group = {
+	.attrs = cmem_lat_pmu_cpumask_attrs,
+};
+
+/* Per PMU device attribute groups. */
+
+static const struct attribute_group *cmem_lat_pmu_attr_groups[] = {
+	&cmem_lat_pmu_identifier_attr_group,
+	&cmem_lat_pmu_format_group,
+	&cmem_lat_pmu_events_group,
+	&cmem_lat_pmu_cpumask_attr_group,
+	NULL,
+};
+
+static int cmem_lat_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu =
+		hlist_entry_safe(node, struct cmem_lat_pmu, node);
+
+	if (!cpumask_test_cpu(cpu, &cmem_lat_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&cmem_lat_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	cpumask_set_cpu(cpu, &cmem_lat_pmu->active_cpu);
+
+	return 0;
+}
+
+static int cmem_lat_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	unsigned int dst;
+
+	struct cmem_lat_pmu *cmem_lat_pmu =
+		hlist_entry_safe(node, struct cmem_lat_pmu, node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &cmem_lat_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	dst = cpumask_any_and_but(&cmem_lat_pmu->associated_cpus,
+				  cpu_online_mask, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&cmem_lat_pmu->pmu, cpu, dst);
+	cpumask_set_cpu(dst, &cmem_lat_pmu->active_cpu);
+
+	return 0;
+}
+
+static int cmem_lat_pmu_get_cpus(struct cmem_lat_pmu *cmem_lat_pmu,
+				unsigned int socket)
+{
+	int ret = 0, cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (cpu_to_node(cpu) == socket)
+			cpumask_set_cpu(cpu, &cmem_lat_pmu->associated_cpus);
+	}
+
+	if (cpumask_empty(&cmem_lat_pmu->associated_cpus)) {
+		dev_dbg(cmem_lat_pmu->dev,
+			"No cpu associated with PMU socket-%u\n", socket);
+		ret = -ENODEV;
+	}
+
+	return ret;
+}
+
+static int cmem_lat_pmu_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct acpi_device *acpi_dev;
+	struct cmem_lat_pmu *cmem_lat_pmu;
+	char *name;
+	const char *uid_str;
+	int ret, i;
+	u32 socket;
+
+	acpi_dev = ACPI_COMPANION(dev);
+	if (!acpi_dev)
+		return -ENODEV;
+
+	uid_str = acpi_device_uid(acpi_dev);
+	if (!uid_str)
+		return -ENODEV;
+
+	ret = kstrtou32(uid_str, 0, &socket);
+	if (ret)
+		return ret;
+
+	cmem_lat_pmu = devm_kzalloc(dev, sizeof(*cmem_lat_pmu), GFP_KERNEL);
+	name = devm_kasprintf(dev, GFP_KERNEL, "nvidia_cmem_latency_pmu_%u", socket);
+	if (!cmem_lat_pmu || !name)
+		return -ENOMEM;
+
+	cmem_lat_pmu->dev = dev;
+	cmem_lat_pmu->name = name;
+	cmem_lat_pmu->identifier = acpi_device_hid(acpi_dev);
+	platform_set_drvdata(pdev, cmem_lat_pmu);
+
+	cmem_lat_pmu->pmu = (struct pmu) {
+		.parent		= &pdev->dev,
+		.task_ctx_nr	= perf_invalid_context,
+		.pmu_enable	= cmem_lat_pmu_enable,
+		.pmu_disable	= cmem_lat_pmu_disable,
+		.event_init	= cmem_lat_pmu_event_init,
+		.add		= cmem_lat_pmu_add,
+		.del		= cmem_lat_pmu_del,
+		.start		= cmem_lat_pmu_start,
+		.stop		= cmem_lat_pmu_stop,
+		.read		= cmem_lat_pmu_read,
+		.attr_groups	= cmem_lat_pmu_attr_groups,
+		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE |
+					PERF_PMU_CAP_NO_INTERRUPT,
+	};
+
+	/* Map the address of all the instances plus one for the broadcast. */
+	for (i = 0; i < NUM_INSTANCES + 1; i++) {
+		cmem_lat_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
+		if (IS_ERR(cmem_lat_pmu->base[i])) {
+			dev_err(dev, "Failed to map address for instance %d\n", i);
+			return PTR_ERR(cmem_lat_pmu->base[i]);
+		}
+	}
+
+	ret = cmem_lat_pmu_get_cpus(cmem_lat_pmu, socket);
+	if (ret)
+		return ret;
+
+	ret = cpuhp_state_add_instance(cmem_lat_pmu_cpuhp_state,
+				       &cmem_lat_pmu->node);
+	if (ret) {
+		dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
+		return ret;
+	}
+
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_ENABLE);
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_CLR);
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_DISABLE);
+
+	ret = perf_pmu_register(&cmem_lat_pmu->pmu, name, -1);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to register PMU: %d\n", ret);
+		cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
+					    &cmem_lat_pmu->node);
+		return ret;
+	}
+
+	dev_dbg(&pdev->dev, "Registered %s PMU\n", name);
+
+	return 0;
+}
+
+static void cmem_lat_pmu_device_remove(struct platform_device *pdev)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&cmem_lat_pmu->pmu);
+	cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
+				    &cmem_lat_pmu->node);
+}
+
+static const struct acpi_device_id cmem_lat_pmu_acpi_match[] = {
+	{ "NVDA2021", },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, cmem_lat_pmu_acpi_match);
+
+static struct platform_driver cmem_lat_pmu_driver = {
+	.driver = {
+		.name = "nvidia-t410-cmem-latency-pmu",
+		.acpi_match_table = cmem_lat_pmu_acpi_match,
+		.suppress_bind_attrs = true,
+	},
+	.probe = cmem_lat_pmu_probe,
+	.remove = cmem_lat_pmu_device_remove,
+};
+
+static int __init cmem_lat_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				      "perf/nvidia/cmem_latency:online",
+				      cmem_lat_pmu_cpu_online,
+				      cmem_lat_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+
+	cmem_lat_pmu_cpuhp_state = ret;
+
+	return platform_driver_register(&cmem_lat_pmu_driver);
+}
+
+static void __exit cmem_lat_pmu_exit(void)
+{
+	platform_driver_unregister(&cmem_lat_pmu_driver);
+	cpuhp_remove_multi_state(cmem_lat_pmu_cpuhp_state);
+}
+
+module_init(cmem_lat_pmu_init);
+module_exit(cmem_lat_pmu_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NVIDIA Tegra410 CPU Memory Latency PMU driver");
+MODULE_AUTHOR("Besar Wicaksono <bwicaksono@nvidia.com>");
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
                   ` (5 preceding siblings ...)
  2026-01-26 18:11 ` [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-01-30  2:54   ` Ilkka Koskinen
  2026-01-26 18:11 ` [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU Besar Wicaksono
  7 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Add NVIDIA C2C PMU support for the Tegra410 SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  151 +++
 drivers/perf/Kconfig                          |    7 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/nvidia_t410_c2c_pmu.c            | 1061 +++++++++++++++++
 4 files changed, 1220 insertions(+)
 create mode 100644 drivers/perf/nvidia_t410_c2c_pmu.c

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 11fc1c88346a..f81f356debe1 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -9,6 +9,9 @@ metrics like memory bandwidth, latency, and utilization:
 * PCIE
 * PCIE-TGT
 * CPU Memory (CMEM) Latency
+* NVLink-C2C
+* NV-CLink
+* NV-DLink
 
 PMU Driver
 ----------
@@ -367,3 +370,151 @@ see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_<socket-id>.
 Example usage::
 
   perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
+
+NVLink-C2C PMU
+--------------
+
+This PMU monitors latency events of memory read/write requests that pass through
+the NVIDIA Chip-to-Chip (C2C) interface. Unlike the C2C PMU in Grace (Tegra241
+SoC), this PMU does not provide bandwidth events.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_<socket-id>.
+
+The list of events:
+
+  * IN_RD_CUM_OUTS: accumulated outstanding incoming read requests (in cycles).
+  * IN_RD_REQ: the number of incoming read requests.
+  * IN_WR_CUM_OUTS: accumulated outstanding incoming write requests (in cycles).
+  * IN_WR_REQ: the number of incoming write requests.
+  * OUT_RD_CUM_OUTS: accumulated outstanding outgoing read requests (in cycles).
+  * OUT_RD_REQ: the number of outgoing read requests.
+  * OUT_WR_CUM_OUTS: accumulated outstanding outgoing write requests (in cycles).
+  * OUT_WR_REQ: the number of outgoing write requests.
+  * CYCLES: NVLink-C2C interface cycle counts.
+
+The incoming events count the reads/writes from the remote device to the SoC.
+The outgoing events count the reads/writes from the SoC to the remote device.
+
+The sysfs file /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_<socket-id>/peer
+contains information about the connected peer device.
+
+When the C2C interface is connected to GPU(s), the "gpu_mask" parameter can be
+used to filter traffic to/from specific GPU(s). Each bit corresponds to a GPU
+index, e.g. "gpu_mask=0x1" selects GPU 0 and "gpu_mask=0x3" selects GPU 0 and 1.
+If not specified, the PMU monitors all GPUs by default.
+
+When connected to another SoC, only the read events are available.
+
+The events can be used to calculate the average latency of the read/write requests::
+
+   C2C_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+   IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+   IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+   IN_WR_AVG_LATENCY_IN_CYCLES = IN_WR_CUM_OUTS / IN_WR_REQ
+   IN_WR_AVG_LATENCY_IN_NS = IN_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+   OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
+   OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+   OUT_WR_AVG_LATENCY_IN_CYCLES = OUT_WR_CUM_OUTS / OUT_WR_REQ
+   OUT_WR_AVG_LATENCY_IN_NS = OUT_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
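+As a worked example with hypothetical counts, suppose a 1-second measurement
+window reports CYCLES = 1,500,000,000, IN_RD_CUM_OUTS = 2,000,000, and
+IN_RD_REQ = 10,000::
+
+   C2C_FREQ_IN_GHZ = 1,500,000,000 / 1,000,000,000 = 1.5
+
+   IN_RD_AVG_LATENCY_IN_CYCLES = 2,000,000 / 10,000 = 200
+   IN_RD_AVG_LATENCY_IN_NS = 200 / 1.5 = ~133.3
+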
+Example usage:
+
+  * Count incoming traffic from all GPUs connected via NVLink-C2C::
+
+      perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/
+
+  * Count incoming traffic from GPU 0 connected via NVLink-C2C::
+
+      perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x1/
+
+  * Count incoming traffic from GPU 1 connected via NVLink-C2C::
+
+      perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x2/
+
+  * Count outgoing traffic to all GPUs connected via NVLink-C2C::
+
+      perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_req/
+
+  * Count outgoing traffic to GPU 0 connected via NVLink-C2C::
+
+      perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x1/
+
+  * Count outgoing traffic to GPU 1 connected via NVLink-C2C::
+
+      perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x2/
+
+NV-CLink PMU
+------------
+
+This PMU monitors latency events of memory read requests that pass through
+the NV-CLink interface. Bandwidth events are not available in this PMU.
+In the Tegra410 SoC, the NV-CLink interface connects to another Tegra410
+SoC, and this PMU only counts read traffic.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvclink_pmu_<socket-id>.
+
+The list of events:
+
+  * IN_RD_CUM_OUTS: accumulated outstanding incoming read requests (in cycles).
+  * IN_RD_REQ: the number of incoming read requests.
+  * OUT_RD_CUM_OUTS: accumulated outstanding outgoing read requests (in cycles).
+  * OUT_RD_REQ: the number of outgoing read requests.
+  * CYCLES: NV-CLink interface cycle counts.
+
+The incoming events count the reads from the remote device to the SoC.
+The outgoing events count the reads from the SoC to the remote device.
+
+The events can be used to calculate the average latency of the read requests::
+
+   CLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+   IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+   IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
+
+   OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
+   OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
+
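+Since the accumulated outstanding counters add up the number of requests in
+flight each cycle, dividing them by the cycle count also gives the average
+number of read requests outstanding on the link over the measurement window::
+
+   IN_RD_AVG_OUTS = IN_RD_CUM_OUTS / CYCLES
+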
+Example usage:
+
+  * Count incoming read traffic from the remote SoC connected via NV-CLink::
+
+      perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/
+
+  * Count outgoing read traffic to the remote SoC connected via NV-CLink::
+
+      perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/
+
+NV-DLink PMU
+------------
+
+This PMU monitors latency events of memory read requests that pass through
+the NV-DLink interface. Bandwidth events are not available in this PMU.
+In the Tegra410 SoC, this PMU only counts CXL memory read traffic.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvdlink_pmu_<socket-id>.
+
+The list of events:
+
+  * IN_RD_CUM_OUTS: accumulated outstanding read requests (in cycles) to CXL memory.
+  * IN_RD_REQ: the number of read requests to CXL memory.
+  * CYCLES: NV-DLink interface cycle counts.
+
+The events can be used to calculate the average latency of the read requests::
+
+   DLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+   IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+   IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / DLINK_FREQ_IN_GHZ
+
+Example usage:
+
+  * Count read events to CXL memory::
+
+      perf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 9fed3c41d5ea..7ee36efe6bc0 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -318,4 +318,11 @@ config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
 	  Enable perf support for CPU memory latency counters monitoring on
 	  NVIDIA Tegra410 SoC.
 
+config NVIDIA_TEGRA410_C2C_PMU
+	tristate "NVIDIA Tegra410 C2C PMU"
+	depends on ARM64 && ACPI
+	help
+	  Enable perf support for counters in NVIDIA C2C interface of NVIDIA
+	  Tegra410 SoC.
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 4aa6aad393c2..eb8a022dad9a 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
 obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
 obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
 obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
+obj-$(CONFIG_NVIDIA_TEGRA410_C2C_PMU) += nvidia_t410_c2c_pmu.o
diff --git a/drivers/perf/nvidia_t410_c2c_pmu.c b/drivers/perf/nvidia_t410_c2c_pmu.c
new file mode 100644
index 000000000000..362e0e5f8b24
--- /dev/null
+++ b/drivers/perf/nvidia_t410_c2c_pmu.c
@@ -0,0 +1,1061 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVIDIA Tegra410 C2C PMU driver.
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/property.h>
+
+/* The C2C interface types in Tegra410. */
+#define C2C_TYPE_NVLINK          0x0
+#define C2C_TYPE_NVCLINK         0x1
+#define C2C_TYPE_NVDLINK         0x2
+#define C2C_TYPE_COUNT           0x3
+
+/* The type of the peer device connected to the C2C interface. */
+#define C2C_PEER_TYPE_CPU        0x0
+#define C2C_PEER_TYPE_GPU        0x1
+#define C2C_PEER_TYPE_CXLMEM     0x2
+#define C2C_PEER_TYPE_COUNT      0x3
+
+/* The number of peer devices that can be connected to the C2C interface. */
+#define C2C_NR_PEER_CPU          0x1
+#define C2C_NR_PEER_GPU          0x2
+#define C2C_NR_PEER_CXLMEM       0x1
+#define C2C_NR_PEER_MAX          0x2
+
+/* Number of instances on each interface. */
+#define C2C_NR_INST_NVLINK       14
+#define C2C_NR_INST_NVCLINK      12
+#define C2C_NR_INST_NVDLINK      16
+#define C2C_NR_INST_MAX          16
+
+/* Register offsets. */
+#define C2C_CTRL                    0x864
+#define C2C_IN_STATUS               0x868
+#define C2C_CYCLE_CNTR              0x86c
+#define C2C_IN_RD_CUM_OUTS_CNTR     0x874
+#define C2C_IN_RD_REQ_CNTR          0x87c
+#define C2C_IN_WR_CUM_OUTS_CNTR     0x884
+#define C2C_IN_WR_REQ_CNTR          0x88c
+#define C2C_OUT_STATUS              0x890
+#define C2C_OUT_RD_CUM_OUTS_CNTR    0x898
+#define C2C_OUT_RD_REQ_CNTR         0x8a0
+#define C2C_OUT_WR_CUM_OUTS_CNTR    0x8a8
+#define C2C_OUT_WR_REQ_CNTR         0x8b0
+
+/* C2C_IN_STATUS register field. */
+#define C2C_IN_STATUS_CYCLE_OVF             BIT(0)
+#define C2C_IN_STATUS_IN_RD_CUM_OUTS_OVF    BIT(1)
+#define C2C_IN_STATUS_IN_RD_REQ_OVF         BIT(2)
+#define C2C_IN_STATUS_IN_WR_CUM_OUTS_OVF    BIT(3)
+#define C2C_IN_STATUS_IN_WR_REQ_OVF         BIT(4)
+
+/* C2C_OUT_STATUS register field. */
+#define C2C_OUT_STATUS_OUT_RD_CUM_OUTS_OVF    BIT(0)
+#define C2C_OUT_STATUS_OUT_RD_REQ_OVF         BIT(1)
+#define C2C_OUT_STATUS_OUT_WR_CUM_OUTS_OVF    BIT(2)
+#define C2C_OUT_STATUS_OUT_WR_REQ_OVF         BIT(3)
+
+/* Events. */
+#define C2C_EVENT_CYCLES                0x0
+#define C2C_EVENT_IN_RD_CUM_OUTS        0x1
+#define C2C_EVENT_IN_RD_REQ             0x2
+#define C2C_EVENT_IN_WR_CUM_OUTS        0x3
+#define C2C_EVENT_IN_WR_REQ             0x4
+#define C2C_EVENT_OUT_RD_CUM_OUTS       0x5
+#define C2C_EVENT_OUT_RD_REQ            0x6
+#define C2C_EVENT_OUT_WR_CUM_OUTS       0x7
+#define C2C_EVENT_OUT_WR_REQ            0x8
+
+#define C2C_NUM_EVENTS           0x9
+#define C2C_MASK_EVENT           0xFF
+#define C2C_MAX_ACTIVE_EVENTS    32
+
+#define C2C_ACTIVE_CPU_MASK        0x0
+#define C2C_ASSOCIATED_CPU_MASK    0x1
+
+/*
+ * Maximum poll count for reading counter value using high-low-high sequence.
+ */
+#define HILOHI_MAX_POLL    1000
+
+static unsigned long nv_c2c_pmu_cpuhp_state;
+
+/* PMU descriptor. */
+
+/* Tracks the events assigned to the PMU for a given logical index. */
+struct nv_c2c_pmu_hw_events {
+	/* The events that are active. */
+	struct perf_event *events[C2C_MAX_ACTIVE_EVENTS];
+
+	/*
+	 * Each bit indicates a logical counter is being used (or not) for an
+	 * event.
+	 */
+	DECLARE_BITMAP(used_ctrs, C2C_MAX_ACTIVE_EVENTS);
+};
+
+struct nv_c2c_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_device *acpi_dev;
+
+	const char *name;
+	const char *identifier;
+
+	unsigned int c2c_type;
+	unsigned int peer_type;
+	unsigned int socket;
+	unsigned int nr_inst;
+	unsigned int nr_peer;
+	unsigned long peer_insts[C2C_NR_PEER_MAX][BITS_TO_LONGS(C2C_NR_INST_MAX)];
+	u32 filter_default;
+
+	struct nv_c2c_pmu_hw_events hw_events;
+
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+
+	struct hlist_node cpuhp_node;
+
+	struct attribute **formats;
+	const struct attribute_group *attr_groups[6];
+
+	void __iomem *base_broadcast;
+	void __iomem *base[C2C_NR_INST_MAX];
+};
+
+#define to_c2c_pmu(p) (container_of(p, struct nv_c2c_pmu, pmu))
+
+/* Get event type from perf_event. */
+static inline u32 get_event_type(struct perf_event *event)
+{
+	return (event->attr.config) & C2C_MASK_EVENT;
+}
+
+static inline u32 get_filter_mask(struct perf_event *event)
+{
+	u32 filter;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+
+	filter = ((u32)event->attr.config1) & c2c_pmu->filter_default;
+	if (filter == 0)
+		filter = c2c_pmu->filter_default;
+
+	return filter;
+}
+
+/* PMU operations. */
+
+static int nv_c2c_pmu_get_event_idx(struct nv_c2c_pmu_hw_events *hw_events,
+				    struct perf_event *event)
+{
+	u32 idx;
+
+	idx = find_first_zero_bit(hw_events->used_ctrs, C2C_MAX_ACTIVE_EVENTS);
+	if (idx >= C2C_MAX_ACTIVE_EVENTS)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool
+nv_c2c_pmu_validate_event(struct pmu *pmu,
+			  struct nv_c2c_pmu_hw_events *hw_events,
+			  struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return nv_c2c_pmu_get_event_idx(hw_events, event) >= 0;
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool nv_c2c_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct nv_c2c_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events,
+					       sibling))
+			return false;
+	}
+
+	return nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int nv_c2c_pmu_event_init(struct perf_event *event)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_type = get_event_type(event);
+
+	if (event->attr.type != event->pmu->type ||
+	    event_type >= C2C_NUM_EVENTS)
+		return -ENOENT;
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attach to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(c2c_pmu->pmu.dev, "Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(c2c_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &c2c_pmu->associated_cpus)) {
+		dev_dbg(c2c_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&c2c_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!nv_c2c_pmu_validate_group(event))
+		return -EINVAL;
+
+	hwc->idx = -1;
+	hwc->config = event_type;
+
+	return 0;
+}
+
+/*
+ * Read a 64-bit register as a pair of 32-bit registers using a hi-lo-hi sequence.
+ */
+static u64 read_reg64_hilohi(const void __iomem *addr, u32 max_poll_count)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use high-low-high sequence to avoid tearing */
+	do {
+		if (max_poll_count-- == 0) {
+			pr_err("NV C2C PMU: timeout in high-low-high sequence\n");
+			return 0;
+		}
+
+		val_hi = readl(addr + 4);
+		val_lo = readl(addr);
+	} while (val_hi != readl(addr + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+static void nv_c2c_pmu_check_status(struct nv_c2c_pmu *c2c_pmu, u32 instance)
+{
+	u32 in_status, out_status;
+
+	in_status = readl(c2c_pmu->base[instance] + C2C_IN_STATUS);
+	out_status = readl(c2c_pmu->base[instance] + C2C_OUT_STATUS);
+
+	if (in_status || out_status)
+		dev_warn(c2c_pmu->dev,
+			 "C2C PMU overflow in: 0x%x, out: 0x%x\n",
+			 in_status, out_status);
+}
+
+static const u32 nv_c2c_ctr_offset[C2C_NUM_EVENTS] = {
+	[C2C_EVENT_CYCLES] = C2C_CYCLE_CNTR,
+	[C2C_EVENT_IN_RD_CUM_OUTS] = C2C_IN_RD_CUM_OUTS_CNTR,
+	[C2C_EVENT_IN_RD_REQ] = C2C_IN_RD_REQ_CNTR,
+	[C2C_EVENT_IN_WR_CUM_OUTS] = C2C_IN_WR_CUM_OUTS_CNTR,
+	[C2C_EVENT_IN_WR_REQ] = C2C_IN_WR_REQ_CNTR,
+	[C2C_EVENT_OUT_RD_CUM_OUTS] = C2C_OUT_RD_CUM_OUTS_CNTR,
+	[C2C_EVENT_OUT_RD_REQ] = C2C_OUT_RD_REQ_CNTR,
+	[C2C_EVENT_OUT_WR_CUM_OUTS] = C2C_OUT_WR_CUM_OUTS_CNTR,
+	[C2C_EVENT_OUT_WR_REQ] = C2C_OUT_WR_REQ_CNTR,
+};
+
+static u64 nv_c2c_pmu_read_counter(struct perf_event *event)
+{
+	u32 ctr_id, ctr_offset, filter_mask, filter_idx, inst_idx;
+	unsigned long *inst_mask;
+	DECLARE_BITMAP(filter_bitmap, C2C_NR_PEER_MAX);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	u64 val = 0;
+
+	filter_mask = get_filter_mask(event);
+	bitmap_from_arr32(filter_bitmap, &filter_mask, c2c_pmu->nr_peer);
+
+	ctr_id = event->hw.config;
+	ctr_offset = nv_c2c_ctr_offset[ctr_id];
+
+	for_each_set_bit(filter_idx, filter_bitmap, c2c_pmu->nr_peer) {
+		inst_mask = c2c_pmu->peer_insts[filter_idx];
+		for_each_set_bit(inst_idx, inst_mask, c2c_pmu->nr_inst) {
+			nv_c2c_pmu_check_status(c2c_pmu, inst_idx);
+
+			/*
+			 * All instances share the same clock and the driver
+			 * always enables all of them, so the cycle count can
+			 * be read from any single instance.
+			 */
+			if (ctr_id == C2C_EVENT_CYCLES)
+				return read_reg64_hilohi(
+					c2c_pmu->base[inst_idx] + ctr_offset,
+					HILOHI_MAX_POLL);
+
+			/*
+			 * For other events, sum up the counts from all instances.
+			 */
+			val += read_reg64_hilohi(
+				c2c_pmu->base[inst_idx] + ctr_offset,
+				HILOHI_MAX_POLL);
+		}
+	}
+
+	return val;
+}
+
+static void nv_c2c_pmu_event_update(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = nv_c2c_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	local64_add(now - prev, &event->count);
+}
+
+static void nv_c2c_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state = 0;
+}
+
+static void nv_c2c_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state |= PERF_HES_STOPPED;
+}
+
+static int nv_c2c_pmu_add(struct perf_event *event, int flags)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &c2c_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = nv_c2c_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		nv_c2c_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void nv_c2c_pmu_del(struct perf_event *event, int flags)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	nv_c2c_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void nv_c2c_pmu_read(struct perf_event *event)
+{
+	nv_c2c_pmu_event_update(event);
+}
+
+static void nv_c2c_pmu_enable(struct pmu *pmu)
+{
+	void __iomem *bcast;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+
+	/* Nothing to do if no events are active. */
+	if (bitmap_empty(c2c_pmu->hw_events.used_ctrs, C2C_MAX_ACTIVE_EVENTS))
+		return;
+
+	/* Enable all the counters. */
+	bcast = c2c_pmu->base_broadcast;
+	writel(0x1UL, bcast + C2C_CTRL);
+}
+
+static void nv_c2c_pmu_disable(struct pmu *pmu)
+{
+	unsigned int idx;
+	void __iomem *bcast;
+	struct perf_event *event;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+
+	/* Disable all the counters. */
+	bcast = c2c_pmu->base_broadcast;
+	writel(0x0UL, bcast + C2C_CTRL);
+
+	/*
+	 * The counters will start from 0 again on restart.
+	 * Update the events immediately to avoid losing the counts.
+	 */
+	for_each_set_bit(idx, c2c_pmu->hw_events.used_ctrs,
+			 C2C_MAX_ACTIVE_EVENTS) {
+		event = c2c_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		nv_c2c_pmu_event_update(event);
+
+		local64_set(&event->hw.prev_count, 0ULL);
+	}
+}
+
+/* PMU identifier attribute. */
+
+static ssize_t nv_c2c_pmu_identifier_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *page)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", c2c_pmu->identifier);
+}
+
+static struct device_attribute nv_c2c_pmu_identifier_attr =
+	__ATTR(identifier, 0444, nv_c2c_pmu_identifier_show, NULL);
+
+static struct attribute *nv_c2c_pmu_identifier_attrs[] = {
+	&nv_c2c_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group nv_c2c_pmu_identifier_attr_group = {
+	.attrs = nv_c2c_pmu_identifier_attrs,
+};
+
+/* Peer attribute. */
+
+static ssize_t nv_c2c_pmu_peer_show(struct device *dev,
+				    struct device_attribute *attr, char *page)
+{
+	const char *peer_type[C2C_PEER_TYPE_COUNT] = {
+		[C2C_PEER_TYPE_CPU] = "cpu",
+		[C2C_PEER_TYPE_GPU] = "gpu",
+		[C2C_PEER_TYPE_CXLMEM] = "cxlmem",
+	};
+
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "nr_%s=%u\n", peer_type[c2c_pmu->peer_type],
+			  c2c_pmu->nr_peer);
+}
+
+static struct device_attribute nv_c2c_pmu_peer_attr =
+	__ATTR(peer, 0444, nv_c2c_pmu_peer_show, NULL);
+
+static struct attribute *nv_c2c_pmu_peer_attrs[] = {
+	&nv_c2c_pmu_peer_attr.attr,
+	NULL,
+};
+
+static struct attribute_group nv_c2c_pmu_peer_attr_group = {
+	.attrs = nv_c2c_pmu_peer_attrs,
+};
+
+/* Format attributes. */
+
+#define NV_C2C_PMU_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define NV_C2C_PMU_FORMAT_ATTR(_name, _config) \
+	NV_C2C_PMU_EXT_ATTR(_name, device_show_string, _config)
+
+#define NV_C2C_PMU_FORMAT_EVENT_ATTR \
+	NV_C2C_PMU_FORMAT_ATTR(event, "config:0-3")
+
+static struct attribute *nv_c2c_nvlink_pmu_formats[] = {
+	NV_C2C_PMU_FORMAT_EVENT_ATTR,
+	NV_C2C_PMU_FORMAT_ATTR(gpu_mask, "config1:0-1"),
+	NULL,
+};
+
+static struct attribute *nv_c2c_pmu_formats[] = {
+	NV_C2C_PMU_FORMAT_EVENT_ATTR,
+	NULL,
+};
+
+static struct attribute_group *
+nv_c2c_pmu_alloc_format_attr_group(struct nv_c2c_pmu *c2c_pmu)
+{
+	struct attribute_group *format_group;
+	struct device *dev = c2c_pmu->dev;
+
+	format_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!format_group)
+		return NULL;
+
+	format_group->name = "format";
+	format_group->attrs = c2c_pmu->formats;
+
+	return format_group;
+}
+
+/* Event attributes. */
+
+static ssize_t nv_c2c_pmu_sysfs_event_show(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
+}
+
+#define NV_C2C_PMU_EVENT_ATTR(_name, _config)	\
+	PMU_EVENT_ATTR_ID(_name, nv_c2c_pmu_sysfs_event_show, _config)
+
+static struct attribute *nv_c2c_pmu_events[] = {
+	NV_C2C_PMU_EVENT_ATTR(cycles, C2C_EVENT_CYCLES),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_cum_outs, C2C_EVENT_IN_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_req, C2C_EVENT_IN_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(in_wr_cum_outs, C2C_EVENT_IN_WR_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_wr_req, C2C_EVENT_IN_WR_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_cum_outs, C2C_EVENT_OUT_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_req, C2C_EVENT_OUT_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_wr_cum_outs, C2C_EVENT_OUT_WR_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_wr_req, C2C_EVENT_OUT_WR_REQ),
+	NULL
+};
+
+static umode_t
+nv_c2c_pmu_event_attr_is_visible(struct kobject *kobj, struct attribute *attr,
+				 int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+	struct perf_pmu_events_attr *eattr;
+
+	eattr = container_of(attr, typeof(*eattr), attr.attr);
+
+	if (c2c_pmu->c2c_type == C2C_TYPE_NVDLINK) {
+		/* Only incoming reads are available. */
+		switch (eattr->id) {
+		case C2C_EVENT_IN_WR_CUM_OUTS:
+		case C2C_EVENT_IN_WR_REQ:
+		case C2C_EVENT_OUT_RD_CUM_OUTS:
+		case C2C_EVENT_OUT_RD_REQ:
+		case C2C_EVENT_OUT_WR_CUM_OUTS:
+		case C2C_EVENT_OUT_WR_REQ:
+			return 0;
+		default:
+			return attr->mode;
+		}
+	} else {
+		/* Hide the write events if the C2C is connected to another SoC. */
+		if (c2c_pmu->peer_type == C2C_PEER_TYPE_CPU) {
+			switch (eattr->id) {
+			case C2C_EVENT_IN_WR_CUM_OUTS:
+			case C2C_EVENT_IN_WR_REQ:
+			case C2C_EVENT_OUT_WR_CUM_OUTS:
+			case C2C_EVENT_OUT_WR_REQ:
+				return 0;
+			default:
+				return attr->mode;
+			}
+		}
+	}
+
+	return attr->mode;
+}
+
+static const struct attribute_group nv_c2c_pmu_events_group = {
+	.name = "events",
+	.attrs = nv_c2c_pmu_events,
+	.is_visible = nv_c2c_pmu_event_attr_is_visible,
+};
+
+/* Cpumask attributes. */
+
+static ssize_t nv_c2c_pmu_cpumask_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case C2C_ACTIVE_CPU_MASK:
+		cpumask = &c2c_pmu->active_cpu;
+		break;
+	case C2C_ASSOCIATED_CPU_MASK:
+		cpumask = &c2c_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+#define NV_C2C_PMU_CPUMASK_ATTR(_name, _config)			\
+	NV_C2C_PMU_EXT_ATTR(_name, nv_c2c_pmu_cpumask_show,	\
+				(unsigned long)_config)
+
+static struct attribute *nv_c2c_pmu_cpumask_attrs[] = {
+	NV_C2C_PMU_CPUMASK_ATTR(cpumask, C2C_ACTIVE_CPU_MASK),
+	NV_C2C_PMU_CPUMASK_ATTR(associated_cpus, C2C_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group nv_c2c_pmu_cpumask_attr_group = {
+	.attrs = nv_c2c_pmu_cpumask_attrs,
+};
+
+/* Per PMU device attribute groups. */
+
+static int nv_c2c_pmu_alloc_attr_groups(struct nv_c2c_pmu *c2c_pmu)
+{
+	const struct attribute_group **attr_groups = c2c_pmu->attr_groups;
+
+	attr_groups[0] = nv_c2c_pmu_alloc_format_attr_group(c2c_pmu);
+	attr_groups[1] = &nv_c2c_pmu_events_group;
+	attr_groups[2] = &nv_c2c_pmu_cpumask_attr_group;
+	attr_groups[3] = &nv_c2c_pmu_identifier_attr_group;
+	attr_groups[4] = &nv_c2c_pmu_peer_attr_group;
+
+	if (!attr_groups[0])
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int nv_c2c_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
+{
+	struct nv_c2c_pmu *c2c_pmu =
+		hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node);
+
+	if (!cpumask_test_cpu(cpu, &c2c_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&c2c_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	cpumask_set_cpu(cpu, &c2c_pmu->active_cpu);
+
+	return 0;
+}
+
+static int nv_c2c_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	unsigned int dst;
+	struct nv_c2c_pmu *c2c_pmu =
+		hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &c2c_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	dst = cpumask_any_and_but(&c2c_pmu->associated_cpus,
+				  cpu_online_mask, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&c2c_pmu->pmu, cpu, dst);
+	cpumask_set_cpu(dst, &c2c_pmu->active_cpu);
+
+	return 0;
+}
+
+static int nv_c2c_pmu_get_cpus(struct nv_c2c_pmu *c2c_pmu)
+{
+	int ret = 0, socket = c2c_pmu->socket, cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (cpu_to_node(cpu) == socket)
+			cpumask_set_cpu(cpu, &c2c_pmu->associated_cpus);
+	}
+
+	if (cpumask_empty(&c2c_pmu->associated_cpus)) {
+		dev_dbg(c2c_pmu->dev,
+			"No cpu associated with C2C PMU socket-%u\n", socket);
+		ret = -ENODEV;
+	}
+
+	return ret;
+}
+
+static int nv_c2c_pmu_init_socket(struct nv_c2c_pmu *c2c_pmu)
+{
+	const char *uid_str;
+	u32 socket;
+	int ret;
+
+	uid_str = acpi_device_uid(c2c_pmu->acpi_dev);
+	if (!uid_str) {
+		ret = -ENODEV;
+		goto fail;
+	}
+
+	ret = kstrtou32(uid_str, 0, &socket);
+	if (ret)
+		goto fail;
+
+	c2c_pmu->socket = socket;
+	return 0;
+
+fail:
+	dev_err(c2c_pmu->dev, "Failed to initialize socket\n");
+	return ret;
+}
+
+static int nv_c2c_pmu_init_id(struct nv_c2c_pmu *c2c_pmu)
+{
+	const char *name_fmt[C2C_TYPE_COUNT] = {
+		[C2C_TYPE_NVLINK] = "nvidia_nvlink_c2c_pmu_%u",
+		[C2C_TYPE_NVCLINK] = "nvidia_nvclink_pmu_%u",
+		[C2C_TYPE_NVDLINK] = "nvidia_nvdlink_pmu_%u",
+	};
+
+	char *name;
+	int ret;
+
+	name = devm_kasprintf(c2c_pmu->dev, GFP_KERNEL,
+		name_fmt[c2c_pmu->c2c_type], c2c_pmu->socket);
+	if (!name) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	c2c_pmu->name = name;
+
+	c2c_pmu->identifier = acpi_device_hid(c2c_pmu->acpi_dev);
+
+	return 0;
+
+fail:
+	dev_err(c2c_pmu->dev, "Failed to initialize name\n");
+	return ret;
+}
+
+static int nv_c2c_pmu_init_filter(struct nv_c2c_pmu *c2c_pmu)
+{
+	u32 cpu_en = 0;
+	struct device *dev = c2c_pmu->dev;
+
+	if (c2c_pmu->c2c_type == C2C_TYPE_NVDLINK) {
+		c2c_pmu->peer_type = C2C_PEER_TYPE_CXLMEM;
+
+		c2c_pmu->nr_inst = C2C_NR_INST_NVDLINK;
+		c2c_pmu->peer_insts[0][0] = (1UL << c2c_pmu->nr_inst) - 1;
+
+		c2c_pmu->nr_peer = C2C_NR_PEER_CXLMEM;
+		c2c_pmu->filter_default = (1 << c2c_pmu->nr_peer) - 1;
+
+		c2c_pmu->formats = nv_c2c_pmu_formats;
+
+		return 0;
+	}
+
+	c2c_pmu->nr_inst = (c2c_pmu->c2c_type == C2C_TYPE_NVLINK) ?
+		C2C_NR_INST_NVLINK : C2C_NR_INST_NVCLINK;
+
+	if (device_property_read_u32(dev, "cpu_en_mask", &cpu_en))
+		dev_dbg(dev, "no cpu_en_mask property\n");
+
+	if (cpu_en) {
+		c2c_pmu->peer_type = C2C_PEER_TYPE_CPU;
+
+		/* Fill peer_insts bitmap with instances connected to peer CPU. */
+		bitmap_from_arr32(c2c_pmu->peer_insts[0], &cpu_en,
+				c2c_pmu->nr_inst);
+
+		c2c_pmu->nr_peer = 1;
+		c2c_pmu->formats = nv_c2c_pmu_formats;
+	} else {
+		u32 i;
+		u32 gpu_en = 0;
+		const char *props[C2C_NR_PEER_MAX] = {
+			"gpu0_en_mask", "gpu1_en_mask"
+		};
+
+		for (i = 0; i < C2C_NR_PEER_MAX; i++) {
+			/* Reset so a missing property can't reuse a stale mask. */
+			gpu_en = 0;
+			if (device_property_read_u32(dev, props[i], &gpu_en))
+				dev_dbg(dev, "no %s property\n", props[i]);
+
+			if (gpu_en) {
+				/* Fill peer_insts bitmap with instances connected to peer GPU. */
+				bitmap_from_arr32(c2c_pmu->peer_insts[i], &gpu_en,
+						c2c_pmu->nr_inst);
+
+				c2c_pmu->nr_peer++;
+			}
+		}
+
+		if (c2c_pmu->nr_peer == 0) {
+			dev_err(dev, "No GPU is enabled\n");
+			return -EINVAL;
+		}
+
+		c2c_pmu->peer_type = C2C_PEER_TYPE_GPU;
+		c2c_pmu->formats = nv_c2c_nvlink_pmu_formats;
+	}
+
+	c2c_pmu->filter_default = (1 << c2c_pmu->nr_peer) - 1;
+
+	return 0;
+}
+
+static void *nv_c2c_pmu_init_pmu(struct platform_device *pdev)
+{
+	int ret;
+	struct nv_c2c_pmu *c2c_pmu;
+	struct acpi_device *acpi_dev;
+	struct device *dev = &pdev->dev;
+
+	acpi_dev = ACPI_COMPANION(dev);
+	if (!acpi_dev)
+		return ERR_PTR(-ENODEV);
+
+	c2c_pmu = devm_kzalloc(dev, sizeof(*c2c_pmu), GFP_KERNEL);
+	if (!c2c_pmu)
+		return ERR_PTR(-ENOMEM);
+
+	c2c_pmu->dev = dev;
+	c2c_pmu->acpi_dev = acpi_dev;
+	c2c_pmu->c2c_type = (unsigned int)(unsigned long)device_get_match_data(dev);
+	platform_set_drvdata(pdev, c2c_pmu);
+
+	ret = nv_c2c_pmu_init_socket(c2c_pmu);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = nv_c2c_pmu_init_id(c2c_pmu);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = nv_c2c_pmu_init_filter(c2c_pmu);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return c2c_pmu;
+}
+
+static int nv_c2c_pmu_init_mmio(struct nv_c2c_pmu *c2c_pmu)
+{
+	int i;
+	struct device *dev = c2c_pmu->dev;
+	struct platform_device *pdev = to_platform_device(dev);
+
+	/* Map the address of all the instances. */
+	for (i = 0; i < c2c_pmu->nr_inst; i++) {
+		c2c_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
+		if (IS_ERR(c2c_pmu->base[i])) {
+			dev_err(dev, "Failed to map address for instance %d\n", i);
+			return PTR_ERR(c2c_pmu->base[i]);
+		}
+	}
+
+	/* Map broadcast address. */
+	c2c_pmu->base_broadcast = devm_platform_ioremap_resource(pdev,
+								 c2c_pmu->nr_inst);
+	if (IS_ERR(c2c_pmu->base_broadcast)) {
+		dev_err(dev, "Failed to map broadcast address\n");
+		return PTR_ERR(c2c_pmu->base_broadcast);
+	}
+
+	return 0;
+}
+
+static int nv_c2c_pmu_register_pmu(struct nv_c2c_pmu *c2c_pmu)
+{
+	int ret;
+
+	ret = cpuhp_state_add_instance(nv_c2c_pmu_cpuhp_state,
+				       &c2c_pmu->cpuhp_node);
+	if (ret) {
+		dev_err(c2c_pmu->dev, "Error %d registering hotplug\n", ret);
+		return ret;
+	}
+
+	c2c_pmu->pmu = (struct pmu) {
+		.parent		= c2c_pmu->dev,
+		.task_ctx_nr	= perf_invalid_context,
+		.pmu_enable	= nv_c2c_pmu_enable,
+		.pmu_disable	= nv_c2c_pmu_disable,
+		.event_init	= nv_c2c_pmu_event_init,
+		.add		= nv_c2c_pmu_add,
+		.del		= nv_c2c_pmu_del,
+		.start		= nv_c2c_pmu_start,
+		.stop		= nv_c2c_pmu_stop,
+		.read		= nv_c2c_pmu_read,
+		.attr_groups	= c2c_pmu->attr_groups,
+		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE |
+					PERF_PMU_CAP_NO_INTERRUPT,
+	};
+
+	ret = perf_pmu_register(&c2c_pmu->pmu, c2c_pmu->name, -1);
+	if (ret) {
+		dev_err(c2c_pmu->dev, "Failed to register C2C PMU: %d\n", ret);
+		cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state,
+					    &c2c_pmu->cpuhp_node);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int nv_c2c_pmu_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct nv_c2c_pmu *c2c_pmu;
+
+	c2c_pmu = nv_c2c_pmu_init_pmu(pdev);
+	if (IS_ERR(c2c_pmu))
+		return PTR_ERR(c2c_pmu);
+
+	ret = nv_c2c_pmu_init_mmio(c2c_pmu);
+	if (ret)
+		return ret;
+
+	ret = nv_c2c_pmu_get_cpus(c2c_pmu);
+	if (ret)
+		return ret;
+
+	ret = nv_c2c_pmu_alloc_attr_groups(c2c_pmu);
+	if (ret)
+		return ret;
+
+	ret = nv_c2c_pmu_register_pmu(c2c_pmu);
+	if (ret)
+		return ret;
+
+	dev_dbg(c2c_pmu->dev, "Registered %s PMU\n", c2c_pmu->name);
+
+	return 0;
+}
+
+static void nv_c2c_pmu_device_remove(struct platform_device *pdev)
+{
+	struct nv_c2c_pmu *c2c_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&c2c_pmu->pmu);
+	cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state, &c2c_pmu->cpuhp_node);
+}
+
+static const struct acpi_device_id nv_c2c_pmu_acpi_match[] = {
+	{ "NVDA2023", (kernel_ulong_t)C2C_TYPE_NVLINK },
+	{ "NVDA2022", (kernel_ulong_t)C2C_TYPE_NVCLINK },
+	{ "NVDA2020", (kernel_ulong_t)C2C_TYPE_NVDLINK },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, nv_c2c_pmu_acpi_match);
+
+static struct platform_driver nv_c2c_pmu_driver = {
+	.driver = {
+		.name = "nvidia-t410-c2c-pmu",
+		.acpi_match_table = ACPI_PTR(nv_c2c_pmu_acpi_match),
+		.suppress_bind_attrs = true,
+	},
+	.probe = nv_c2c_pmu_probe,
+	.remove = nv_c2c_pmu_device_remove,
+};
+
+static int __init nv_c2c_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				      "perf/nvidia/c2c:online",
+				      nv_c2c_pmu_online_cpu,
+				      nv_c2c_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+
+	nv_c2c_pmu_cpuhp_state = ret;
+	return platform_driver_register(&nv_c2c_pmu_driver);
+}
+
+static void __exit nv_c2c_pmu_exit(void)
+{
+	platform_driver_unregister(&nv_c2c_pmu_driver);
+	cpuhp_remove_multi_state(nv_c2c_pmu_cpuhp_state);
+}
+
+module_init(nv_c2c_pmu_init);
+module_exit(nv_c2c_pmu_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NVIDIA Tegra410 C2C PMU driver");
+MODULE_AUTHOR("Besar Wicaksono <bwicaksono@nvidia.com>");
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
                   ` (6 preceding siblings ...)
  2026-01-26 18:11 ` [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU Besar Wicaksono
@ 2026-01-26 18:11 ` Besar Wicaksono
  2026-02-08 11:18   ` Krzysztof Kozlowski
  7 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-01-26 18:11 UTC (permalink / raw)
  To: will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd, Besar Wicaksono

Enable the drivers for the NVIDIA TEGRA410 CMEM Latency and C2C PMU
devices.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 arch/arm64/configs/defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 45288ec9eaf7..3d0e438cb997 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1723,6 +1723,8 @@ CONFIG_ARM_DMC620_PMU=m
 CONFIG_HISI_PMU=y
 CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU=m
 CONFIG_NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU=m
+CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU=m
+CONFIG_NVIDIA_TEGRA410_C2C_PMU=m
 CONFIG_MESON_DDR_PMU=m
 CONFIG_NVMEM_LAYOUT_SL28_VPD=m
 CONFIG_NVMEM_IMX_OCOTP=y
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU
  2026-01-26 18:11 ` [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU Besar Wicaksono
@ 2026-01-28  0:40   ` kernel test robot
  2026-01-30  2:09   ` Ilkka Koskinen
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2026-01-28  0:40 UTC (permalink / raw)
  To: Besar Wicaksono, will, suzuki.poulose, robin.murphy, ilkka
  Cc: oe-kbuild-all, linux-arm-kernel, linux-kernel, linux-tegra,
	mark.rutland, treding, jonathanh, vsethi, rwiley, sdonthineni,
	skelley, ywan, mochs, nirmoyd, Besar Wicaksono

Hi Besar,

kernel test robot noticed the following build errors:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on linus/master v6.19-rc7 next-20260127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Besar-Wicaksono/perf-arm_cspmu-nvidia-Rename-doc-to-Tegra241/20260127-021604
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link:    https://lore.kernel.org/r/20260126181155.2776097-7-bwicaksono%40nvidia.com
patch subject: [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU
config: arm64-randconfig-r113-20260128 (https://download.01.org/0day-ci/archive/20260128/202601280830.2IJaaITg-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9b8addffa70cee5b2acc5454712d9cf78ce45710)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260128/202601280830.2IJaaITg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601280830.2IJaaITg-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/perf/nvidia_t410_cmem_latency_pmu.c:604:12: error: call to undeclared function 'acpi_device_uid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     604 |         uid_str = acpi_device_uid(acpi_dev);
         |                   ^
   drivers/perf/nvidia_t410_cmem_latency_pmu.c:604:12: note: did you mean 'cpu_device_up'?
   include/linux/cpu.h:119:5: note: 'cpu_device_up' declared here
     119 | int cpu_device_up(struct device *dev);
         |     ^
>> drivers/perf/nvidia_t410_cmem_latency_pmu.c:604:10: error: incompatible integer to pointer conversion assigning to 'char *' from 'int' [-Wint-conversion]
     604 |         uid_str = acpi_device_uid(acpi_dev);
         |                 ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/perf/nvidia_t410_cmem_latency_pmu.c:619:29: error: call to undeclared function 'acpi_device_hid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     619 |         cmem_lat_pmu->identifier = acpi_device_hid(acpi_dev);
         |                                    ^
>> drivers/perf/nvidia_t410_cmem_latency_pmu.c:619:27: error: incompatible integer to pointer conversion assigning to 'const char *' from 'int' [-Wint-conversion]
     619 |         cmem_lat_pmu->identifier = acpi_device_hid(acpi_dev);
         |                                  ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
   4 errors generated.
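[Editor's note: these failures are consistent with a CONFIG_ACPI=n randconfig — acpi_device_uid() and acpi_device_hid() are only declared when ACPI support is built in. A likely resolution (an editor's assumption, not the author's posted fix) is to make the driver's Kconfig entry depend on ACPI, e.g.:]

```
config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
	tristate "NVIDIA Tegra410 CMEM Latency PMU"
	depends on ARM64 && ACPI
```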


vim +/acpi_device_uid +604 drivers/perf/nvidia_t410_cmem_latency_pmu.c

   590	
   591	static int cmem_lat_pmu_probe(struct platform_device *pdev)
   592	{
   593		struct device *dev = &pdev->dev;
   594		struct acpi_device *acpi_dev;
   595		struct cmem_lat_pmu *cmem_lat_pmu;
   596		char *name, *uid_str;
   597		int ret, i;
   598		u32 socket;
   599	
   600		acpi_dev = ACPI_COMPANION(dev);
   601		if (!acpi_dev)
   602			return -ENODEV;
   603	
 > 604		uid_str = acpi_device_uid(acpi_dev);
   605		if (!uid_str)
   606			return -ENODEV;
   607	
   608		ret = kstrtou32(uid_str, 0, &socket);
   609		if (ret)
   610			return ret;
   611	
   612		cmem_lat_pmu = devm_kzalloc(dev, sizeof(*cmem_lat_pmu), GFP_KERNEL);
   613		name = devm_kasprintf(dev, GFP_KERNEL, "nvidia_cmem_latency_pmu_%u", socket);
   614		if (!cmem_lat_pmu || !name)
   615			return -ENOMEM;
   616	
   617		cmem_lat_pmu->dev = dev;
   618		cmem_lat_pmu->name = name;
 > 619		cmem_lat_pmu->identifier = acpi_device_hid(acpi_dev);
   620		platform_set_drvdata(pdev, cmem_lat_pmu);
   621	
   622		cmem_lat_pmu->pmu = (struct pmu) {
   623			.parent		= &pdev->dev,
   624			.task_ctx_nr	= perf_invalid_context,
   625			.pmu_enable	= cmem_lat_pmu_enable,
   626			.pmu_disable	= cmem_lat_pmu_disable,
   627			.event_init	= cmem_lat_pmu_event_init,
   628			.add		= cmem_lat_pmu_add,
   629			.del		= cmem_lat_pmu_del,
   630			.start		= cmem_lat_pmu_start,
   631			.stop		= cmem_lat_pmu_stop,
   632			.read		= cmem_lat_pmu_read,
   633			.attr_groups	= cmem_lat_pmu_attr_groups,
   634			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE |
   635						PERF_PMU_CAP_NO_INTERRUPT,
   636		};
   637	
   638		/* Map the address of all the instances plus one for the broadcast. */
   639		for (i = 0; i < NUM_INSTANCES + 1; i++) {
   640			cmem_lat_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
   641			if (IS_ERR(cmem_lat_pmu->base[i])) {
   642				dev_err(dev, "Failed map address for instance %d\n", i);
   643				return PTR_ERR(cmem_lat_pmu->base[i]);
   644			}
   645		}
   646	
   647		ret = cmem_lat_pmu_get_cpus(cmem_lat_pmu, socket);
   648		if (ret)
   649			return ret;
   650	
   651		ret = cpuhp_state_add_instance(cmem_lat_pmu_cpuhp_state,
   652					       &cmem_lat_pmu->node);
   653		if (ret) {
   654			dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
   655			return ret;
   656		}
   657	
   658		cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_ENABLE);
   659		cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_CLR);
   660		cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_DISABLE);
   661	
   662		ret = perf_pmu_register(&cmem_lat_pmu->pmu, name, -1);
   663		if (ret) {
   664			dev_err(&pdev->dev, "Failed to register PMU: %d\n", ret);
   665			cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
   666						    &cmem_lat_pmu->node);
   667			return ret;
   668		}
   669	
   670		dev_dbg(&pdev->dev, "Registered %s PMU\n", name);
   671	
   672		return 0;
   673	}
   674	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
  2026-01-26 18:11 ` [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU Besar Wicaksono
@ 2026-01-28 15:56   ` kernel test robot
  2026-01-29 22:34   ` Ilkka Koskinen
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2026-01-28 15:56 UTC (permalink / raw)
  To: Besar Wicaksono, will, suzuki.poulose, robin.murphy, ilkka
  Cc: oe-kbuild-all, linux-arm-kernel, linux-kernel, linux-tegra,
	mark.rutland, treding, jonathanh, vsethi, rwiley, sdonthineni,
	skelley, ywan, mochs, nirmoyd, Besar Wicaksono

Hi Besar,

kernel test robot noticed the following build errors:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on linus/master v6.19-rc7 next-20260127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Besar-Wicaksono/perf-arm_cspmu-nvidia-Rename-doc-to-Tegra241/20260127-021604
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link:    https://lore.kernel.org/r/20260126181155.2776097-5-bwicaksono%40nvidia.com
patch subject: [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
config: loongarch-randconfig-r051-20260127 (https://download.01.org/0day-ci/archive/20260128/202601282328.gksG6ks9-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260128/202601282328.gksG6ks9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601282328.gksG6ks9-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "arm_cspmu_acpi_dev_get" [drivers/perf/arm_cspmu/nvidia_cspmu.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
  2026-01-26 18:11 ` [PATCH 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU Besar Wicaksono
@ 2026-01-29 22:20   ` Ilkka Koskinen
  0 siblings, 0 replies; 26+ messages in thread
From: Ilkka Koskinen @ 2026-01-29 22:20 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: will, suzuki.poulose, robin.murphy, ilkka, linux-arm-kernel,
	linux-kernel, linux-tegra, mark.rutland, treding, jonathanh,
	vsethi, rwiley, sdonthineni, skelley, ywan, mochs, nirmoyd


Hi Besar,

On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> Add Unified Coherence Fabric (UCF) PMU support to the Tegra410 SoC.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>

Looks good to me

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

--Ilkka


> ---
> Documentation/admin-guide/perf/index.rst      |   1 +
> .../admin-guide/perf/nvidia-tegra410-pmu.rst  | 106 ++++++++++++++++++
> drivers/perf/arm_cspmu/nvidia_cspmu.c         |  90 ++++++++++++++-
> 3 files changed, 196 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
>
> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> index c407bb44b08e..aa12708ddb96 100644
> --- a/Documentation/admin-guide/perf/index.rst
> +++ b/Documentation/admin-guide/perf/index.rst
> @@ -25,6 +25,7 @@ Performance monitor support
>    alibaba_pmu
>    dwc_pcie_pmu
>    nvidia-tegra241-pmu
> +   nvidia-tegra410-pmu
>    meson-ddr-pmu
>    cxl
>    ampere_cspmu
> diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> new file mode 100644
> index 000000000000..7b7ba5700ca1
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> @@ -0,0 +1,106 @@
> +=====================================================================
> +NVIDIA Tegra410 SoC Uncore Performance Monitoring Unit (PMU)
> +=====================================================================
> +
> +The NVIDIA Tegra410 SoC includes various system PMUs to measure key performance
> +metrics like memory bandwidth, latency, and utilization:
> +
> +* Unified Coherence Fabric (UCF)
> +
> +PMU Driver
> +----------
> +
> +The PMU driver describes the available events and configuration of each PMU in
> +sysfs. Please see the sections below to get the sysfs path of each PMU. Like
> +other uncore PMU drivers, the driver provides a "cpumask" sysfs attribute to
> +show the CPU id used to handle the PMU events. There is also an
> +"associated_cpus" sysfs attribute, which contains a list of CPUs associated
> +with the PMU instance.
> +
> +UCF PMU
> +-------
> +
> +The Unified Coherence Fabric (UCF) in the NVIDIA Tegra410 SoC serves as a
> +distributed last-level cache for CPU memory and CXL memory, and as a
> +cache-coherent interconnect that supports hardware coherence across multiple
> +coherently caching agents, including:
> +
> +  * CPU clusters
> +  * GPU
> +  * PCIe Ordering Controller Unit (OCU)
> +  * Other IO-coherent requesters
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_source/devices/nvidia_ucf_pmu_<socket-id>.
> +
> +Some of the events available in this PMU can be used to measure bandwidth and
> +utilization:
> +
> +  * slc_access_rd: count the number of read requests to SLC.
> +  * slc_access_wr: count the number of write requests to SLC.
> +  * slc_bytes_rd: count the number of bytes transferred by slc_access_rd.
> +  * slc_bytes_wr: count the number of bytes transferred by slc_access_wr.
> +  * mem_access_rd: count the number of read requests to local or remote memory.
> +  * mem_access_wr: count the number of write requests to local or remote memory.
> +  * mem_bytes_rd: count the number of bytes transferred by mem_access_rd.
> +  * mem_bytes_wr: count the number of bytes transferred by mem_access_wr.
> +  * cycles: counts the UCF cycles.
> +
> +The average bandwidth is calculated as::
> +
> +   AVG_SLC_READ_BANDWIDTH_IN_GBPS = SLC_BYTES_RD / ELAPSED_TIME_IN_NS
> +   AVG_SLC_WRITE_BANDWIDTH_IN_GBPS = SLC_BYTES_WR / ELAPSED_TIME_IN_NS
> +   AVG_MEM_READ_BANDWIDTH_IN_GBPS = MEM_BYTES_RD / ELAPSED_TIME_IN_NS
> +   AVG_MEM_WRITE_BANDWIDTH_IN_GBPS = MEM_BYTES_WR / ELAPSED_TIME_IN_NS
> +
> +The average request rate is calculated as::
> +
> +   AVG_SLC_READ_REQUEST_RATE = SLC_ACCESS_RD / CYCLES
> +   AVG_SLC_WRITE_REQUEST_RATE = SLC_ACCESS_WR / CYCLES
> +   AVG_MEM_READ_REQUEST_RATE = MEM_ACCESS_RD / CYCLES
> +   AVG_MEM_WRITE_REQUEST_RATE = MEM_ACCESS_WR / CYCLES
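For illustration only (not part of the patch), the bandwidth and request-rate formulas above can be checked with a short post-processing sketch; all counter values below are hypothetical:

```python
# Hypothetical UCF PMU counts for one 1-second measurement window.
# These values are illustrative; real numbers come from perf stat.
elapsed_ns = 1_000_000_000       # window length in nanoseconds
slc_bytes_rd = 250_000_000_000   # slc_bytes_rd count
slc_access_rd = 4_000_000_000    # slc_access_rd count
cycles = 2_000_000_000           # UCF cycles count

# Bytes per nanosecond is numerically equal to GB/s.
avg_slc_read_bw_gbps = slc_bytes_rd / elapsed_ns
# Requests per UCF cycle.
avg_slc_read_req_rate = slc_access_rd / cycles

print(f"{avg_slc_read_bw_gbps:.1f} GB/s, {avg_slc_read_req_rate:.2f} req/cycle")
```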
> +
> +More details about the other available events can be found in the Tegra410
> +SoC technical reference manual.
> +
> +The events can be filtered based on source or destination. The source filter
> +indicates the traffic initiator to the SLC, e.g. local CPU, non-CPU device, or
> +remote socket. The destination filter specifies the destination memory type,
> +e.g. local system memory (CMEM), local GPU memory (GMEM), or remote memory. The
> +local/remote classification of the destination filter is based on the home
> +socket of the address, not where the data actually resides. The available
> +filters are described in
> +/sys/bus/event_source/devices/nvidia_ucf_pmu_<socket-id>/format/.
> +
> +The list of UCF PMU event filters:
> +
> +* Source filter:
> +
> +  * src_loc_cpu: if set, count events from local CPU
> +  * src_loc_noncpu: if set, count events from local non-CPU device
> +  * src_rem: if set, count events from CPU, GPU, and PCIE devices of the
> +    remote socket
> +
> +* Destination filter:
> +
> +  * dst_loc_cmem: if set, count events to local system memory (CMEM) address
> +  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) address
> +  * dst_loc_other: if set, count events to local CXL memory address
> +  * dst_rem: if set, count events to CPU, GPU, and CXL memory addresses of
> +    the remote socket
> +
> +If the source is not specified, the PMU will count events from all sources. If
> +the destination is not specified, the PMU will count events to all destinations.
> +
> +Example usage:
> +
> +* Count event id 0x0 in socket 0 from all sources and to all destinations::
> +
> +    perf stat -a -e nvidia_ucf_pmu_0/event=0x0/
> +
> +* Count event id 0x0 in socket 0 with source filter = local CPU and destination
> +  filter = local system memory (CMEM)::
> +
> +    perf stat -a -e nvidia_ucf_pmu_0/event=0x0,src_loc_cpu=0x1,dst_loc_cmem=0x1/
> +
> +* Count event id 0x0 in socket 1 with source filter = local non-CPU device and
> +  destination filter = remote memory::
> +
> +    perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
> diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> index e06a06d3407b..c67667097a3c 100644
> --- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
> +++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> @@ -1,6 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
> /*
> - * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>  *
>  */
>
> @@ -21,6 +21,13 @@
> #define NV_CNVL_PORT_COUNT           4ULL
> #define NV_CNVL_FILTER_ID_MASK       GENMASK_ULL(NV_CNVL_PORT_COUNT - 1, 0)
>
> +#define NV_UCF_SRC_COUNT             3ULL
> +#define NV_UCF_DST_COUNT             4ULL
> +#define NV_UCF_FILTER_ID_MASK        GENMASK_ULL(11, 0)
> +#define NV_UCF_FILTER_SRC            GENMASK_ULL(2, 0)
> +#define NV_UCF_FILTER_DST            GENMASK_ULL(11, 8)
> +#define NV_UCF_FILTER_DEFAULT        (NV_UCF_FILTER_SRC | NV_UCF_FILTER_DST)
> +
> #define NV_GENERIC_FILTER_ID_MASK    GENMASK_ULL(31, 0)
>
> #define NV_PRODID_MASK	(PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
> @@ -124,6 +131,37 @@ static struct attribute *mcf_pmu_event_attrs[] = {
> 	NULL,
> };
>
> +static struct attribute *ucf_pmu_event_attrs[] = {
> +	ARM_CSPMU_EVENT_ATTR(bus_cycles,            0x1D),
> +
> +	ARM_CSPMU_EVENT_ATTR(slc_allocate,          0xF0),
> +	ARM_CSPMU_EVENT_ATTR(slc_wb,                0xF3),
> +	ARM_CSPMU_EVENT_ATTR(slc_refill_rd,         0x109),
> +	ARM_CSPMU_EVENT_ATTR(slc_refill_wr,         0x10A),
> +	ARM_CSPMU_EVENT_ATTR(slc_hit_rd,            0x119),
> +
> +	ARM_CSPMU_EVENT_ATTR(slc_access_dataless,   0x183),
> +	ARM_CSPMU_EVENT_ATTR(slc_access_atomic,     0x184),
> +
> +	ARM_CSPMU_EVENT_ATTR(slc_access,            0xF2),
> +	ARM_CSPMU_EVENT_ATTR(slc_access_rd,         0x111),
> +	ARM_CSPMU_EVENT_ATTR(slc_access_wr,         0x112),
> +	ARM_CSPMU_EVENT_ATTR(slc_bytes_rd,          0x113),
> +	ARM_CSPMU_EVENT_ATTR(slc_bytes_wr,          0x114),
> +
> +	ARM_CSPMU_EVENT_ATTR(mem_access_rd,         0x121),
> +	ARM_CSPMU_EVENT_ATTR(mem_access_wr,         0x122),
> +	ARM_CSPMU_EVENT_ATTR(mem_bytes_rd,          0x123),
> +	ARM_CSPMU_EVENT_ATTR(mem_bytes_wr,          0x124),
> +
> +	ARM_CSPMU_EVENT_ATTR(local_snoop,           0x180),
> +	ARM_CSPMU_EVENT_ATTR(ext_snp_access,        0x181),
> +	ARM_CSPMU_EVENT_ATTR(ext_snp_evict,         0x182),
> +
> +	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> +	NULL,
> +};
> +
> static struct attribute *generic_pmu_event_attrs[] = {
> 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> 	NULL,
> @@ -152,6 +190,18 @@ static struct attribute *cnvlink_pmu_format_attrs[] = {
> 	NULL,
> };
>
> +static struct attribute *ucf_pmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	ARM_CSPMU_FORMAT_ATTR(src_loc_noncpu, "config1:0"),
> +	ARM_CSPMU_FORMAT_ATTR(src_loc_cpu, "config1:1"),
> +	ARM_CSPMU_FORMAT_ATTR(src_rem, "config1:2"),
> +	ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config1:8"),
> +	ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config1:9"),
> +	ARM_CSPMU_FORMAT_ATTR(dst_loc_other, "config1:10"),
> +	ARM_CSPMU_FORMAT_ATTR(dst_rem, "config1:11"),
> +	NULL,
> +};
> +
> static struct attribute *generic_pmu_format_attrs[] = {
> 	ARM_CSPMU_FORMAT_EVENT_ATTR,
> 	ARM_CSPMU_FORMAT_FILTER_ATTR,
> @@ -236,6 +286,27 @@ static void nv_cspmu_set_cc_filter(struct arm_cspmu *cspmu,
> 	writel(filter, cspmu->base0 + PMCCFILTR);
> }
>
> +static u32 ucf_pmu_event_filter(const struct perf_event *event)
> +{
> +	u32 ret, filter, src, dst;
> +
> +	filter = nv_cspmu_event_filter(event);
> +
> +	/* Monitor all sources if none is selected. */
> +	src = FIELD_GET(NV_UCF_FILTER_SRC, filter);
> +	if (src == 0)
> +		src = GENMASK_ULL(NV_UCF_SRC_COUNT - 1, 0);
> +
> +	/* Monitor all destinations if none is selected. */
> +	dst = FIELD_GET(NV_UCF_FILTER_DST, filter);
> +	if (dst == 0)
> +		dst = GENMASK_ULL(NV_UCF_DST_COUNT - 1, 0);
> +
> +	ret = FIELD_PREP(NV_UCF_FILTER_SRC, src);
> +	ret |= FIELD_PREP(NV_UCF_FILTER_DST, dst);
> +
> +	return ret;
> +}
>
> enum nv_cspmu_name_fmt {
> 	NAME_FMT_GENERIC,
> @@ -342,6 +413,23 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
> 		.init_data = NULL
> 	  },
> 	},
> +	{
> +	  .prodid = 0x2CF20000,
> +	  .prodid_mask = NV_PRODID_MASK,
> +	  .name_pattern = "nvidia_ucf_pmu_%u",
> +	  .name_fmt = NAME_FMT_SOCKET,
> +	  .template_ctx = {
> +		.event_attr = ucf_pmu_event_attrs,
> +		.format_attr = ucf_pmu_format_attrs,
> +		.filter_mask = NV_UCF_FILTER_ID_MASK,
> +		.filter_default_val = NV_UCF_FILTER_DEFAULT,
> +		.filter2_mask = 0x0,
> +		.filter2_default_val = 0x0,
> +		.get_filter = ucf_pmu_event_filter,
> +		.get_filter2 = NULL,
> +		.init_data = NULL
> +	  },
> +	},
> 	{
> 	  .prodid = 0,
> 	  .prodid_mask = 0,
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
  2026-01-26 18:11 ` [PATCH 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get Besar Wicaksono
@ 2026-01-29 22:23   ` Ilkka Koskinen
  0 siblings, 0 replies; 26+ messages in thread
From: Ilkka Koskinen @ 2026-01-29 22:23 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: will, suzuki.poulose, robin.murphy, ilkka, linux-arm-kernel,
	linux-kernel, linux-tegra, mark.rutland, treding, jonathanh,
	vsethi, rwiley, sdonthineni, skelley, ywan, mochs, nirmoyd


Hi Besar,

On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> Add an interface to get the ACPI device associated with the
> PMU. This ACPI device may describe additional properties not
> covered by the standard ones.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
> drivers/perf/arm_cspmu/arm_cspmu.c | 24 +++++++++++++++++++++++-
> drivers/perf/arm_cspmu/arm_cspmu.h | 17 ++++++++++++++++-
> 2 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
> index 34430b68f602..dadc9b765d80 100644
> --- a/drivers/perf/arm_cspmu/arm_cspmu.c
> +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> @@ -16,7 +16,7 @@
>  * The user should refer to the vendor technical documentation to get details
>  * about the supported events.
>  *
> - * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>  *
>  */
>
> @@ -1132,6 +1132,28 @@ static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
>
> 	return 0;
> }
> +
> +struct acpi_device *arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu)
> +{
> +	char hid[16];
> +	char uid[16];
> +	struct acpi_device *adev;
> +	const struct acpi_apmt_node *apmt_node;
> +
> +	apmt_node = arm_cspmu_apmt_node(cspmu->dev);
> +	if (!apmt_node || apmt_node->type != ACPI_APMT_NODE_TYPE_ACPI)
> +		return NULL;
> +
> +	memset(hid, 0, sizeof(hid));
> +	memset(uid, 0, sizeof(uid));
> +
> +	memcpy(hid, &apmt_node->inst_primary, sizeof(apmt_node->inst_primary));
> +	snprintf(uid, sizeof(uid), "%u", apmt_node->inst_secondary);
> +
> +	adev = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	return adev;

I'd probably just do

 	return acpi_dev_get_first_match_dev(hid, uid, -1);

but it doesn't really matter. Regardless,

 	Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Cheers, Ilkka


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
  2026-01-26 18:11 ` [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU Besar Wicaksono
  2026-01-28 15:56   ` kernel test robot
@ 2026-01-29 22:34   ` Ilkka Koskinen
  1 sibling, 0 replies; 26+ messages in thread
From: Ilkka Koskinen @ 2026-01-29 22:34 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: will, suzuki.poulose, robin.murphy, ilkka, linux-arm-kernel,
	linux-kernel, linux-tegra, mark.rutland, treding, jonathanh,
	vsethi, rwiley, sdonthineni, skelley, ywan, mochs, nirmoyd


Hi Besar,

On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> Add PCIE PMU support to the Tegra410 SoC.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
> .../admin-guide/perf/nvidia-tegra410-pmu.rst  | 162 ++++++++++++++
> drivers/perf/arm_cspmu/nvidia_cspmu.c         | 208 +++++++++++++++++-
> 2 files changed, 368 insertions(+), 2 deletions(-)
>

<snip>

> diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> index c67667097a3c..3a5531d1f94c 100644
> --- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
> +++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c

> @@ -453,7 +645,7 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
> static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
> 				  const struct nv_cspmu_match *match)
> {
> -	char *name;
> +	char *name = NULL;

You can remove the assignment in the default branch below now.

Otherwise, the patch looks good to me

 	Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Cheers, Ilkka


> 	struct device *dev = cspmu->dev;
>
> 	static atomic_t pmu_generic_idx = {0};
> @@ -467,6 +659,16 @@ static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
> 				       socket);
> 		break;
> 	}
> +	case NAME_FMT_SOCKET_INST: {
> +		const int cpu = cpumask_first(&cspmu->associated_cpus);
> +		const int socket = cpu_to_node(cpu);
> +		u32 inst_id;
> +
> +		if (!nv_cspmu_get_inst_id(cspmu, &inst_id))
> +			name = devm_kasprintf(dev, GFP_KERNEL,
> +					match->name_pattern, socket, inst_id);
> +		break;
> +	}
> 	case NAME_FMT_GENERIC:
> 		name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
> 				       atomic_fetch_inc(&pmu_generic_idx));
 		break;
 	default:
                 name = NULL;
                 ^^^^^^^^^^^^

                 break;
         }



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
  2026-01-26 18:11 ` [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU Besar Wicaksono
@ 2026-01-29 22:40   ` Ilkka Koskinen
  2026-02-10 16:01     ` Besar Wicaksono
  0 siblings, 1 reply; 26+ messages in thread
From: Ilkka Koskinen @ 2026-01-29 22:40 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: will, suzuki.poulose, robin.murphy, ilkka, linux-arm-kernel,
	linux-kernel, linux-tegra, mark.rutland, treding, jonathanh,
	vsethi, rwiley, sdonthineni, skelley, ywan, mochs, nirmoyd


Hi Besar,

On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> Add PCIE-TGT PMU support to the Tegra410 SoC.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
> .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  76 ++++
> drivers/perf/arm_cspmu/nvidia_cspmu.c         | 324 ++++++++++++++++++
> 2 files changed, 400 insertions(+)
>
> diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> index 3a5531d1f94c..095d2f322c6f 100644
> --- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
> +++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c

<snip>

> +static void pcie_tgt_pmu_reset_ev_filter(struct arm_cspmu *cspmu,
> +				     const struct perf_event *event)
> +{
> +	bool addr_filter_en;
> +	u64 base, mask;
> +	int idx;
> +
> +	addr_filter_en = pcie_tgt_pmu_addr_en(event);
> +	if (!addr_filter_en)
> +		return;
> +
> +	base = pcie_tgt_pmu_dst_addr_base(event);
> +	mask = pcie_tgt_pmu_dst_addr_mask(event);
> +	idx = pcie_tgt_find_addr_idx(cspmu, base, mask, true);
> +
> +	if (idx < 0) {
> +		dev_err(cspmu->dev,
> +			"Unable to find the address filter slot to reset\n");
> +		return;
> +	}
> +
> +	pcie_tgt_pmu_config_addr_filter(
> +			cspmu, false, base, mask, idx);

I think you can fit the arguments on the same line as the function name.

Otherwise, looks good to me.

 	Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Cheers, Ilkka




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU
  2026-01-26 18:11 ` [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU Besar Wicaksono
  2026-01-28  0:40   ` kernel test robot
@ 2026-01-30  2:09   ` Ilkka Koskinen
  1 sibling, 0 replies; 26+ messages in thread
From: Ilkka Koskinen @ 2026-01-30  2:09 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: will, suzuki.poulose, robin.murphy, ilkka, linux-arm-kernel,
	linux-kernel, linux-tegra, mark.rutland, treding, jonathanh,
	vsethi, rwiley, sdonthineni, skelley, ywan, mochs, nirmoyd


Hi Besar,

On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> Add CPU Memory (CMEM) Latency PMU support to the Tegra410 SoC.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>

Looks good to me

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Cheers, Ilkka

> ---
> .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  25 +
> drivers/perf/Kconfig                          |   7 +
> drivers/perf/Makefile                         |   1 +
> drivers/perf/nvidia_t410_cmem_latency_pmu.c   | 727 ++++++++++++++++++
> 4 files changed, 760 insertions(+)
> create mode 100644 drivers/perf/nvidia_t410_cmem_latency_pmu.c
>
> diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> index 07dc447eead7..11fc1c88346a 100644
> --- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> +++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> @@ -8,6 +8,7 @@ metrics like memory bandwidth, latency, and utilization:
> * Unified Coherence Fabric (UCF)
> * PCIE
> * PCIE-TGT
> +* CPU Memory (CMEM) Latency
>
> PMU Driver
> ----------
> @@ -342,3 +343,27 @@ Example usage:
>   0x10000 to 0x100FF on socket 0's PCIE RC-1::
>
>     perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
> +
> +CPU Memory (CMEM) Latency PMU
> +-----------------------------
> +
> +This PMU monitors the latency of memory read requests to local CPU DRAM
> +using the following counters:
> +
> +  * RD_REQ counters: count read requests (32B per request).
> +  * RD_CUM_OUTS counters: accumulated outstanding-request counters, which
> +    track how many cycles the read requests are in flight.
> +  * CYCLES counter: counts the number of elapsed cycles.
> +
> +The average latency is calculated as::
> +
> +   FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
> +   AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
> +   AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
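As an illustration only (not part of the patch), the latency formulas above can be worked through in a few lines; the counter values are hypothetical:

```python
# Hypothetical CMEM latency PMU counts for one 1-second window;
# real values come from the rd_req, rd_cum_outs and cycles events.
elapsed_ns = 1_000_000_000     # window length in nanoseconds
cycles = 2_000_000_000         # cycles count -> 2 GHz PMU clock
rd_req = 100_000_000           # read requests (32B each)
rd_cum_outs = 30_000_000_000   # accumulated outstanding-request cycles

freq_ghz = cycles / elapsed_ns             # FREQ_IN_GHZ
avg_latency_cycles = rd_cum_outs / rd_req  # AVG_LATENCY_IN_CYCLES
avg_latency_ns = avg_latency_cycles / freq_ghz

print(f"{avg_latency_cycles:.0f} cycles = {avg_latency_ns:.0f} ns")
```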
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_<socket-id>.
> +
> +Example usage::
> +
> +  perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 638321fc9800..9fed3c41d5ea 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -311,4 +311,11 @@ config MARVELL_PEM_PMU
> 	  Enable support for PCIe Interface performance monitoring
> 	  on Marvell platform.
>
> +config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
> +	tristate "NVIDIA Tegra410 CPU Memory Latency PMU"
> +	depends on ARM64
> +	help
> +	  Enable perf support for CPU memory latency counters monitoring on
> +	  NVIDIA Tegra410 SoC.
> +
> endmenu
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index ea52711a87e3..4aa6aad393c2 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -35,3 +35,4 @@ obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
> obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
> obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
> obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
> +obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
> diff --git a/drivers/perf/nvidia_t410_cmem_latency_pmu.c b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
> new file mode 100644
> index 000000000000..9b466581c8fc
> --- /dev/null
> +++ b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
> @@ -0,0 +1,727 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVIDIA Tegra410 CPU Memory (CMEM) Latency PMU driver.
> + *
> + * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +
> +#define NUM_INSTANCES    14
> +#define BCAST(pmu) ((pmu)->base[NUM_INSTANCES])
> +
> +/* Register offsets. */
> +#define CG_CTRL         0x800
> +#define CTRL            0x808
> +#define STATUS          0x810
> +#define CYCLE_CNTR      0x818
> +#define MC0_REQ_CNTR    0x820
> +#define MC0_AOR_CNTR    0x830
> +#define MC1_REQ_CNTR    0x838
> +#define MC1_AOR_CNTR    0x848
> +#define MC2_REQ_CNTR    0x850
> +#define MC2_AOR_CNTR    0x860
> +
> +/* CTRL values. */
> +#define CTRL_DISABLE    0x0ULL
> +#define CTRL_ENABLE     0x1ULL
> +#define CTRL_CLR        0x2ULL
> +
> +/* CG_CTRL values. */
> +#define CG_CTRL_DISABLE    0x0ULL
> +#define CG_CTRL_ENABLE     0x1ULL
> +
> +/* STATUS register field. */
> +#define STATUS_CYCLE_OVF      BIT(0)
> +#define STATUS_MC0_AOR_OVF    BIT(1)
> +#define STATUS_MC0_REQ_OVF    BIT(3)
> +#define STATUS_MC1_AOR_OVF    BIT(4)
> +#define STATUS_MC1_REQ_OVF    BIT(6)
> +#define STATUS_MC2_AOR_OVF    BIT(7)
> +#define STATUS_MC2_REQ_OVF    BIT(9)
> +
> +/* Events. */
> +#define EVENT_CYCLES    0x0
> +#define EVENT_REQ       0x1
> +#define EVENT_AOR       0x2
> +
> +#define NUM_EVENTS           0x3
> +#define MASK_EVENT           0x3
> +#define MAX_ACTIVE_EVENTS    32
> +
> +#define ACTIVE_CPU_MASK        0x0
> +#define ASSOCIATED_CPU_MASK    0x1
> +
> +static unsigned long cmem_lat_pmu_cpuhp_state;
> +
> +struct cmem_lat_pmu_hw_events {
> +	struct perf_event *events[MAX_ACTIVE_EVENTS];
> +	DECLARE_BITMAP(used_ctrs, MAX_ACTIVE_EVENTS);
> +};
> +
> +struct cmem_lat_pmu {
> +	struct pmu pmu;
> +	struct device *dev;
> +	const char *name;
> +	const char *identifier;
> +	void __iomem *base[NUM_INSTANCES + 1];
> +	cpumask_t associated_cpus;
> +	cpumask_t active_cpu;
> +	struct hlist_node node;
> +	struct cmem_lat_pmu_hw_events hw_events;
> +};
> +
> +#define to_cmem_lat_pmu(p) \
> +	container_of(p, struct cmem_lat_pmu, pmu)
> +
> +
> +/* Get event type from perf_event. */
> +static inline u32 get_event_type(struct perf_event *event)
> +{
> +	return (event->attr.config) & MASK_EVENT;
> +}
> +
> +/* PMU operations. */
> +static int cmem_lat_pmu_get_event_idx(struct cmem_lat_pmu_hw_events *hw_events,
> +				struct perf_event *event)
> +{
> +	unsigned int idx;
> +
> +	idx = find_first_zero_bit(hw_events->used_ctrs, MAX_ACTIVE_EVENTS);
> +	if (idx >= MAX_ACTIVE_EVENTS)
> +		return -EAGAIN;
> +
> +	set_bit(idx, hw_events->used_ctrs);
> +
> +	return idx;
> +}
> +
> +static bool cmem_lat_pmu_validate_event(struct pmu *pmu,
> +				 struct cmem_lat_pmu_hw_events *hw_events,
> +				 struct perf_event *event)
> +{
> +	if (is_software_event(event))
> +		return true;
> +
> +	/* Reject groups spanning multiple HW PMUs. */
> +	if (event->pmu != pmu)
> +		return false;
> +
> +	return (cmem_lat_pmu_get_event_idx(hw_events, event) >= 0);
> +}
> +
> +/*
> + * Make sure the group of events can be scheduled at once
> + * on the PMU.
> + */
> +static bool cmem_lat_pmu_validate_group(struct perf_event *event)
> +{
> +	struct perf_event *sibling, *leader = event->group_leader;
> +	struct cmem_lat_pmu_hw_events fake_hw_events;
> +
> +	if (event->group_leader == event)
> +		return true;
> +
> +	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
> +
> +	if (!cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, leader))
> +		return false;
> +
> +	for_each_sibling_event(sibling, leader) {
> +		if (!cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events,
> +						sibling))
> +			return false;
> +	}
> +
> +	return cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, event);
> +}
> +
> +static int cmem_lat_pmu_event_init(struct perf_event *event)
> +{
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u32 event_type = get_event_type(event);
> +
> +	if (event->attr.type != event->pmu->type ||
> +	    event_type >= NUM_EVENTS)
> +		return -ENOENT;
> +
> +	/*
> +	 * Following other "uncore" PMUs, we do not support sampling mode or
> +	 * attach to a task (per-process mode).
> +	 */
> +	if (is_sampling_event(event)) {
> +		dev_dbg(cmem_lat_pmu->pmu.dev,
> +			"Can't support sampling events\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
> +		dev_dbg(cmem_lat_pmu->pmu.dev,
> +			"Can't support per-task counters\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Make sure the CPU assignment is on one of the CPUs associated with
> +	 * this PMU.
> +	 */
> +	if (!cpumask_test_cpu(event->cpu, &cmem_lat_pmu->associated_cpus)) {
> +		dev_dbg(cmem_lat_pmu->pmu.dev,
> +			"Requested cpu is not associated with the PMU\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Enforce the current active CPU to handle the events in this PMU. */
> +	event->cpu = cpumask_first(&cmem_lat_pmu->active_cpu);
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (!cmem_lat_pmu_validate_group(event))
> +		return -EINVAL;
> +
> +	hwc->idx = -1;
> +	hwc->config = event_type;
> +
> +	return 0;
> +}
> +
> +static u64 cmem_lat_pmu_read_status(struct cmem_lat_pmu *cmem_lat_pmu,
> +				   unsigned int inst)
> +{
> +	return readq(cmem_lat_pmu->base[inst] + STATUS);
> +}
> +
> +static u64 cmem_lat_pmu_read_cycle_counter(struct perf_event *event)
> +{
> +	const unsigned int instance = 0;
> +	u64 status;
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
> +	struct device *dev = cmem_lat_pmu->dev;
> +
> +	/*
> +	 * Use the reading from first instance since all instances are
> +	 * identical.
> +	 */
> +	status = cmem_lat_pmu_read_status(cmem_lat_pmu, instance);
> +	if (status & STATUS_CYCLE_OVF)
> +		dev_warn(dev, "Cycle counter overflow\n");
> +
> +	return readq(cmem_lat_pmu->base[instance] + CYCLE_CNTR);
> +}
> +
> +static u64 cmem_lat_pmu_read_req_counter(struct perf_event *event)
> +{
> +	unsigned int i;
> +	u64 status, val = 0;
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
> +	struct device *dev = cmem_lat_pmu->dev;
> +
> +	/* Sum up the counts from all instances. */
> +	for (i = 0; i < NUM_INSTANCES; i++) {
> +		status = cmem_lat_pmu_read_status(cmem_lat_pmu, i);
> +		if (status & STATUS_MC0_REQ_OVF)
> +			dev_warn(dev, "MC0 request counter overflow\n");
> +		if (status & STATUS_MC1_REQ_OVF)
> +			dev_warn(dev, "MC1 request counter overflow\n");
> +		if (status & STATUS_MC2_REQ_OVF)
> +			dev_warn(dev, "MC2 request counter overflow\n");
> +
> +		val += readq(cmem_lat_pmu->base[i] + MC0_REQ_CNTR);
> +		val += readq(cmem_lat_pmu->base[i] + MC1_REQ_CNTR);
> +		val += readq(cmem_lat_pmu->base[i] + MC2_REQ_CNTR);
> +	}
> +
> +	return val;
> +}
> +
> +static u64 cmem_lat_pmu_read_aor_counter(struct perf_event *event)
> +{
> +	unsigned int i;
> +	u64 status, val = 0;
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
> +	struct device *dev = cmem_lat_pmu->dev;
> +
> +	/* Sum up the counts from all instances. */
> +	for (i = 0; i < NUM_INSTANCES; i++) {
> +		status = cmem_lat_pmu_read_status(cmem_lat_pmu, i);
> +		if (status & STATUS_MC0_AOR_OVF)
> +			dev_warn(dev, "MC0 AOR counter overflow\n");
> +		if (status & STATUS_MC1_AOR_OVF)
> +			dev_warn(dev, "MC1 AOR counter overflow\n");
> +		if (status & STATUS_MC2_AOR_OVF)
> +			dev_warn(dev, "MC2 AOR counter overflow\n");
> +
> +		val += readq(cmem_lat_pmu->base[i] + MC0_AOR_CNTR);
> +		val += readq(cmem_lat_pmu->base[i] + MC1_AOR_CNTR);
> +		val += readq(cmem_lat_pmu->base[i] + MC2_AOR_CNTR);
> +	}
> +
> +	return val;
> +}
> +
> +static u64 (*read_counter_fn[NUM_EVENTS])(struct perf_event *) = {
> +	[EVENT_CYCLES] = cmem_lat_pmu_read_cycle_counter,
> +	[EVENT_REQ] = cmem_lat_pmu_read_req_counter,
> +	[EVENT_AOR] = cmem_lat_pmu_read_aor_counter,
> +};
> +
> +static void cmem_lat_pmu_event_update(struct perf_event *event)
> +{
> +	u32 event_type;
> +	u64 prev, now;
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (hwc->state & PERF_HES_STOPPED)
> +		return;
> +
> +	event_type = hwc->config;
> +
> +	do {
> +		prev = local64_read(&hwc->prev_count);
> +		now = read_counter_fn[event_type](event);
> +	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> +
> +	local64_add(now - prev, &event->count);
> +
> +	hwc->state |= PERF_HES_UPTODATE;
> +}
> +
> +static void cmem_lat_pmu_start(struct perf_event *event, int pmu_flags)
> +{
> +	event->hw.state = 0;
> +}
> +
> +static void cmem_lat_pmu_stop(struct perf_event *event, int pmu_flags)
> +{
> +	event->hw.state |= PERF_HES_STOPPED;
> +}
> +
> +static int cmem_lat_pmu_add(struct perf_event *event, int flags)
> +{
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
> +	struct cmem_lat_pmu_hw_events *hw_events = &cmem_lat_pmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx;
> +
> +	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &cmem_lat_pmu->associated_cpus)))
> +		return -ENOENT;
> +
> +	idx = cmem_lat_pmu_get_event_idx(hw_events, event);
> +	if (idx < 0)
> +		return idx;
> +
> +	hw_events->events[idx] = event;
> +	hwc->idx = idx;
> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +
> +	if (flags & PERF_EF_START)
> +		cmem_lat_pmu_start(event, PERF_EF_RELOAD);
> +
> +	/* Propagate changes to the userspace mapping. */
> +	perf_event_update_userpage(event);
> +
> +	return 0;
> +}
> +
> +static void cmem_lat_pmu_del(struct perf_event *event, int flags)
> +{
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
> +	struct cmem_lat_pmu_hw_events *hw_events = &cmem_lat_pmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	cmem_lat_pmu_stop(event, PERF_EF_UPDATE);
> +
> +	hw_events->events[idx] = NULL;
> +
> +	clear_bit(idx, hw_events->used_ctrs);
> +
> +	perf_event_update_userpage(event);
> +}
> +
> +static void cmem_lat_pmu_read(struct perf_event *event)
> +{
> +	cmem_lat_pmu_event_update(event);
> +}
> +
> +static inline void cmem_lat_pmu_cg_ctrl(struct cmem_lat_pmu *cmem_lat_pmu, u64 val)
> +{
> +	writeq(val, BCAST(cmem_lat_pmu) + CG_CTRL);
> +}
> +
> +static inline void cmem_lat_pmu_ctrl(struct cmem_lat_pmu *cmem_lat_pmu, u64 val)
> +{
> +	writeq(val, BCAST(cmem_lat_pmu) + CTRL);
> +}
> +
> +static void cmem_lat_pmu_enable(struct pmu *pmu)
> +{
> +	bool disabled;
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
> +
> +	disabled = bitmap_empty(
> +		cmem_lat_pmu->hw_events.used_ctrs, MAX_ACTIVE_EVENTS);
> +
> +	if (disabled)
> +		return;
> +
> +	/* Enable all the counters. */
> +	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_ENABLE);
> +	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_ENABLE);
> +}
> +
> +static void cmem_lat_pmu_disable(struct pmu *pmu)
> +{
> +	int idx;
> +	struct perf_event *event;
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
> +
> +	/* Disable all the counters. */
> +	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_DISABLE);
> +
> +	/*
> +	 * The counters will start from 0 again on restart.
> +	 * Update the events immediately to avoid losing the counts.
> +	 */
> +	for_each_set_bit(
> +		idx, cmem_lat_pmu->hw_events.used_ctrs, MAX_ACTIVE_EVENTS) {
> +		event = cmem_lat_pmu->hw_events.events[idx];
> +
> +		if (!event)
> +			continue;
> +
> +		cmem_lat_pmu_event_update(event);
> +
> +		local64_set(&event->hw.prev_count, 0ULL);
> +	}
> +
> +	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_CLR);
> +	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_DISABLE);
> +}
> +
> +/* PMU identifier attribute. */
> +
> +static ssize_t cmem_lat_pmu_identifier_show(struct device *dev,
> +					 struct device_attribute *attr,
> +					 char *page)
> +{
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(dev_get_drvdata(dev));
> +
> +	return sysfs_emit(page, "%s\n", cmem_lat_pmu->identifier);
> +}
> +
> +static struct device_attribute cmem_lat_pmu_identifier_attr =
> +	__ATTR(identifier, 0444, cmem_lat_pmu_identifier_show, NULL);
> +
> +static struct attribute *cmem_lat_pmu_identifier_attrs[] = {
> +	&cmem_lat_pmu_identifier_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group cmem_lat_pmu_identifier_attr_group = {
> +	.attrs = cmem_lat_pmu_identifier_attrs,
> +};
> +
> +/* Format attributes. */
> +
> +#define NV_PMU_EXT_ATTR(_name, _func, _config)			\
> +	(&((struct dev_ext_attribute[]){				\
> +		{							\
> +			.attr = __ATTR(_name, 0444, _func, NULL),	\
> +			.var = (void *)_config				\
> +		}							\
> +	})[0].attr.attr)
> +
> +static struct attribute *cmem_lat_pmu_formats[] = {
> +	NV_PMU_EXT_ATTR(event, device_show_string, "config:0-1"),
> +	NULL,
> +};
> +
> +static const struct attribute_group cmem_lat_pmu_format_group = {
> +	.name = "format",
> +	.attrs = cmem_lat_pmu_formats,
> +};
> +
> +/* Event attributes. */
> +
> +static ssize_t cmem_lat_pmu_sysfs_event_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct perf_pmu_events_attr *pmu_attr;
> +
> +	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
> +	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
> +}
> +
> +#define NV_PMU_EVENT_ATTR(_name, _config)	\
> +	PMU_EVENT_ATTR_ID(_name, cmem_lat_pmu_sysfs_event_show, _config)
> +
> +static struct attribute *cmem_lat_pmu_events[] = {
> +	NV_PMU_EVENT_ATTR(cycles, EVENT_CYCLES),
> +	NV_PMU_EVENT_ATTR(rd_req, EVENT_REQ),
> +	NV_PMU_EVENT_ATTR(rd_cum_outs, EVENT_AOR),
> +	NULL
> +};
> +
> +static const struct attribute_group cmem_lat_pmu_events_group = {
> +	.name = "events",
> +	.attrs = cmem_lat_pmu_events,
> +};
> +
> +/* Cpumask attributes. */
> +
> +static ssize_t cmem_lat_pmu_cpumask_show(struct device *dev,
> +			    struct device_attribute *attr, char *buf)
> +{
> +	struct pmu *pmu = dev_get_drvdata(dev);
> +	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	unsigned long mask_id = (unsigned long)eattr->var;
> +	const cpumask_t *cpumask;
> +
> +	switch (mask_id) {
> +	case ACTIVE_CPU_MASK:
> +		cpumask = &cmem_lat_pmu->active_cpu;
> +		break;
> +	case ASSOCIATED_CPU_MASK:
> +		cpumask = &cmem_lat_pmu->associated_cpus;
> +		break;
> +	default:
> +		return 0;
> +	}
> +	return cpumap_print_to_pagebuf(true, buf, cpumask);
> +}
> +
> +#define NV_PMU_CPUMASK_ATTR(_name, _config)			\
> +	NV_PMU_EXT_ATTR(_name, cmem_lat_pmu_cpumask_show,	\
> +				(unsigned long)_config)
> +
> +static struct attribute *cmem_lat_pmu_cpumask_attrs[] = {
> +	NV_PMU_CPUMASK_ATTR(cpumask, ACTIVE_CPU_MASK),
> +	NV_PMU_CPUMASK_ATTR(associated_cpus, ASSOCIATED_CPU_MASK),
> +	NULL,
> +};
> +
> +static const struct attribute_group cmem_lat_pmu_cpumask_attr_group = {
> +	.attrs = cmem_lat_pmu_cpumask_attrs,
> +};
> +
> +/* Per PMU device attribute groups. */
> +
> +static const struct attribute_group *cmem_lat_pmu_attr_groups[] = {
> +	&cmem_lat_pmu_identifier_attr_group,
> +	&cmem_lat_pmu_format_group,
> +	&cmem_lat_pmu_events_group,
> +	&cmem_lat_pmu_cpumask_attr_group,
> +	NULL,
> +};
> +
> +static int cmem_lat_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct cmem_lat_pmu *cmem_lat_pmu =
> +		hlist_entry_safe(node, struct cmem_lat_pmu, node);
> +
> +	if (!cpumask_test_cpu(cpu, &cmem_lat_pmu->associated_cpus))
> +		return 0;
> +
> +	/* If the PMU is already managed, there is nothing to do */
> +	if (!cpumask_empty(&cmem_lat_pmu->active_cpu))
> +		return 0;
> +
> +	/* Use this CPU for event counting */
> +	cpumask_set_cpu(cpu, &cmem_lat_pmu->active_cpu);
> +
> +	return 0;
> +}
> +
> +static int cmem_lat_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	unsigned int dst;
> +
> +	struct cmem_lat_pmu *cmem_lat_pmu =
> +		hlist_entry_safe(node, struct cmem_lat_pmu, node);
> +
> +	/* Nothing to do if this CPU doesn't own the PMU */
> +	if (!cpumask_test_and_clear_cpu(cpu, &cmem_lat_pmu->active_cpu))
> +		return 0;
> +
> +	/* Choose a new CPU to migrate ownership of the PMU to */
> +	dst = cpumask_any_and_but(&cmem_lat_pmu->associated_cpus,
> +				  cpu_online_mask, cpu);
> +	if (dst >= nr_cpu_ids)
> +		return 0;
> +
> +	/* Use this CPU for event counting */
> +	perf_pmu_migrate_context(&cmem_lat_pmu->pmu, cpu, dst);
> +	cpumask_set_cpu(dst, &cmem_lat_pmu->active_cpu);
> +
> +	return 0;
> +}
> +
> +static int cmem_lat_pmu_get_cpus(struct cmem_lat_pmu *cmem_lat_pmu,
> +				unsigned int socket)
> +{
> +	int ret = 0, cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (cpu_to_node(cpu) == socket)
> +			cpumask_set_cpu(cpu, &cmem_lat_pmu->associated_cpus);
> +	}
> +
> +	if (cpumask_empty(&cmem_lat_pmu->associated_cpus)) {
> +		dev_dbg(cmem_lat_pmu->dev,
> +			"No cpu associated with PMU socket-%u\n", socket);
> +		ret = -ENODEV;
> +	}
> +
> +	return ret;
> +}
> +
> +static int cmem_lat_pmu_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct acpi_device *acpi_dev;
> +	struct cmem_lat_pmu *cmem_lat_pmu;
> +	char *name;
> +	const char *uid_str;
> +	int ret, i;
> +	u32 socket;
> +
> +	acpi_dev = ACPI_COMPANION(dev);
> +	if (!acpi_dev)
> +		return -ENODEV;
> +
> +	uid_str = acpi_device_uid(acpi_dev);
> +	if (!uid_str)
> +		return -ENODEV;
> +
> +	ret = kstrtou32(uid_str, 0, &socket);
> +	if (ret)
> +		return ret;
> +
> +	cmem_lat_pmu = devm_kzalloc(dev, sizeof(*cmem_lat_pmu), GFP_KERNEL);
> +	name = devm_kasprintf(dev, GFP_KERNEL, "nvidia_cmem_latency_pmu_%u", socket);
> +	if (!cmem_lat_pmu || !name)
> +		return -ENOMEM;
> +
> +	cmem_lat_pmu->dev = dev;
> +	cmem_lat_pmu->name = name;
> +	cmem_lat_pmu->identifier = acpi_device_hid(acpi_dev);
> +	platform_set_drvdata(pdev, cmem_lat_pmu);
> +
> +	cmem_lat_pmu->pmu = (struct pmu) {
> +		.parent		= &pdev->dev,
> +		.task_ctx_nr	= perf_invalid_context,
> +		.pmu_enable	= cmem_lat_pmu_enable,
> +		.pmu_disable	= cmem_lat_pmu_disable,
> +		.event_init	= cmem_lat_pmu_event_init,
> +		.add		= cmem_lat_pmu_add,
> +		.del		= cmem_lat_pmu_del,
> +		.start		= cmem_lat_pmu_start,
> +		.stop		= cmem_lat_pmu_stop,
> +		.read		= cmem_lat_pmu_read,
> +		.attr_groups	= cmem_lat_pmu_attr_groups,
> +		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE |
> +					PERF_PMU_CAP_NO_INTERRUPT,
> +	};
> +
> +	/* Map the address of all the instances plus one for the broadcast. */
> +	for (i = 0; i < NUM_INSTANCES + 1; i++) {
> +		cmem_lat_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
> +		if (IS_ERR(cmem_lat_pmu->base[i])) {
> +			dev_err(dev, "Failed to map address for instance %d\n", i);
> +			return PTR_ERR(cmem_lat_pmu->base[i]);
> +		}
> +	}
> +
> +	ret = cmem_lat_pmu_get_cpus(cmem_lat_pmu, socket);
> +	if (ret)
> +		return ret;
> +
> +	ret = cpuhp_state_add_instance(cmem_lat_pmu_cpuhp_state,
> +				       &cmem_lat_pmu->node);
> +	if (ret) {
> +		dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
> +		return ret;
> +	}
> +
> +	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_ENABLE);
> +	cmem_lat_pmu_ctrl(cmem_lat_pmu, CTRL_CLR);
> +	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CG_CTRL_DISABLE);
> +
> +	ret = perf_pmu_register(&cmem_lat_pmu->pmu, name, -1);
> +	if (ret) {
> +		dev_err(&pdev->dev, "Failed to register PMU: %d\n", ret);
> +		cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
> +					    &cmem_lat_pmu->node);
> +		return ret;
> +	}
> +
> +	dev_dbg(&pdev->dev, "Registered %s PMU\n", name);
> +
> +	return 0;
> +}
> +
> +static void cmem_lat_pmu_device_remove(struct platform_device *pdev)
> +{
> +	struct cmem_lat_pmu *cmem_lat_pmu = platform_get_drvdata(pdev);
> +
> +	perf_pmu_unregister(&cmem_lat_pmu->pmu);
> +	cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
> +				    &cmem_lat_pmu->node);
> +}
> +
> +static const struct acpi_device_id cmem_lat_pmu_acpi_match[] = {
> +	{ "NVDA2021", },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(acpi, cmem_lat_pmu_acpi_match);
> +
> +static struct platform_driver cmem_lat_pmu_driver = {
> +	.driver = {
> +		.name = "nvidia-t410-cmem-latency-pmu",
> +		.acpi_match_table = ACPI_PTR(cmem_lat_pmu_acpi_match),
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe = cmem_lat_pmu_probe,
> +	.remove = cmem_lat_pmu_device_remove,
> +};
> +
> +static int __init cmem_lat_pmu_init(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> +				      "perf/nvidia/cmem_latency:online",
> +				      cmem_lat_pmu_cpu_online,
> +				      cmem_lat_pmu_cpu_teardown);
> +	if (ret < 0)
> +		return ret;
> +
> +	cmem_lat_pmu_cpuhp_state = ret;
> +
> +	return platform_driver_register(&cmem_lat_pmu_driver);
> +}
> +
> +static void __exit cmem_lat_pmu_exit(void)
> +{
> +	platform_driver_unregister(&cmem_lat_pmu_driver);
> +	cpuhp_remove_multi_state(cmem_lat_pmu_cpuhp_state);
> +}
> +
> +module_init(cmem_lat_pmu_init);
> +module_exit(cmem_lat_pmu_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("NVIDIA Tegra410 CPU Memory Latency PMU driver");
> +MODULE_AUTHOR("Besar Wicaksono <bwicaksono@nvidia.com>");
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU
  2026-01-26 18:11 ` [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU Besar Wicaksono
@ 2026-01-30  2:54   ` Ilkka Koskinen
  0 siblings, 0 replies; 26+ messages in thread
From: Ilkka Koskinen @ 2026-01-30  2:54 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: will, suzuki.poulose, robin.murphy, ilkka, linux-arm-kernel,
	linux-kernel, linux-tegra, mark.rutland, treding, jonathanh,
	vsethi, rwiley, sdonthineni, skelley, ywan, mochs, nirmoyd


Hi Besar,

On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> Add NVIDIA C2C PMU support for the Tegra410 SoC.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>

Looks good to me

 	Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Cheers, Ilkka


> ---
> .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  151 +++
> drivers/perf/Kconfig                          |    7 +
> drivers/perf/Makefile                         |    1 +
> drivers/perf/nvidia_t410_c2c_pmu.c            | 1061 +++++++++++++++++
> 4 files changed, 1220 insertions(+)
> create mode 100644 drivers/perf/nvidia_t410_c2c_pmu.c
>
> diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> index 11fc1c88346a..f81f356debe1 100644
> --- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> +++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
> @@ -9,6 +9,9 @@ metrics like memory bandwidth, latency, and utilization:
> * PCIE
> * PCIE-TGT
> * CPU Memory (CMEM) Latency
> +* NVLink-C2C
> +* NV-CLink
> +* NV-DLink
>
> PMU Driver
> ----------
> @@ -367,3 +370,151 @@ see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_<socket-id>.
> Example usage::
>
>   perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
> +
> +NVLink-C2C PMU
> +--------------
> +
> +This PMU monitors latency events of memory read/write requests that pass through
> +the NVIDIA Chip-to-Chip (C2C) interface. Bandwidth events are not available
> +in this PMU, unlike the C2C PMU in Grace (Tegra241 SoC).
> +
> +The events and configuration options of this PMU device are available in sysfs,
> +see /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_<socket-id>.
> +
> +The list of events:
> +
> +  * IN_RD_CUM_OUTS: accumulated outstanding-request cycles of incoming read requests.
> +  * IN_RD_REQ: the number of incoming read requests.
> +  * IN_WR_CUM_OUTS: accumulated outstanding-request cycles of incoming write requests.
> +  * IN_WR_REQ: the number of incoming write requests.
> +  * OUT_RD_CUM_OUTS: accumulated outstanding-request cycles of outgoing read requests.
> +  * OUT_RD_REQ: the number of outgoing read requests.
> +  * OUT_WR_CUM_OUTS: accumulated outstanding-request cycles of outgoing write requests.
> +  * OUT_WR_REQ: the number of outgoing write requests.
> +  * CYCLES: NVLink-C2C interface cycle counts.
> +
> +The incoming events count the reads/writes from the remote device to the SoC.
> +The outgoing events count the reads/writes from the SoC to the remote device.
> +
> +The sysfs /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_<socket-id>/peer
> +contains the information about the connected device.
> +
> +When the C2C interface is connected to GPU(s), the user can use the "gpu_mask"
> +parameter to filter traffic to/from specific GPU(s). Each bit represents the
> +GPU index, e.g. "gpu_mask=0x1" corresponds to GPU 0 and "gpu_mask=0x3" is for
> +GPU 0 and 1. If not specified, the PMU monitors all GPUs by default.
> +
> +When connected to another SoC, only the read events are available.
> +
> +The events can be used to calculate the average latency of the read/write requests::
> +
> +   C2C_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
> +
> +   IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
> +   IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
> +
> +   IN_WR_AVG_LATENCY_IN_CYCLES = IN_WR_CUM_OUTS / IN_WR_REQ
> +   IN_WR_AVG_LATENCY_IN_NS = IN_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
> +
> +   OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
> +   OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
> +
> +   OUT_WR_AVG_LATENCY_IN_CYCLES = OUT_WR_CUM_OUTS / OUT_WR_REQ
> +   OUT_WR_AVG_LATENCY_IN_NS = OUT_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
> +
> +Example usage:
> +
> +  * Count incoming traffic from all GPUs connected via NVLink-C2C::
> +
> +      perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/
> +
> +  * Count incoming traffic from GPU 0 connected via NVLink-C2C::
> +
> +      perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x1/
> +
> +  * Count incoming traffic from GPU 1 connected via NVLink-C2C::
> +
> +      perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x2/
> +
> +  * Count outgoing traffic to all GPUs connected via NVLink-C2C::
> +
> +      perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_req/
> +
> +  * Count outgoing traffic to GPU 0 connected via NVLink-C2C::
> +
> +      perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x1/
> +
> +  * Count outgoing traffic to GPU 1 connected via NVLink-C2C::
> +
> +      perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x2/
> +
> +NV-CLink PMU
> +------------
> +
> +This PMU monitors latency events of memory read requests that pass through
> +the NV-CLINK interface. Bandwidth events are not available in this PMU.
> +In the Tegra410 SoC, the NV-CLink interface connects to another Tegra410 SoC,
> +and this PMU only counts read traffic.
> +
> +The events and configuration options of this PMU device are available in sysfs,
> +see /sys/bus/event_source/devices/nvidia_nvclink_pmu_<socket-id>.
> +
> +The list of events:
> +
> +  * IN_RD_CUM_OUTS: accumulated outstanding-request cycles of incoming read requests.
> +  * IN_RD_REQ: the number of incoming read requests.
> +  * OUT_RD_CUM_OUTS: accumulated outstanding-request cycles of outgoing read requests.
> +  * OUT_RD_REQ: the number of outgoing read requests.
> +  * CYCLES: NV-CLINK interface cycle counts.
> +
> +The incoming events count the reads from the remote device to the SoC.
> +The outgoing events count the reads from the SoC to the remote device.
> +
> +The events can be used to calculate the average latency of the read requests::
> +
> +   CLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
> +
> +   IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
> +   IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
> +
> +   OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
> +   OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
> +
> +Example usage:
> +
> +  * Count incoming read traffic from remote SoC connected via NV-CLINK::
> +
> +      perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/
> +
> +  * Count outgoing read traffic to remote SoC connected via NV-CLINK::
> +
> +      perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/
> +
> +NV-DLink PMU
> +------------
> +
> +This PMU monitors latency events of memory read requests that pass through
> +the NV-DLINK interface. Bandwidth events are not available in this PMU.
> +In the Tegra410 SoC, this PMU only counts CXL memory read traffic.
> +
> +The events and configuration options of this PMU device are available in sysfs,
> +see /sys/bus/event_source/devices/nvidia_nvdlink_pmu_<socket-id>.
> +
> +The list of events:
> +
> +  * IN_RD_CUM_OUTS: accumulated outstanding-request cycles of read requests to CXL memory.
> +  * IN_RD_REQ: the number of read requests to CXL memory.
> +  * CYCLES: NV-DLINK interface cycle counts.
> +
> +The events can be used to calculate the average latency of the read requests::
> +
> +   DLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
> +
> +   IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
> +   IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / DLINK_FREQ_IN_GHZ
> +
> +Example usage:
> +
> +  * Count read events to CXL memory::
> +
> +      perf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 9fed3c41d5ea..7ee36efe6bc0 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -318,4 +318,11 @@ config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
> 	  Enable perf support for CPU memory latency counters monitoring on
> 	  NVIDIA Tegra410 SoC.
>
> +config NVIDIA_TEGRA410_C2C_PMU
> +	tristate "NVIDIA Tegra410 C2C PMU"
> +	depends on ARM64 && ACPI
> +	help
> +	  Enable perf support for counters in NVIDIA C2C interface of NVIDIA
> +	  Tegra410 SoC.
> +
> endmenu
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 4aa6aad393c2..eb8a022dad9a 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -36,3 +36,4 @@ obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
> obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
> obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
> obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
> +obj-$(CONFIG_NVIDIA_TEGRA410_C2C_PMU) += nvidia_t410_c2c_pmu.o
> diff --git a/drivers/perf/nvidia_t410_c2c_pmu.c b/drivers/perf/nvidia_t410_c2c_pmu.c
> new file mode 100644
> index 000000000000..362e0e5f8b24
> --- /dev/null
> +++ b/drivers/perf/nvidia_t410_c2c_pmu.c
> @@ -0,0 +1,1061 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVIDIA Tegra410 C2C PMU driver.
> + *
> + * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <linux/property.h>
> +
> +/* The C2C interface types in Tegra410. */
> +#define C2C_TYPE_NVLINK          0x0
> +#define C2C_TYPE_NVCLINK         0x1
> +#define C2C_TYPE_NVDLINK         0x2
> +#define C2C_TYPE_COUNT           0x3
> +
> +/* The type of the peer device connected to the C2C interface. */
> +#define C2C_PEER_TYPE_CPU        0x0
> +#define C2C_PEER_TYPE_GPU        0x1
> +#define C2C_PEER_TYPE_CXLMEM     0x2
> +#define C2C_PEER_TYPE_COUNT      0x3
> +
> +/* The number of peer devices that can be connected to the C2C interface. */
> +#define C2C_NR_PEER_CPU          0x1
> +#define C2C_NR_PEER_GPU          0x2
> +#define C2C_NR_PEER_CXLMEM       0x1
> +#define C2C_NR_PEER_MAX          0x2
> +
> +/* Number of instances on each interface. */
> +#define C2C_NR_INST_NVLINK       14
> +#define C2C_NR_INST_NVCLINK      12
> +#define C2C_NR_INST_NVDLINK      16
> +#define C2C_NR_INST_MAX          16
> +
> +/* Register offsets. */
> +#define C2C_CTRL                    0x864
> +#define C2C_IN_STATUS               0x868
> +#define C2C_CYCLE_CNTR              0x86c
> +#define C2C_IN_RD_CUM_OUTS_CNTR     0x874
> +#define C2C_IN_RD_REQ_CNTR          0x87c
> +#define C2C_IN_WR_CUM_OUTS_CNTR     0x884
> +#define C2C_IN_WR_REQ_CNTR          0x88c
> +#define C2C_OUT_STATUS              0x890
> +#define C2C_OUT_RD_CUM_OUTS_CNTR    0x898
> +#define C2C_OUT_RD_REQ_CNTR         0x8a0
> +#define C2C_OUT_WR_CUM_OUTS_CNTR    0x8a8
> +#define C2C_OUT_WR_REQ_CNTR         0x8b0
> +
> +/* C2C_IN_STATUS register fields. */
> +#define C2C_IN_STATUS_CYCLE_OVF             BIT(0)
> +#define C2C_IN_STATUS_IN_RD_CUM_OUTS_OVF    BIT(1)
> +#define C2C_IN_STATUS_IN_RD_REQ_OVF         BIT(2)
> +#define C2C_IN_STATUS_IN_WR_CUM_OUTS_OVF    BIT(3)
> +#define C2C_IN_STATUS_IN_WR_REQ_OVF         BIT(4)
> +
> +/* C2C_OUT_STATUS register fields. */
> +#define C2C_OUT_STATUS_OUT_RD_CUM_OUTS_OVF    BIT(0)
> +#define C2C_OUT_STATUS_OUT_RD_REQ_OVF         BIT(1)
> +#define C2C_OUT_STATUS_OUT_WR_CUM_OUTS_OVF    BIT(2)
> +#define C2C_OUT_STATUS_OUT_WR_REQ_OVF         BIT(3)
> +
> +/* Events. */
> +#define C2C_EVENT_CYCLES                0x0
> +#define C2C_EVENT_IN_RD_CUM_OUTS        0x1
> +#define C2C_EVENT_IN_RD_REQ             0x2
> +#define C2C_EVENT_IN_WR_CUM_OUTS        0x3
> +#define C2C_EVENT_IN_WR_REQ             0x4
> +#define C2C_EVENT_OUT_RD_CUM_OUTS       0x5
> +#define C2C_EVENT_OUT_RD_REQ            0x6
> +#define C2C_EVENT_OUT_WR_CUM_OUTS       0x7
> +#define C2C_EVENT_OUT_WR_REQ            0x8
> +
> +#define C2C_NUM_EVENTS           0x9
> +#define C2C_MASK_EVENT           0xFF
> +#define C2C_MAX_ACTIVE_EVENTS    32
> +
> +#define C2C_ACTIVE_CPU_MASK        0x0
> +#define C2C_ASSOCIATED_CPU_MASK    0x1
> +
> +/*
> + * Maximum poll count for reading counter value using high-low-high sequence.
> + */
> +#define HILOHI_MAX_POLL    1000
> +
> +static unsigned long nv_c2c_pmu_cpuhp_state;
> +
> +/* PMU descriptor. */
> +
> +/* Tracks the events assigned to the PMU for a given logical index. */
> +struct nv_c2c_pmu_hw_events {
> +	/* The events that are active. */
> +	struct perf_event *events[C2C_MAX_ACTIVE_EVENTS];
> +
> +	/*
> +	 * Each bit indicates a logical counter is being used (or not) for an
> +	 * event.
> +	 */
> +	DECLARE_BITMAP(used_ctrs, C2C_MAX_ACTIVE_EVENTS);
> +};
> +
> +struct nv_c2c_pmu {
> +	struct pmu pmu;
> +	struct device *dev;
> +	struct acpi_device *acpi_dev;
> +
> +	const char *name;
> +	const char *identifier;
> +
> +	unsigned int c2c_type;
> +	unsigned int peer_type;
> +	unsigned int socket;
> +	unsigned int nr_inst;
> +	unsigned int nr_peer;
> +	unsigned long peer_insts[C2C_NR_PEER_MAX][BITS_TO_LONGS(C2C_NR_INST_MAX)];
> +	u32 filter_default;
> +
> +	struct nv_c2c_pmu_hw_events hw_events;
> +
> +	cpumask_t associated_cpus;
> +	cpumask_t active_cpu;
> +
> +	struct hlist_node cpuhp_node;
> +
> +	struct attribute **formats;
> +	const struct attribute_group *attr_groups[6];
> +
> +	void __iomem *base_broadcast;
> +	void __iomem *base[C2C_NR_INST_MAX];
> +};
> +
> +#define to_c2c_pmu(p) (container_of(p, struct nv_c2c_pmu, pmu))
> +
> +/* Get event type from perf_event. */
> +static inline u32 get_event_type(struct perf_event *event)
> +{
> +	return (event->attr.config) & C2C_MASK_EVENT;
> +}
> +
> +static inline u32 get_filter_mask(struct perf_event *event)
> +{
> +	u32 filter;
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
> +
> +	filter = ((u32)event->attr.config1) & c2c_pmu->filter_default;
> +	if (filter == 0)
> +		filter = c2c_pmu->filter_default;
> +
> +	return filter;
> +}
> +
> +/* PMU operations. */
> +
> +static int nv_c2c_pmu_get_event_idx(struct nv_c2c_pmu_hw_events *hw_events,
> +				    struct perf_event *event)
> +{
> +	u32 idx;
> +
> +	idx = find_first_zero_bit(hw_events->used_ctrs, C2C_MAX_ACTIVE_EVENTS);
> +	if (idx >= C2C_MAX_ACTIVE_EVENTS)
> +		return -EAGAIN;
> +
> +	set_bit(idx, hw_events->used_ctrs);
> +
> +	return idx;
> +}
> +
> +static bool
> +nv_c2c_pmu_validate_event(struct pmu *pmu,
> +			  struct nv_c2c_pmu_hw_events *hw_events,
> +			  struct perf_event *event)
> +{
> +	if (is_software_event(event))
> +		return true;
> +
> +	/* Reject groups spanning multiple HW PMUs. */
> +	if (event->pmu != pmu)
> +		return false;
> +
> +	return nv_c2c_pmu_get_event_idx(hw_events, event) >= 0;
> +}
> +
> +/*
> + * Make sure the group of events can be scheduled at once
> + * on the PMU.
> + */
> +static bool nv_c2c_pmu_validate_group(struct perf_event *event)
> +{
> +	struct perf_event *sibling, *leader = event->group_leader;
> +	struct nv_c2c_pmu_hw_events fake_hw_events;
> +
> +	if (event->group_leader == event)
> +		return true;
> +
> +	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
> +
> +	if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, leader))
> +		return false;
> +
> +	for_each_sibling_event(sibling, leader) {
> +		if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events,
> +					       sibling))
> +			return false;
> +	}
> +
> +	return nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, event);
> +}
> +
> +static int nv_c2c_pmu_event_init(struct perf_event *event)
> +{
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u32 event_type = get_event_type(event);
> +
> +	if (event->attr.type != event->pmu->type ||
> +	    event_type >= C2C_NUM_EVENTS)
> +		return -ENOENT;
> +
> +	/*
> +	 * Following other "uncore" PMUs, we do not support sampling mode or
> +	 * attach to a task (per-process mode).
> +	 */
> +	if (is_sampling_event(event)) {
> +		dev_dbg(c2c_pmu->pmu.dev, "Can't support sampling events\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
> +		dev_dbg(c2c_pmu->pmu.dev, "Can't support per-task counters\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Make sure the CPU assignment is on one of the CPUs associated with
> +	 * this PMU.
> +	 */
> +	if (!cpumask_test_cpu(event->cpu, &c2c_pmu->associated_cpus)) {
> +		dev_dbg(c2c_pmu->pmu.dev,
> +			"Requested cpu is not associated with the PMU\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Enforce the current active CPU to handle the events in this PMU. */
> +	event->cpu = cpumask_first(&c2c_pmu->active_cpu);
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (!nv_c2c_pmu_validate_group(event))
> +		return -EINVAL;
> +
> +	hwc->idx = -1;
> +	hwc->config = event_type;
> +
> +	return 0;
> +}
> +
> +/*
> + * Read a 64-bit register as a pair of 32-bit reads using the high-low-high
> + * sequence.
> + */
> +static u64 read_reg64_hilohi(const void __iomem *addr, u32 max_poll_count)
> +{
> +	u32 val_lo, val_hi;
> +	u64 val;
> +
> +	/* Use high-low-high sequence to avoid tearing */
> +	do {
> +		if (max_poll_count-- == 0) {
> +			pr_err("NV C2C PMU: timeout in high-low-high sequence\n");
> +			return 0;
> +		}
> +
> +		val_hi = readl(addr + 4);
> +		val_lo = readl(addr);
> +	} while (val_hi != readl(addr + 4));
> +
> +	val = (((u64)val_hi << 32) | val_lo);
> +
> +	return val;
> +}
> +
> +static void nv_c2c_pmu_check_status(struct nv_c2c_pmu *c2c_pmu, u32 instance)
> +{
> +	u32 in_status, out_status;
> +
> +	in_status = readl(c2c_pmu->base[instance] + C2C_IN_STATUS);
> +	out_status = readl(c2c_pmu->base[instance] + C2C_OUT_STATUS);
> +
> +	if (in_status || out_status)
> +		dev_warn(c2c_pmu->dev,
> +			"C2C PMU overflow in: 0x%x, out: 0x%x\n",
> +			in_status, out_status);
> +}
> +
> +static u32 nv_c2c_ctr_offset[C2C_NUM_EVENTS] = {
> +	[C2C_EVENT_CYCLES] = C2C_CYCLE_CNTR,
> +	[C2C_EVENT_IN_RD_CUM_OUTS] = C2C_IN_RD_CUM_OUTS_CNTR,
> +	[C2C_EVENT_IN_RD_REQ] = C2C_IN_RD_REQ_CNTR,
> +	[C2C_EVENT_IN_WR_CUM_OUTS] = C2C_IN_WR_CUM_OUTS_CNTR,
> +	[C2C_EVENT_IN_WR_REQ] = C2C_IN_WR_REQ_CNTR,
> +	[C2C_EVENT_OUT_RD_CUM_OUTS] = C2C_OUT_RD_CUM_OUTS_CNTR,
> +	[C2C_EVENT_OUT_RD_REQ] = C2C_OUT_RD_REQ_CNTR,
> +	[C2C_EVENT_OUT_WR_CUM_OUTS] = C2C_OUT_WR_CUM_OUTS_CNTR,
> +	[C2C_EVENT_OUT_WR_REQ] = C2C_OUT_WR_REQ_CNTR,
> +};
> +
> +static u64 nv_c2c_pmu_read_counter(struct perf_event *event)
> +{
> +	u32 ctr_id, ctr_offset, filter_mask, filter_idx, inst_idx;
> +	unsigned long *inst_mask;
> +	DECLARE_BITMAP(filter_bitmap, C2C_NR_PEER_MAX);
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
> +	u64 val = 0;
> +
> +	filter_mask = get_filter_mask(event);
> +	bitmap_from_arr32(filter_bitmap, &filter_mask, c2c_pmu->nr_peer);
> +
> +	ctr_id = event->hw.config;
> +	ctr_offset = nv_c2c_ctr_offset[ctr_id];
> +
> +	for_each_set_bit(filter_idx, filter_bitmap, c2c_pmu->nr_peer) {
> +		inst_mask = c2c_pmu->peer_insts[filter_idx];
> +		for_each_set_bit(inst_idx, inst_mask, c2c_pmu->nr_inst) {
> +			nv_c2c_pmu_check_status(c2c_pmu, inst_idx);
> +
> +			/*
> +			 * All instances share the same clock and the driver
> +			 * always enables all instances, so we can use the count
> +			 * from any one instance for the cycle counter.
> +			 */
> +			if (ctr_id == C2C_EVENT_CYCLES)
> +				return read_reg64_hilohi(
> +					c2c_pmu->base[inst_idx] + ctr_offset,
> +					HILOHI_MAX_POLL);
> +
> +			/*
> +			 * For other events, sum up the counts from all instances.
> +			 */
> +			val += read_reg64_hilohi(
> +				c2c_pmu->base[inst_idx] + ctr_offset,
> +				HILOHI_MAX_POLL);
> +		}
> +	}
> +
> +	return val;
> +}
> +
> +static void nv_c2c_pmu_event_update(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	u64 prev, now;
> +
> +	do {
> +		prev = local64_read(&hwc->prev_count);
> +		now = nv_c2c_pmu_read_counter(event);
> +	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> +
> +	local64_add(now - prev, &event->count);
> +}
> +
> +static void nv_c2c_pmu_start(struct perf_event *event, int pmu_flags)
> +{
> +	event->hw.state = 0;
> +}
> +
> +static void nv_c2c_pmu_stop(struct perf_event *event, int pmu_flags)
> +{
> +	event->hw.state |= PERF_HES_STOPPED;
> +}
> +
> +static int nv_c2c_pmu_add(struct perf_event *event, int flags)
> +{
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
> +	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx;
> +
> +	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &c2c_pmu->associated_cpus)))
> +		return -ENOENT;
> +
> +	idx = nv_c2c_pmu_get_event_idx(hw_events, event);
> +	if (idx < 0)
> +		return idx;
> +
> +	hw_events->events[idx] = event;
> +	hwc->idx = idx;
> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +
> +	if (flags & PERF_EF_START)
> +		nv_c2c_pmu_start(event, PERF_EF_RELOAD);
> +
> +	/* Propagate changes to the userspace mapping. */
> +	perf_event_update_userpage(event);
> +
> +	return 0;
> +}
> +
> +static void nv_c2c_pmu_del(struct perf_event *event, int flags)
> +{
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
> +	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	nv_c2c_pmu_stop(event, PERF_EF_UPDATE);
> +
> +	hw_events->events[idx] = NULL;
> +
> +	clear_bit(idx, hw_events->used_ctrs);
> +
> +	perf_event_update_userpage(event);
> +}
> +
> +static void nv_c2c_pmu_read(struct perf_event *event)
> +{
> +	nv_c2c_pmu_event_update(event);
> +}
> +
> +static void nv_c2c_pmu_enable(struct pmu *pmu)
> +{
> +	void __iomem *bcast;
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
> +
> +	/* Do nothing if no counter is in use. */
> +	if (bitmap_empty(c2c_pmu->hw_events.used_ctrs, C2C_MAX_ACTIVE_EVENTS))
> +		return;
> +
> +	/* Enable all the counters. */
> +	bcast = c2c_pmu->base_broadcast;
> +	writel(0x1UL, bcast + C2C_CTRL);
> +}
> +
> +static void nv_c2c_pmu_disable(struct pmu *pmu)
> +{
> +	unsigned int idx;
> +	void __iomem *bcast;
> +	struct perf_event *event;
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
> +
> +	/* Disable all the counters. */
> +	bcast = c2c_pmu->base_broadcast;
> +	writel(0x0UL, bcast + C2C_CTRL);
> +
> +	/*
> +	 * The counters will start from 0 again on restart.
> +	 * Update the events immediately to avoid losing the counts.
> +	 */
> +	for_each_set_bit(idx, c2c_pmu->hw_events.used_ctrs,
> +			 C2C_MAX_ACTIVE_EVENTS) {
> +		event = c2c_pmu->hw_events.events[idx];
> +
> +		if (!event)
> +			continue;
> +
> +		nv_c2c_pmu_event_update(event);
> +
> +		local64_set(&event->hw.prev_count, 0ULL);
> +	}
> +}
> +
> +/* PMU identifier attribute. */
> +
> +static ssize_t nv_c2c_pmu_identifier_show(struct device *dev,
> +					  struct device_attribute *attr,
> +					  char *page)
> +{
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
> +
> +	return sysfs_emit(page, "%s\n", c2c_pmu->identifier);
> +}
> +
> +static struct device_attribute nv_c2c_pmu_identifier_attr =
> +	__ATTR(identifier, 0444, nv_c2c_pmu_identifier_show, NULL);
> +
> +static struct attribute *nv_c2c_pmu_identifier_attrs[] = {
> +	&nv_c2c_pmu_identifier_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group nv_c2c_pmu_identifier_attr_group = {
> +	.attrs = nv_c2c_pmu_identifier_attrs,
> +};
> +
> +/* Peer attribute. */
> +
> +static ssize_t nv_c2c_pmu_peer_show(struct device *dev,
> +				    struct device_attribute *attr,
> +				    char *page)
> +{
> +	const char *peer_type[C2C_PEER_TYPE_COUNT] = {
> +		[C2C_PEER_TYPE_CPU] = "cpu",
> +		[C2C_PEER_TYPE_GPU] = "gpu",
> +		[C2C_PEER_TYPE_CXLMEM] = "cxlmem",
> +	};
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
> +
> +	return sysfs_emit(page, "nr_%s=%u\n", peer_type[c2c_pmu->peer_type],
> +			  c2c_pmu->nr_peer);
> +}
> +
> +static struct device_attribute nv_c2c_pmu_peer_attr =
> +	__ATTR(peer, 0444, nv_c2c_pmu_peer_show, NULL);
> +
> +static struct attribute *nv_c2c_pmu_peer_attrs[] = {
> +	&nv_c2c_pmu_peer_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group nv_c2c_pmu_peer_attr_group = {
> +	.attrs = nv_c2c_pmu_peer_attrs,
> +};
> +
> +/* Format attributes. */
> +
> +#define NV_C2C_PMU_EXT_ATTR(_name, _func, _config)			\
> +	(&((struct dev_ext_attribute[]){				\
> +		{							\
> +			.attr = __ATTR(_name, 0444, _func, NULL),	\
> +			.var = (void *)_config				\
> +		}							\
> +	})[0].attr.attr)
> +
> +#define NV_C2C_PMU_FORMAT_ATTR(_name, _config) \
> +	NV_C2C_PMU_EXT_ATTR(_name, device_show_string, _config)
> +
> +#define NV_C2C_PMU_FORMAT_EVENT_ATTR \
> +	NV_C2C_PMU_FORMAT_ATTR(event, "config:0-3")
> +
> +static struct attribute *nv_c2c_nvlink_pmu_formats[] = {
> +	NV_C2C_PMU_FORMAT_EVENT_ATTR,
> +	NV_C2C_PMU_FORMAT_ATTR(gpu_mask, "config1:0-1"),
> +	NULL,
> +};
> +
> +static struct attribute *nv_c2c_pmu_formats[] = {
> +	NV_C2C_PMU_FORMAT_EVENT_ATTR,
> +	NULL,
> +};
> +
> +static struct attribute_group *
> +nv_c2c_pmu_alloc_format_attr_group(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	struct attribute_group *format_group;
> +	struct device *dev = c2c_pmu->dev;
> +
> +	format_group = devm_kzalloc(dev, sizeof(*format_group), GFP_KERNEL);
> +	if (!format_group)
> +		return NULL;
> +
> +	format_group->name = "format";
> +	format_group->attrs = c2c_pmu->formats;
> +
> +	return format_group;
> +}
> +
> +/* Event attributes. */
> +
> +static ssize_t nv_c2c_pmu_sysfs_event_show(struct device *dev,
> +					   struct device_attribute *attr,
> +					   char *buf)
> +{
> +	struct perf_pmu_events_attr *pmu_attr;
> +
> +	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
> +	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
> +}
> +
> +#define NV_C2C_PMU_EVENT_ATTR(_name, _config)	\
> +	PMU_EVENT_ATTR_ID(_name, nv_c2c_pmu_sysfs_event_show, _config)
> +
> +static struct attribute *nv_c2c_pmu_events[] = {
> +	NV_C2C_PMU_EVENT_ATTR(cycles, C2C_EVENT_CYCLES),
> +	NV_C2C_PMU_EVENT_ATTR(in_rd_cum_outs, C2C_EVENT_IN_RD_CUM_OUTS),
> +	NV_C2C_PMU_EVENT_ATTR(in_rd_req, C2C_EVENT_IN_RD_REQ),
> +	NV_C2C_PMU_EVENT_ATTR(in_wr_cum_outs, C2C_EVENT_IN_WR_CUM_OUTS),
> +	NV_C2C_PMU_EVENT_ATTR(in_wr_req, C2C_EVENT_IN_WR_REQ),
> +	NV_C2C_PMU_EVENT_ATTR(out_rd_cum_outs, C2C_EVENT_OUT_RD_CUM_OUTS),
> +	NV_C2C_PMU_EVENT_ATTR(out_rd_req, C2C_EVENT_OUT_RD_REQ),
> +	NV_C2C_PMU_EVENT_ATTR(out_wr_cum_outs, C2C_EVENT_OUT_WR_CUM_OUTS),
> +	NV_C2C_PMU_EVENT_ATTR(out_wr_req, C2C_EVENT_OUT_WR_REQ),
> +	NULL
> +};
> +
> +static umode_t
> +nv_c2c_pmu_event_attr_is_visible(struct kobject *kobj, struct attribute *attr,
> +				 int unused)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
> +	struct perf_pmu_events_attr *eattr;
> +
> +	eattr = container_of(attr, typeof(*eattr), attr.attr);
> +
> +	if (c2c_pmu->c2c_type == C2C_TYPE_NVDLINK) {
> +		/* Only incoming reads are available. */
> +		switch (eattr->id) {
> +		case C2C_EVENT_IN_WR_CUM_OUTS:
> +		case C2C_EVENT_IN_WR_REQ:
> +		case C2C_EVENT_OUT_RD_CUM_OUTS:
> +		case C2C_EVENT_OUT_RD_REQ:
> +		case C2C_EVENT_OUT_WR_CUM_OUTS:
> +		case C2C_EVENT_OUT_WR_REQ:
> +			return 0;
> +		default:
> +			return attr->mode;
> +		}
> +	} else {
> +		/* Hide the write events if the C2C is connected to another SoC. */
> +		if (c2c_pmu->peer_type == C2C_PEER_TYPE_CPU) {
> +			switch (eattr->id) {
> +			case C2C_EVENT_IN_WR_CUM_OUTS:
> +			case C2C_EVENT_IN_WR_REQ:
> +			case C2C_EVENT_OUT_WR_CUM_OUTS:
> +			case C2C_EVENT_OUT_WR_REQ:
> +				return 0;
> +			default:
> +				return attr->mode;
> +			}
> +		}
> +	}
> +
> +	return attr->mode;
> +}
> +
> +static const struct attribute_group nv_c2c_pmu_events_group = {
> +	.name = "events",
> +	.attrs = nv_c2c_pmu_events,
> +	.is_visible = nv_c2c_pmu_event_attr_is_visible,
> +};
> +
> +/* Cpumask attributes. */
> +
> +static ssize_t nv_c2c_pmu_cpumask_show(struct device *dev,
> +				       struct device_attribute *attr, char *buf)
> +{
> +	struct pmu *pmu = dev_get_drvdata(dev);
> +	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	unsigned long mask_id = (unsigned long)eattr->var;
> +	const cpumask_t *cpumask;
> +
> +	switch (mask_id) {
> +	case C2C_ACTIVE_CPU_MASK:
> +		cpumask = &c2c_pmu->active_cpu;
> +		break;
> +	case C2C_ASSOCIATED_CPU_MASK:
> +		cpumask = &c2c_pmu->associated_cpus;
> +		break;
> +	default:
> +		return 0;
> +	}
> +	return cpumap_print_to_pagebuf(true, buf, cpumask);
> +}
> +
> +#define NV_C2C_PMU_CPUMASK_ATTR(_name, _config)			\
> +	NV_C2C_PMU_EXT_ATTR(_name, nv_c2c_pmu_cpumask_show,	\
> +				(unsigned long)_config)
> +
> +static struct attribute *nv_c2c_pmu_cpumask_attrs[] = {
> +	NV_C2C_PMU_CPUMASK_ATTR(cpumask, C2C_ACTIVE_CPU_MASK),
> +	NV_C2C_PMU_CPUMASK_ATTR(associated_cpus, C2C_ASSOCIATED_CPU_MASK),
> +	NULL,
> +};
> +
> +static const struct attribute_group nv_c2c_pmu_cpumask_attr_group = {
> +	.attrs = nv_c2c_pmu_cpumask_attrs,
> +};
> +
> +/* Per PMU device attribute groups. */
> +
> +static int nv_c2c_pmu_alloc_attr_groups(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	const struct attribute_group **attr_groups = c2c_pmu->attr_groups;
> +
> +	attr_groups[0] = nv_c2c_pmu_alloc_format_attr_group(c2c_pmu);
> +	attr_groups[1] = &nv_c2c_pmu_events_group;
> +	attr_groups[2] = &nv_c2c_pmu_cpumask_attr_group;
> +	attr_groups[3] = &nv_c2c_pmu_identifier_attr_group;
> +	attr_groups[4] = &nv_c2c_pmu_peer_attr_group;
> +
> +	if (!attr_groups[0])
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int nv_c2c_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct nv_c2c_pmu *c2c_pmu =
> +		hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node);
> +
> +	if (!cpumask_test_cpu(cpu, &c2c_pmu->associated_cpus))
> +		return 0;
> +
> +	/* If the PMU is already managed, there is nothing to do */
> +	if (!cpumask_empty(&c2c_pmu->active_cpu))
> +		return 0;
> +
> +	/* Use this CPU for event counting */
> +	cpumask_set_cpu(cpu, &c2c_pmu->active_cpu);
> +
> +	return 0;
> +}
> +
> +static int nv_c2c_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	unsigned int dst;
> +
> +	struct nv_c2c_pmu *c2c_pmu =
> +		hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node);
> +
> +	/* Nothing to do if this CPU doesn't own the PMU */
> +	if (!cpumask_test_and_clear_cpu(cpu, &c2c_pmu->active_cpu))
> +		return 0;
> +
> +	/* Choose a new CPU to migrate ownership of the PMU to */
> +	dst = cpumask_any_and_but(&c2c_pmu->associated_cpus,
> +				  cpu_online_mask, cpu);
> +	if (dst >= nr_cpu_ids)
> +		return 0;
> +
> +	/* Use this CPU for event counting */
> +	perf_pmu_migrate_context(&c2c_pmu->pmu, cpu, dst);
> +	cpumask_set_cpu(dst, &c2c_pmu->active_cpu);
> +
> +	return 0;
> +}
> +
> +static int nv_c2c_pmu_get_cpus(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	int ret = 0, socket = c2c_pmu->socket, cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (cpu_to_node(cpu) == socket)
> +			cpumask_set_cpu(cpu, &c2c_pmu->associated_cpus);
> +	}
> +
> +	if (cpumask_empty(&c2c_pmu->associated_cpus)) {
> +		dev_dbg(c2c_pmu->dev,
> +			"No CPU associated with C2C PMU socket-%u\n", socket);
> +		ret = -ENODEV;
> +	}
> +
> +	return ret;
> +}
> +
> +static int nv_c2c_pmu_init_socket(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	const char *uid_str;
> +	u32 socket;
> +	int ret;
> +
> +	uid_str = acpi_device_uid(c2c_pmu->acpi_dev);
> +	if (!uid_str) {
> +		ret = -ENODEV;
> +		goto fail;
> +	}
> +
> +	ret = kstrtou32(uid_str, 0, &socket);
> +	if (ret)
> +		goto fail;
> +
> +	c2c_pmu->socket = socket;
> +	return 0;
> +
> +fail:
> +	dev_err(c2c_pmu->dev, "Failed to initialize socket\n");
> +	return ret;
> +}
> +
> +static int nv_c2c_pmu_init_id(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	const char *name_fmt[C2C_TYPE_COUNT] = {
> +		[C2C_TYPE_NVLINK] = "nvidia_nvlink_c2c_pmu_%u",
> +		[C2C_TYPE_NVCLINK] = "nvidia_nvclink_pmu_%u",
> +		[C2C_TYPE_NVDLINK] = "nvidia_nvdlink_pmu_%u",
> +	};
> +
> +	char *name;
> +	int ret;
> +
> +	name = devm_kasprintf(c2c_pmu->dev, GFP_KERNEL,
> +			      name_fmt[c2c_pmu->c2c_type], c2c_pmu->socket);
> +	if (!name) {
> +		ret = -ENOMEM;
> +		goto fail;
> +	}
> +
> +	c2c_pmu->name = name;
> +
> +	c2c_pmu->identifier = acpi_device_hid(c2c_pmu->acpi_dev);
> +
> +	return 0;
> +
> +fail:
> +	dev_err(c2c_pmu->dev, "Failed to initialize name\n");
> +	return ret;
> +}
> +
> +static int nv_c2c_pmu_init_filter(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	u32 cpu_en = 0;
> +	struct device *dev = c2c_pmu->dev;
> +
> +	if (c2c_pmu->c2c_type == C2C_TYPE_NVDLINK) {
> +		c2c_pmu->peer_type = C2C_PEER_TYPE_CXLMEM;
> +
> +		c2c_pmu->nr_inst = C2C_NR_INST_NVDLINK;
> +		c2c_pmu->peer_insts[0][0] = (1UL << c2c_pmu->nr_inst) - 1;
> +
> +		c2c_pmu->nr_peer = C2C_NR_PEER_CXLMEM;
> +		c2c_pmu->filter_default = (1 << c2c_pmu->nr_peer) - 1;
> +
> +		c2c_pmu->formats = nv_c2c_pmu_formats;
> +
> +		return 0;
> +	}
> +
> +	c2c_pmu->nr_inst = (c2c_pmu->c2c_type == C2C_TYPE_NVLINK) ?
> +		C2C_NR_INST_NVLINK : C2C_NR_INST_NVCLINK;
> +
> +	if (device_property_read_u32(dev, "cpu_en_mask", &cpu_en))
> +		dev_dbg(dev, "no cpu_en_mask property\n");
> +
> +	if (cpu_en) {
> +		c2c_pmu->peer_type = C2C_PEER_TYPE_CPU;
> +
> +		/* Fill peer_insts bitmap with instances connected to peer CPU. */
> +		bitmap_from_arr32(c2c_pmu->peer_insts[0], &cpu_en,
> +				c2c_pmu->nr_inst);
> +
> +		c2c_pmu->nr_peer = 1;
> +		c2c_pmu->formats = nv_c2c_pmu_formats;
> +	} else {
> +		u32 i;
> +		const char *props[C2C_NR_PEER_MAX] = {
> +			"gpu0_en_mask", "gpu1_en_mask"
> +		};
> +
> +		for (i = 0; i < C2C_NR_PEER_MAX; i++) {
> +			/*
> +			 * Reset on each iteration: the property read leaves
> +			 * gpu_en untouched on failure.
> +			 */
> +			u32 gpu_en = 0;
> +
> +			if (device_property_read_u32(dev, props[i], &gpu_en))
> +				dev_dbg(dev, "no %s property\n", props[i]);
> +
> +			if (gpu_en) {
> +				/* Fill peer_insts bitmap with instances connected to peer GPU. */
> +				bitmap_from_arr32(c2c_pmu->peer_insts[i], &gpu_en,
> +						c2c_pmu->nr_inst);
> +
> +				c2c_pmu->nr_peer++;
> +			}
> +		}
> +
> +		if (c2c_pmu->nr_peer == 0) {
> +			dev_err(dev, "No GPU is enabled\n");
> +			return -EINVAL;
> +		}
> +
> +		c2c_pmu->peer_type = C2C_PEER_TYPE_GPU;
> +		c2c_pmu->formats = nv_c2c_nvlink_pmu_formats;
> +	}
> +
> +	c2c_pmu->filter_default = (1 << c2c_pmu->nr_peer) - 1;
> +
> +	return 0;
> +}
> +
> +static struct nv_c2c_pmu *nv_c2c_pmu_init_pmu(struct platform_device *pdev)
> +{
> +	int ret;
> +	struct nv_c2c_pmu *c2c_pmu;
> +	struct acpi_device *acpi_dev;
> +	struct device *dev = &pdev->dev;
> +
> +	acpi_dev = ACPI_COMPANION(dev);
> +	if (!acpi_dev)
> +		return ERR_PTR(-ENODEV);
> +
> +	c2c_pmu = devm_kzalloc(dev, sizeof(*c2c_pmu), GFP_KERNEL);
> +	if (!c2c_pmu)
> +		return ERR_PTR(-ENOMEM);
> +
> +	c2c_pmu->dev = dev;
> +	c2c_pmu->acpi_dev = acpi_dev;
> +	c2c_pmu->c2c_type = (unsigned int)(unsigned long)device_get_match_data(dev);
> +	platform_set_drvdata(pdev, c2c_pmu);
> +
> +	ret = nv_c2c_pmu_init_socket(c2c_pmu);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	ret = nv_c2c_pmu_init_id(c2c_pmu);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	ret = nv_c2c_pmu_init_filter(c2c_pmu);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	return c2c_pmu;
> +}
> +
> +static int nv_c2c_pmu_init_mmio(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	int i;
> +	struct device *dev = c2c_pmu->dev;
> +	struct platform_device *pdev = to_platform_device(dev);
> +
> +	/* Map the address of all the instances. */
> +	for (i = 0; i < c2c_pmu->nr_inst; i++) {
> +		c2c_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
> +		if (IS_ERR(c2c_pmu->base[i])) {
> +			dev_err(dev, "Failed to map address for instance %d\n", i);
> +			return PTR_ERR(c2c_pmu->base[i]);
> +		}
> +	}
> +
> +	/* Map broadcast address. */
> +	c2c_pmu->base_broadcast = devm_platform_ioremap_resource(pdev,
> +								 c2c_pmu->nr_inst);
> +	if (IS_ERR(c2c_pmu->base_broadcast)) {
> +		dev_err(dev, "Failed to map broadcast address\n");
> +		return PTR_ERR(c2c_pmu->base_broadcast);
> +	}
> +
> +	return 0;
> +}
> +
> +static int nv_c2c_pmu_register_pmu(struct nv_c2c_pmu *c2c_pmu)
> +{
> +	int ret;
> +
> +	ret = cpuhp_state_add_instance(nv_c2c_pmu_cpuhp_state,
> +				       &c2c_pmu->cpuhp_node);
> +	if (ret) {
> +		dev_err(c2c_pmu->dev, "Error %d registering hotplug\n", ret);
> +		return ret;
> +	}
> +
> +	c2c_pmu->pmu = (struct pmu) {
> +		.parent		= c2c_pmu->dev,
> +		.task_ctx_nr	= perf_invalid_context,
> +		.pmu_enable	= nv_c2c_pmu_enable,
> +		.pmu_disable	= nv_c2c_pmu_disable,
> +		.event_init	= nv_c2c_pmu_event_init,
> +		.add		= nv_c2c_pmu_add,
> +		.del		= nv_c2c_pmu_del,
> +		.start		= nv_c2c_pmu_start,
> +		.stop		= nv_c2c_pmu_stop,
> +		.read		= nv_c2c_pmu_read,
> +		.attr_groups	= c2c_pmu->attr_groups,
> +		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE |
> +					PERF_PMU_CAP_NO_INTERRUPT,
> +	};
> +
> +	ret = perf_pmu_register(&c2c_pmu->pmu, c2c_pmu->name, -1);
> +	if (ret) {
> +		dev_err(c2c_pmu->dev, "Failed to register C2C PMU: %d\n", ret);
> +		cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state,
> +					    &c2c_pmu->cpuhp_node);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int nv_c2c_pmu_probe(struct platform_device *pdev)
> +{
> +	int ret;
> +	struct nv_c2c_pmu *c2c_pmu;
> +
> +	c2c_pmu = nv_c2c_pmu_init_pmu(pdev);
> +	if (IS_ERR(c2c_pmu))
> +		return PTR_ERR(c2c_pmu);
> +
> +	ret = nv_c2c_pmu_init_mmio(c2c_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = nv_c2c_pmu_get_cpus(c2c_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = nv_c2c_pmu_alloc_attr_groups(c2c_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = nv_c2c_pmu_register_pmu(c2c_pmu);
> +	if (ret)
> +		return ret;
> +
> +	dev_dbg(c2c_pmu->dev, "Registered %s PMU\n", c2c_pmu->name);
> +
> +	return 0;
> +}
> +
> +static void nv_c2c_pmu_device_remove(struct platform_device *pdev)
> +{
> +	struct nv_c2c_pmu *c2c_pmu = platform_get_drvdata(pdev);
> +
> +	perf_pmu_unregister(&c2c_pmu->pmu);
> +	cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state, &c2c_pmu->cpuhp_node);
> +}
> +
> +static const struct acpi_device_id nv_c2c_pmu_acpi_match[] = {
> +	{ "NVDA2023", (kernel_ulong_t)C2C_TYPE_NVLINK },
> +	{ "NVDA2022", (kernel_ulong_t)C2C_TYPE_NVCLINK },
> +	{ "NVDA2020", (kernel_ulong_t)C2C_TYPE_NVDLINK },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(acpi, nv_c2c_pmu_acpi_match);
> +
> +static struct platform_driver nv_c2c_pmu_driver = {
> +	.driver = {
> +		.name = "nvidia-t410-c2c-pmu",
> +		.acpi_match_table = nv_c2c_pmu_acpi_match,
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe = nv_c2c_pmu_probe,
> +	.remove = nv_c2c_pmu_device_remove,
> +};
> +
> +static int __init nv_c2c_pmu_init(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> +				      "perf/nvidia/c2c:online",
> +				      nv_c2c_pmu_online_cpu,
> +				      nv_c2c_pmu_cpu_teardown);
> +	if (ret < 0)
> +		return ret;
> +
> +	nv_c2c_pmu_cpuhp_state = ret;
> +	return platform_driver_register(&nv_c2c_pmu_driver);
> +}
> +
> +static void __exit nv_c2c_pmu_exit(void)
> +{
> +	platform_driver_unregister(&nv_c2c_pmu_driver);
> +	cpuhp_remove_multi_state(nv_c2c_pmu_cpuhp_state);
> +}
> +
> +module_init(nv_c2c_pmu_init);
> +module_exit(nv_c2c_pmu_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("NVIDIA Tegra410 C2C PMU driver");
> +MODULE_AUTHOR("Besar Wicaksono <bwicaksono@nvidia.com>");
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-01-26 18:11 ` [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU Besar Wicaksono
@ 2026-02-08 11:18   ` Krzysztof Kozlowski
  2026-02-10 16:05     ` Besar Wicaksono
  0 siblings, 1 reply; 26+ messages in thread
From: Krzysztof Kozlowski @ 2026-02-08 11:18 UTC (permalink / raw)
  To: Besar Wicaksono, will, suzuki.poulose, robin.murphy, ilkka
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, mark.rutland,
	treding, jonathanh, vsethi, rwiley, sdonthineni, skelley, ywan,
	mochs, nirmoyd

On 26/01/2026 19:11, Besar Wicaksono wrote:
> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.

Why? Why do we want it? Which *upstream board* uses it?


Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
  2026-01-29 22:40   ` Ilkka Koskinen
@ 2026-02-10 16:01     ` Besar Wicaksono
  0 siblings, 0 replies; 26+ messages in thread
From: Besar Wicaksono @ 2026-02-10 16:01 UTC (permalink / raw)
  To: Ilkka Koskinen
  Cc: will@kernel.org, suzuki.poulose@arm.com, robin.murphy@arm.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Jon Hunter, Vikram Sethi,
	Rich Wiley, Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs,
	Nirmoy Das



> -----Original Message-----
> From: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> Sent: Thursday, January 29, 2026 4:40 PM
> To: Besar Wicaksono <bwicaksono@nvidia.com>
> Cc: will@kernel.org; suzuki.poulose@arm.com; robin.murphy@arm.com;
> ilkka@os.amperecomputing.com; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-tegra@vger.kernel.org; mark.rutland@arm.com;
> Thierry Reding <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>;
> Vikram Sethi <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker
> Donthineni <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>;
> Yifei Wan <ywan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
> <nirmoyd@nvidia.com>
> Subject: Re: [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT
> PMU
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Besar,
> 
> On Mon, 26 Jan 2026, Besar Wicaksono wrote:
> > Adds PCIE-TGT PMU support in Tegra410 SOC.
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> > .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  76 ++++
> > drivers/perf/arm_cspmu/nvidia_cspmu.c         | 324 ++++++++++++++++++
> > 2 files changed, 400 insertions(+)
> >
> > diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c
> b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> > index 3a5531d1f94c..095d2f322c6f 100644
> > --- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
> > +++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> 
> <snip>
> 
> > +static void pcie_tgt_pmu_reset_ev_filter(struct arm_cspmu *cspmu,
> > +                                  const struct perf_event *event)
> > +{
> > +     bool addr_filter_en;
> > +     u64 base, mask;
> > +     int idx;
> > +
> > +     addr_filter_en = pcie_tgt_pmu_addr_en(event);
> > +     if (!addr_filter_en)
> > +             return;
> > +
> > +     base = pcie_tgt_pmu_dst_addr_base(event);
> > +     mask = pcie_tgt_pmu_dst_addr_mask(event);
> > +     idx = pcie_tgt_find_addr_idx(cspmu, base, mask, true);
> > +
> > +     if (idx < 0) {
> > +             dev_err(cspmu->dev,
> > +                     "Unable to find the address filter slot to reset\n");
> > +             return;
> > +     }
> > +
> > +     pcie_tgt_pmu_config_addr_filter(
> > +                     cspmu, false, base, mask, idx);
> 
> I think you can fit the arguments in the same line with the function name
> 
> Otherwise, looks good to me.
> 
>         Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> 
> Cheers, Ilkka
> 

Thanks Ilkka. I will incorporate your comments (on the other patches as well)
on V2.

Regards,
Besar


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-08 11:18   ` Krzysztof Kozlowski
@ 2026-02-10 16:05     ` Besar Wicaksono
  2026-02-10 16:15       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-02-10 16:05 UTC (permalink / raw)
  To: Krzysztof Kozlowski, will@kernel.org, suzuki.poulose@arm.com,
	robin.murphy@arm.com, ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Jon Hunter, Vikram Sethi,
	Rich Wiley, Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs,
	Nirmoy Das



> -----Original Message-----
> From: Krzysztof Kozlowski <krzk@kernel.org>
> Sent: Sunday, February 8, 2026 5:18 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; will@kernel.org;
> suzuki.poulose@arm.com; robin.murphy@arm.com;
> ilkka@os.amperecomputing.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; mark.rutland@arm.com; Thierry Reding
> <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Vikram Sethi
> <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker Donthineni
> <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>; Yifei Wan
> <ywan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
> <nirmoyd@nvidia.com>
> Subject: Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
> 
> External email: Use caution opening links or attachments
> 
> 
> On 26/01/2026 19:11, Besar Wicaksono wrote:
> > Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
> 
> Why? Why do we want it? Which *upstream board* uses it?
> 

These are for NVIDIA Vera platform (Tegra410 SoC).

Regards,
Besar


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-10 16:05     ` Besar Wicaksono
@ 2026-02-10 16:15       ` Krzysztof Kozlowski
  2026-02-10 17:42         ` Besar Wicaksono
  0 siblings, 1 reply; 26+ messages in thread
From: Krzysztof Kozlowski @ 2026-02-10 16:15 UTC (permalink / raw)
  To: Besar Wicaksono, will@kernel.org, suzuki.poulose@arm.com,
	robin.murphy@arm.com, ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Jon Hunter, Vikram Sethi,
	Rich Wiley, Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs,
	Nirmoy Das

On 10/02/2026 17:05, Besar Wicaksono wrote:
> 
> 
>> -----Original Message-----
>> From: Krzysztof Kozlowski <krzk@kernel.org>
>> Sent: Sunday, February 8, 2026 5:18 AM
>> To: Besar Wicaksono <bwicaksono@nvidia.com>; will@kernel.org;
>> suzuki.poulose@arm.com; robin.murphy@arm.com;
>> ilkka@os.amperecomputing.com
>> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
>> tegra@vger.kernel.org; mark.rutland@arm.com; Thierry Reding
>> <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Vikram Sethi
>> <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker Donthineni
>> <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>; Yifei Wan
>> <ywan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
>> <nirmoyd@nvidia.com>
>> Subject: Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 26/01/2026 19:11, Besar Wicaksono wrote:
>>> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
>>
>> Why? Why do we want it? Which *upstream board* uses it?
>>
> 
> These are for NVIDIA Vera platform (Tegra410 SoC).

There is no such board (git grep), but anyway, don't explain it to me.
Your commit should explain such stuff.

Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-10 16:15       ` Krzysztof Kozlowski
@ 2026-02-10 17:42         ` Besar Wicaksono
  2026-02-11  6:19           ` Krzysztof Kozlowski
  0 siblings, 1 reply; 26+ messages in thread
From: Besar Wicaksono @ 2026-02-10 17:42 UTC (permalink / raw)
  To: Krzysztof Kozlowski, will@kernel.org, suzuki.poulose@arm.com,
	robin.murphy@arm.com, ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Jon Hunter, Vikram Sethi,
	Rich Wiley, Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs,
	Nirmoy Das



> -----Original Message-----
> From: Krzysztof Kozlowski <krzk@kernel.org>
> Sent: Tuesday, February 10, 2026 10:15 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; will@kernel.org;
> suzuki.poulose@arm.com; robin.murphy@arm.com;
> ilkka@os.amperecomputing.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; mark.rutland@arm.com; Thierry Reding
> <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Vikram Sethi
> <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker Donthineni
> <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>; Yifei Wan
> <YWan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
> <nirmoyd@nvidia.com>
> Subject: Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
> 
> External email: Use caution opening links or attachments
> 
> 
> On 10/02/2026 17:05, Besar Wicaksono wrote:
> >
> >
> >> -----Original Message-----
> >> From: Krzysztof Kozlowski <krzk@kernel.org>
> >> Sent: Sunday, February 8, 2026 5:18 AM
> >> To: Besar Wicaksono <bwicaksono@nvidia.com>; will@kernel.org;
> >> suzuki.poulose@arm.com; robin.murphy@arm.com;
> >> ilkka@os.amperecomputing.com
> >> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-
> >> tegra@vger.kernel.org; mark.rutland@arm.com; Thierry Reding
> >> <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Vikram Sethi
> >> <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker
> Donthineni
> >> <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>; Yifei Wan
> >> <ywan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
> >> <nirmoyd@nvidia.com>
> >> Subject: Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> On 26/01/2026 19:11, Besar Wicaksono wrote:
> >>> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
> >>
> >> Why? Why do we want it? Which *upstream board* uses it?
> >>
> >
> > These are for NVIDIA Vera platform (Tegra410 SoC).
> 
> There is no such board (git grep), but anyway, don't explain it to me.
> Your commit should explain such stuff.
> 

This is a server platform. There is no upstream board using this device currently.
Please let me know if the patch is acceptable and I will update the commit message
with more clarity. Otherwise, I will drop the patch.

Thanks,
Besar 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-10 17:42         ` Besar Wicaksono
@ 2026-02-11  6:19           ` Krzysztof Kozlowski
  2026-02-11 10:36             ` Jon Hunter
  0 siblings, 1 reply; 26+ messages in thread
From: Krzysztof Kozlowski @ 2026-02-11  6:19 UTC (permalink / raw)
  To: Besar Wicaksono, will@kernel.org, suzuki.poulose@arm.com,
	robin.murphy@arm.com, ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Jon Hunter, Vikram Sethi,
	Rich Wiley, Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs,
	Nirmoy Das

On 10/02/2026 18:42, Besar Wicaksono wrote:
> 
> 
>> -----Original Message-----
>> From: Krzysztof Kozlowski <krzk@kernel.org>
>> Sent: Tuesday, February 10, 2026 10:15 AM
>> To: Besar Wicaksono <bwicaksono@nvidia.com>; will@kernel.org;
>> suzuki.poulose@arm.com; robin.murphy@arm.com;
>> ilkka@os.amperecomputing.com
>> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
>> tegra@vger.kernel.org; mark.rutland@arm.com; Thierry Reding
>> <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Vikram Sethi
>> <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker Donthineni
>> <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>; Yifei Wan
>> <YWan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
>> <nirmoyd@nvidia.com>
>> Subject: Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 10/02/2026 17:05, Besar Wicaksono wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Krzysztof Kozlowski <krzk@kernel.org>
>>>> Sent: Sunday, February 8, 2026 5:18 AM
>>>> To: Besar Wicaksono <bwicaksono@nvidia.com>; will@kernel.org;
>>>> suzuki.poulose@arm.com; robin.murphy@arm.com;
>>>> ilkka@os.amperecomputing.com
>>>> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
>> linux-
>>>> tegra@vger.kernel.org; mark.rutland@arm.com; Thierry Reding
>>>> <treding@nvidia.com>; Jon Hunter <jonathanh@nvidia.com>; Vikram Sethi
>>>> <vsethi@nvidia.com>; Rich Wiley <rwiley@nvidia.com>; Shanker
>> Donthineni
>>>> <sdonthineni@nvidia.com>; Sean Kelley <skelley@nvidia.com>; Yifei Wan
>>>> <ywan@nvidia.com>; Matt Ochs <mochs@nvidia.com>; Nirmoy Das
>>>> <nirmoyd@nvidia.com>
>>>> Subject: Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 26/01/2026 19:11, Besar Wicaksono wrote:
>>>>> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
>>>>
>>>> Why? Why do we want it? Which *upstream board* uses it?
>>>>
>>>
>>> These are for NVIDIA Vera platform (Tegra410 SoC).
>>
>> There is no such board (git grep), but anyway, don't explain it to me.
>> Your commit should explain such stuff.
>>
> 
> This is a server platform. There is no upstream board using this device currently.

I don't understand why server or not server matters.

If there is no upstream user of this, it is wrong to add it to
defconfig. This is upstream defconfig, not your distro.


Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-11  6:19           ` Krzysztof Kozlowski
@ 2026-02-11 10:36             ` Jon Hunter
  2026-02-11 10:41               ` Krzysztof Kozlowski
  0 siblings, 1 reply; 26+ messages in thread
From: Jon Hunter @ 2026-02-11 10:36 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Besar Wicaksono, will@kernel.org,
	suzuki.poulose@arm.com, robin.murphy@arm.com,
	ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Vikram Sethi, Rich Wiley,
	Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs, Nirmoy Das


On 11/02/2026 06:19, Krzysztof Kozlowski wrote:

...

>>>>>> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
>>>>>
>>>>> Why? Why do we want it? Which *upstream board* uses it?
>>>>>
>>>>
>>>> These are for NVIDIA Vera platform (Tegra410 SoC).
>>>
>>> There is no such board (git grep), but anyway, don't explain it to me.
>>> Your commit should explain such stuff.
>>>
>>
>> This is a server platform. There is no upstream board using this device currently.
> 
> I don't understand why server or not server matters.

We should probably say this is an ACPI-based platform.

> If there is no upstream user of this, it is wrong to add it to
> defconfig. This is upstream defconfig, not your distro.

This device does not support device-tree (so far), and hence you will 
not see any device-tree for it. However, enabling this is still valid 
from an ACPI perspective.

Jon

-- 
nvpublic



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-11 10:36             ` Jon Hunter
@ 2026-02-11 10:41               ` Krzysztof Kozlowski
  2026-02-11 10:47                 ` Jon Hunter
  0 siblings, 1 reply; 26+ messages in thread
From: Krzysztof Kozlowski @ 2026-02-11 10:41 UTC (permalink / raw)
  To: Jon Hunter, Besar Wicaksono, will@kernel.org,
	suzuki.poulose@arm.com, robin.murphy@arm.com,
	ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Vikram Sethi, Rich Wiley,
	Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs, Nirmoy Das

On 11/02/2026 11:36, Jon Hunter wrote:
> 
> On 11/02/2026 06:19, Krzysztof Kozlowski wrote:
> 
> ...
> 
>>>>>>> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
>>>>>>
>>>>>> Why? Why do we want it? Which *upstream board* uses it?
>>>>>>
>>>>>
>>>>> These are for NVIDIA Vera platform (Tegra410 SoC).
>>>>
>>>> There is no such board (git grep), but anyway, don't explain it to me.
>>>> Your commit should explain such stuff.
>>>>
>>>
>>> This is a server platform. There is no upstream board using this device currently.
>>
>> I don't understand why server or not server matters.
> 
> We should probably say this is an ACPI-based platform.

Yeah, this would be fine, so you have the entire commit msg to explain
why we want this patch.

Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
  2026-02-11 10:41               ` Krzysztof Kozlowski
@ 2026-02-11 10:47                 ` Jon Hunter
  0 siblings, 0 replies; 26+ messages in thread
From: Jon Hunter @ 2026-02-11 10:47 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Besar Wicaksono, will@kernel.org,
	suzuki.poulose@arm.com, robin.murphy@arm.com,
	ilkka@os.amperecomputing.com
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	mark.rutland@arm.com, Thierry Reding, Vikram Sethi, Rich Wiley,
	Shanker Donthineni, Sean Kelley, Yifei Wan, Matt Ochs, Nirmoy Das


On 11/02/2026 10:41, Krzysztof Kozlowski wrote:
> On 11/02/2026 11:36, Jon Hunter wrote:
>>
>> On 11/02/2026 06:19, Krzysztof Kozlowski wrote:
>>
>> ...
>>
>>>>>>>> Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.
>>>>>>>
>>>>>>> Why? Why do we want it? Which *upstream board* uses it?
>>>>>>>
>>>>>>
>>>>>> These are for NVIDIA Vera platform (Tegra410 SoC).
>>>>>
>>>>> There is no such board (git grep), but anyway, don't explain it to me.
>>>>> Your commit should explain such stuff.
>>>>>
>>>>
>>>> This is a server platform. There is no upstream board using this device currently.
>>>
>>> I don't understand why server or not server matters.
>>
>> We should probably say this is an ACPI-based platform.
> 
> Yeah, this would be fine, so you have the entire commit msg to explain
> why we want this patch.

Ha! A good commit message always helps indeed!

Jon

-- 
nvpublic



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2026-02-11 10:47 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-26 18:11 [PATCH 0/8] perf: add NVIDIA Tegra410 Uncore PMU support Besar Wicaksono
2026-01-26 18:11 ` [PATCH 1/8] perf/arm_cspmu: nvidia: Rename doc to Tegra241 Besar Wicaksono
2026-01-26 18:11 ` [PATCH 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU Besar Wicaksono
2026-01-29 22:20   ` Ilkka Koskinen
2026-01-26 18:11 ` [PATCH 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get Besar Wicaksono
2026-01-29 22:23   ` Ilkka Koskinen
2026-01-26 18:11 ` [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU Besar Wicaksono
2026-01-28 15:56   ` kernel test robot
2026-01-29 22:34   ` Ilkka Koskinen
2026-01-26 18:11 ` [PATCH 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU Besar Wicaksono
2026-01-29 22:40   ` Ilkka Koskinen
2026-02-10 16:01     ` Besar Wicaksono
2026-01-26 18:11 ` [PATCH 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU Besar Wicaksono
2026-01-28  0:40   ` kernel test robot
2026-01-30  2:09   ` Ilkka Koskinen
2026-01-26 18:11 ` [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU Besar Wicaksono
2026-01-30  2:54   ` Ilkka Koskinen
2026-01-26 18:11 ` [PATCH 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU Besar Wicaksono
2026-02-08 11:18   ` Krzysztof Kozlowski
2026-02-10 16:05     ` Besar Wicaksono
2026-02-10 16:15       ` Krzysztof Kozlowski
2026-02-10 17:42         ` Besar Wicaksono
2026-02-11  6:19           ` Krzysztof Kozlowski
2026-02-11 10:36             ` Jon Hunter
2026-02-11 10:41               ` Krzysztof Kozlowski
2026-02-11 10:47                 ` Jon Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox