linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2
@ 2024-08-27 16:44 Leo Yan
  2024-08-27 16:44 ` [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity Leo Yan
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

This patch series enhances Arm SPE metadata in the Perf file to a
version 2 format.

The version 2 metadata contains a global structure with fields for CPU
numbers and the metadata version number. It also expands to store
per-CPU metadata, which includes the CPU logical number in the kernel,
MIDR, and capacity values associated with Arm SPE.

This set exposes a new sysfs node, ‘lds’ (Load Data Source), which
indicates if the Arm SPE supports data source packets. This flag will be
used to decide if the data source packet needs to be decoded during the
report phase, which will be sent out in a separate series for data
source decoding refactoring.

This patch series need to be applied on the top of the changes:
https://lore.kernel.org/linux-perf-users/20240823113306.2310957-1-leo.yan@arm.com/T/#t


Leo Yan (9):
  perf: arm_spe: Introduce 'lds' capacity
  perf auxtrace arm: Refactor error handling
  perf auxtrace arm: Introduce find_auxtrace_pmus_by_name()
  perf: arm-spe: Record multiple PMUs
  perf arm-spe: Extend meta data header for version 2
  perf arm-spe: Calculate meta data size
  perf arm-spe: Save per CPU information in metadata
  perf arm-spe: Support metadata version 2
  perf arm-spe: Dump metadata with version 2

 drivers/perf/arm_spe_pmu.c           |   3 +
 tools/perf/arch/arm/util/auxtrace.c  | 191 +++++++++------------------
 tools/perf/arch/arm64/util/arm-spe.c | 126 +++++++++++++++++-
 tools/perf/util/arm-spe.c            | 119 ++++++++++++++++-
 tools/perf/util/arm-spe.h            |  37 +++++-
 5 files changed, 328 insertions(+), 148 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-30 10:38   ` Will Deacon
  2024-08-27 16:44 ` [PATCH v1 2/9] perf auxtrace arm: Refactor error handling Leo Yan
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

This commit adds a new entry 'lds' in the capacity folder. 'lds' stands
for "loaded data source". When its value is 1, it indicates the data
source implemented, and data source packets will be recorded in the
trace data.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 drivers/perf/arm_spe_pmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 9100d82bfabc..81c1e7627721 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action {
 /* This sysfs gunk was really good fun to write. */
 enum arm_spe_pmu_capabilities {
 	SPE_PMU_CAP_ARCH_INST = 0,
+	SPE_PMU_CAP_LDS,
 	SPE_PMU_CAP_ERND,
 	SPE_PMU_CAP_FEAT_MAX,
 	SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
@@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities {
 
 static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
 	[SPE_PMU_CAP_ARCH_INST]	= SPE_PMU_FEAT_ARCH_INST,
+	[SPE_PMU_CAP_LDS]	= SPE_PMU_FEAT_LDS,
 	[SPE_PMU_CAP_ERND]	= SPE_PMU_FEAT_ERND,
 };
 
@@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
 
 static struct attribute *arm_spe_pmu_cap_attr[] = {
 	SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
+	SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS),
 	SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
 	SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
 	SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 2/9] perf auxtrace arm: Refactor error handling
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
  2024-08-27 16:44 ` [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 3/9] perf auxtrace arm: Introduce find_auxtrace_pmus_by_name() Leo Yan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

Refactor the auxtrace_record__init() function to use a central place to
release resources and return value. This unifies the exit flow, and
allows the PMU arrays can be used throughout the function.

No functional changes; this is a preparation for sequential changes.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm/util/auxtrace.c | 35 ++++++++++++++++-------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 3b8eca0ffb17..74630d2d81dc 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -125,6 +125,7 @@ struct auxtrace_record
 	struct perf_pmu *found_etm = NULL;
 	struct perf_pmu *found_spe = NULL;
 	struct perf_pmu *found_ptt = NULL;
+	struct auxtrace_record *itr = NULL;
 	int auxtrace_event_cnt = 0;
 	int nr_spes = 0;
 	int nr_ptts = 0;
@@ -147,9 +148,6 @@ struct auxtrace_record
 			found_ptt = find_pmu_for_event(hisi_ptt_pmus, nr_ptts, evsel);
 	}
 
-	free(arm_spe_pmus);
-	free(hisi_ptt_pmus);
-
 	if (found_etm)
 		auxtrace_event_cnt++;
 
@@ -159,31 +157,36 @@ struct auxtrace_record
 	if (found_ptt)
 		auxtrace_event_cnt++;
 
-	if (auxtrace_event_cnt > 1) {
+	if (!auxtrace_event_cnt) {
+		/*
+		 * Clear 'err' even if we haven't found an event - that way perf
+		 * record can still be used even if tracers aren't present.
+		 * The NULL return value will take care of telling the
+		 * infrastructure HW tracing isn't available.
+		 */
+		*err = 0;
+		goto out;
+	} else if (auxtrace_event_cnt > 1) {
 		pr_err("Concurrent AUX trace operation not currently supported\n");
 		*err = -EOPNOTSUPP;
-		return NULL;
+		goto out;
 	}
 
 	if (found_etm)
-		return cs_etm_record_init(err);
+		itr = cs_etm_record_init(err);
 
 #if defined(__aarch64__)
 	if (found_spe)
-		return arm_spe_recording_init(err, found_spe);
+		itr = arm_spe_recording_init(err, found_spe);
 
 	if (found_ptt)
-		return hisi_ptt_recording_init(err, found_ptt);
+		itr = hisi_ptt_recording_init(err, found_ptt);
 #endif
 
-	/*
-	 * Clear 'err' even if we haven't found an event - that way perf
-	 * record can still be used even if tracers aren't present.  The NULL
-	 * return value will take care of telling the infrastructure HW tracing
-	 * isn't available.
-	 */
-	*err = 0;
-	return NULL;
+out:
+	free(arm_spe_pmus);
+	free(hisi_ptt_pmus);
+	return itr;
 }
 
 #if defined(__arm__)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 3/9] perf auxtrace arm: Introduce find_auxtrace_pmus_by_name()
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
  2024-08-27 16:44 ` [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity Leo Yan
  2024-08-27 16:44 ` [PATCH v1 2/9] perf auxtrace arm: Refactor error handling Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 4/9] perf: arm-spe: Record multiple PMUs Leo Yan
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

Arm auxtrace searches the opened PMU events for Arm SPE and Hisilicon
HTT. The current approach is to first iterate all PMU devices under the
sysfs folder and then match the PMU event on the evlist.

Since the evlist has sufficient info for the PMU name and corresponding
PMU pointer, it is no need to browse the PMU devices from sysfs nodes.
Alternatively, by traversing the evlist and comparing prefixes for PMU
names, we can directly gather the PMU pointers and save them into an
array. Following the idea, this patch introduces a new function
find_auxtrace_pmus_by_name().

find_auxtrace_pmus_by_name() returns a PMU pointer array or NULL if no
any PMU is found. This simplifies the auxtrace_record__init() function,
as the PMU array pointers are for found PMU events. The local variables
'found_{etm|spe|ptt}' and relevant code are redundant, so remove them.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm/util/auxtrace.c | 158 ++++++++--------------------
 1 file changed, 41 insertions(+), 117 deletions(-)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 74630d2d81dc..2fca16659858 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -7,6 +7,7 @@
 #include <dirent.h>
 #include <stdbool.h>
 #include <linux/coresight-pmu.h>
+#include <linux/string.h>
 #include <linux/zalloc.h>
 #include <api/fs/fs.h>
 
@@ -19,144 +20,66 @@
 #include "arm-spe.h"
 #include "hisi-ptt.h"
 
-static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
+static struct perf_pmu **
+find_auxtrace_pmus_by_name(struct evlist *evlist, const char *name, int *nr_pmu)
 {
-	struct perf_pmu **arm_spe_pmus = NULL;
-	int ret, i, nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
-	/* arm_spe_xxxxxxxxx\0 */
-	char arm_spe_pmu_name[sizeof(ARM_SPE_PMU_NAME) + 10];
-
-	arm_spe_pmus = zalloc(sizeof(struct perf_pmu *) * nr_cpus);
-	if (!arm_spe_pmus) {
-		pr_err("spes alloc failed\n");
-		*err = -ENOMEM;
-		return NULL;
-	}
-
-	for (i = 0; i < nr_cpus; i++) {
-		ret = sprintf(arm_spe_pmu_name, "%s%d", ARM_SPE_PMU_NAME, i);
-		if (ret < 0) {
-			pr_err("sprintf failed\n");
-			*err = -ENOMEM;
-			return NULL;
-		}
+	struct perf_pmu **pmus = NULL;
+	struct evsel *evsel;
+	int i = 0, nr = 0;
 
-		arm_spe_pmus[*nr_spes] = perf_pmus__find(arm_spe_pmu_name);
-		if (arm_spe_pmus[*nr_spes]) {
-			pr_debug2("%s %d: arm_spe_pmu %d type %d name %s\n",
-				 __func__, __LINE__, *nr_spes,
-				 arm_spe_pmus[*nr_spes]->type,
-				 arm_spe_pmus[*nr_spes]->name);
-			(*nr_spes)++;
-		}
-	}
+	assert(name);
+	assert(nr_pmu);
 
-	return arm_spe_pmus;
-}
+	*nr_pmu = 0;
 
-static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err)
-{
-	struct perf_pmu **hisi_ptt_pmus = NULL;
-	struct dirent *dent;
-	char path[PATH_MAX];
-	DIR *dir = NULL;
-	int idx = 0;
-
-	perf_pmu__event_source_devices_scnprintf(path, sizeof(path));
-	dir = opendir(path);
-	if (!dir) {
-		pr_err("can't read directory '%s'\n", path);
-		*err = -EINVAL;
-		return NULL;
-	}
-
-	while ((dent = readdir(dir))) {
-		if (strstr(dent->d_name, HISI_PTT_PMU_NAME))
-			(*nr_ptts)++;
+	evlist__for_each_entry(evlist, evsel) {
+		if (strstarts(evsel->pmu_name, name))
+			nr++;
 	}
 
-	if (!(*nr_ptts))
-		goto out;
+	if (!nr)
+		return NULL;
 
-	hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts));
-	if (!hisi_ptt_pmus) {
-		pr_err("hisi_ptt alloc failed\n");
-		*err = -ENOMEM;
-		goto out;
+	pmus = zalloc(sizeof(struct perf_pmu *) * nr);
+	if (!pmus) {
+		pr_err("Failed to allocate PMU pointer arrary.\n");
+		return NULL;
 	}
 
-	rewinddir(dir);
-	while ((dent = readdir(dir))) {
-		if (strstr(dent->d_name, HISI_PTT_PMU_NAME) && idx < *nr_ptts) {
-			hisi_ptt_pmus[idx] = perf_pmus__find(dent->d_name);
-			if (hisi_ptt_pmus[idx])
-				idx++;
+	evlist__for_each_entry(evlist, evsel) {
+		if (strstarts(evsel->pmu_name, name)) {
+			pmus[i] = evsel->pmu;
+			i++;
 		}
 	}
 
-out:
-	closedir(dir);
-	return hisi_ptt_pmus;
-}
-
-static struct perf_pmu *find_pmu_for_event(struct perf_pmu **pmus,
-					   int pmu_nr, struct evsel *evsel)
-{
-	int i;
-
-	if (!pmus)
-		return NULL;
-
-	for (i = 0; i < pmu_nr; i++) {
-		if (evsel->core.attr.type == pmus[i]->type)
-			return pmus[i];
-	}
-
-	return NULL;
+	*nr_pmu = nr;
+	return pmus;
 }
 
 struct auxtrace_record
 *auxtrace_record__init(struct evlist *evlist, int *err)
 {
-	struct perf_pmu	*cs_etm_pmu = NULL;
+	struct perf_pmu	**cs_etm_pmu = NULL;
 	struct perf_pmu **arm_spe_pmus = NULL;
 	struct perf_pmu **hisi_ptt_pmus = NULL;
-	struct evsel *evsel;
-	struct perf_pmu *found_etm = NULL;
-	struct perf_pmu *found_spe = NULL;
-	struct perf_pmu *found_ptt = NULL;
 	struct auxtrace_record *itr = NULL;
 	int auxtrace_event_cnt = 0;
-	int nr_spes = 0;
-	int nr_ptts = 0;
+	int nr_etm = 0;
+	int nr_spe = 0;
+	int nr_ptt = 0;
 
 	if (!evlist)
 		return NULL;
 
-	cs_etm_pmu = perf_pmus__find(CORESIGHT_ETM_PMU_NAME);
-	arm_spe_pmus = find_all_arm_spe_pmus(&nr_spes, err);
-	hisi_ptt_pmus = find_all_hisi_ptt_pmus(&nr_ptts, err);
-
-	evlist__for_each_entry(evlist, evsel) {
-		if (cs_etm_pmu && !found_etm)
-			found_etm = find_pmu_for_event(&cs_etm_pmu, 1, evsel);
-
-		if (arm_spe_pmus && !found_spe)
-			found_spe = find_pmu_for_event(arm_spe_pmus, nr_spes, evsel);
-
-		if (hisi_ptt_pmus && !found_ptt)
-			found_ptt = find_pmu_for_event(hisi_ptt_pmus, nr_ptts, evsel);
-	}
-
-	if (found_etm)
-		auxtrace_event_cnt++;
-
-	if (found_spe)
-		auxtrace_event_cnt++;
-
-	if (found_ptt)
-		auxtrace_event_cnt++;
+	cs_etm_pmu =
+		find_auxtrace_pmus_by_name(evlist, CORESIGHT_ETM_PMU_NAME, &nr_etm);
+	arm_spe_pmus =
+		find_auxtrace_pmus_by_name(evlist, ARM_SPE_PMU_NAME, &nr_spe);
+	hisi_ptt_pmus =
+		find_auxtrace_pmus_by_name(evlist, HISI_PTT_PMU_NAME, &nr_ptt);
 
+	auxtrace_event_cnt = !!nr_etm + !!nr_spe + !!nr_ptt;
 	if (!auxtrace_event_cnt) {
 		/*
 		 * Clear 'err' even if we haven't found an event - that way perf
@@ -172,18 +95,19 @@ struct auxtrace_record
 		goto out;
 	}
 
-	if (found_etm)
+	if (cs_etm_pmu)
 		itr = cs_etm_record_init(err);
 
 #if defined(__aarch64__)
-	if (found_spe)
-		itr = arm_spe_recording_init(err, found_spe);
+	if (arm_spe_pmus)
+		itr = arm_spe_recording_init(err, arm_spe_pmus[0]);
 
-	if (found_ptt)
-		itr = hisi_ptt_recording_init(err, found_ptt);
+	if (hisi_ptt_pmus)
+		itr = hisi_ptt_recording_init(err, hisi_ptt_pmus[0]);
 #endif
 
 out:
+	free(cs_etm_pmu);
 	free(arm_spe_pmus);
 	free(hisi_ptt_pmus);
 	return itr;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 4/9] perf: arm-spe: Record multiple PMUs
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
                   ` (2 preceding siblings ...)
  2024-08-27 16:44 ` [PATCH v1 3/9] perf auxtrace arm: Introduce find_auxtrace_pmus_by_name() Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 5/9] perf arm-spe: Extend meta data header for version 2 Leo Yan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

Currently, the arm_spe_recording structure only saves the first Arm SPE
PMU pointer and it cannot cover all PMU events for the multiple Arm SPE
event case.

Save the all Arm SPE PMU pointers into the arm_spe_recording structure,
later changes will use these pointers to retrieve meta data.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm/util/auxtrace.c  |  2 +-
 tools/perf/arch/arm64/util/arm-spe.c | 19 +++++++++++++++----
 tools/perf/util/arm-spe.h            |  3 ++-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 2fca16659858..55877418b6c4 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -100,7 +100,7 @@ struct auxtrace_record
 
 #if defined(__aarch64__)
 	if (arm_spe_pmus)
-		itr = arm_spe_recording_init(err, arm_spe_pmus[0]);
+		itr = arm_spe_recording_init(err, arm_spe_pmus, nr_spe);
 
 	if (hisi_ptt_pmus)
 		itr = hisi_ptt_recording_init(err, hisi_ptt_pmus[0]);
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 2be99fdf997d..7880190c3dd6 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -31,7 +31,8 @@
 
 struct arm_spe_recording {
 	struct auxtrace_record		itr;
-	struct perf_pmu			*arm_spe_pmu;
+	struct perf_pmu			**pmu;
+	int				nr_pmu;
 	struct evlist		*evlist;
 	int			wrapped_cnt;
 	bool			*wrapped;
@@ -51,7 +52,7 @@ static int arm_spe_info_fill(struct auxtrace_record *itr,
 {
 	struct arm_spe_recording *sper =
 			container_of(itr, struct arm_spe_recording, itr);
-	struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
+	struct perf_pmu *arm_spe_pmu = sper->pmu[0];
 
 	if (priv_size != ARM_SPE_AUXTRACE_PRIV_SIZE)
 		return -EINVAL;
@@ -494,11 +495,13 @@ static void arm_spe_recording_free(struct auxtrace_record *itr)
 			container_of(itr, struct arm_spe_recording, itr);
 
 	zfree(&sper->wrapped);
+	zfree(&sper->pmu);
 	free(sper);
 }
 
 struct auxtrace_record *arm_spe_recording_init(int *err,
-					       struct perf_pmu *arm_spe_pmu)
+					       struct perf_pmu **arm_spe_pmu,
+					       int nr_pmu)
 {
 	struct arm_spe_recording *sper;
 
@@ -513,7 +516,15 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
 		return NULL;
 	}
 
-	sper->arm_spe_pmu = arm_spe_pmu;
+	sper->pmu = zalloc(sizeof(struct perf_pmu *) * nr_pmu);
+	if (!sper->pmu) {
+		free(sper);
+		*err = -ENOMEM;
+		return NULL;
+	}
+	memcpy(sper->pmu, arm_spe_pmu, sizeof(struct perf_pmu *) * nr_pmu);
+	sper->nr_pmu = nr_pmu;
+
 	sper->itr.snapshot_start = arm_spe_snapshot_start;
 	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
 	sper->itr.find_snapshot = arm_spe_find_snapshot;
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 4f4900c18f3e..e1327e1b3fec 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -22,7 +22,8 @@ struct perf_session;
 struct perf_pmu;
 
 struct auxtrace_record *arm_spe_recording_init(int *err,
-					       struct perf_pmu *arm_spe_pmu);
+					       struct perf_pmu **arm_spe_pmu,
+					       int nr_pmu);
 
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 5/9] perf arm-spe: Extend meta data header for version 2
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
                   ` (3 preceding siblings ...)
  2024-08-27 16:44 ` [PATCH v1 4/9] perf: arm-spe: Record multiple PMUs Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 6/9] perf arm-spe: Calculate meta data size Leo Yan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

This commit extends the meta data header structure for version 2.

The first version's header structure doesn't include a field to indicate
the header's version, adds a new field for it. And extends to support
per CPU's meta data. Add macros for metadata header size and per CPU
data size, and update the code with these macros.

No functional change.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c |  4 ++--
 tools/perf/util/arm-spe.c            |  2 +-
 tools/perf/util/arm-spe.h            | 34 +++++++++++++++++++++++++++-
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 7880190c3dd6..3e7f62bac2e0 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -42,7 +42,7 @@ static size_t
 arm_spe_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 		       struct evlist *evlist __maybe_unused)
 {
-	return ARM_SPE_AUXTRACE_PRIV_SIZE;
+	return ARM_SPE_AUXTRACE_V1_PRIV_SIZE;
 }
 
 static int arm_spe_info_fill(struct auxtrace_record *itr,
@@ -54,7 +54,7 @@ static int arm_spe_info_fill(struct auxtrace_record *itr,
 			container_of(itr, struct arm_spe_recording, itr);
 	struct perf_pmu *arm_spe_pmu = sper->pmu[0];
 
-	if (priv_size != ARM_SPE_AUXTRACE_PRIV_SIZE)
+	if (priv_size != ARM_SPE_AUXTRACE_V1_PRIV_SIZE)
 		return -EINVAL;
 
 	if (!session->evlist->core.nr_mmaps)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 3f8695fe6a20..1e87342f4bd8 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1260,7 +1260,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
 	struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
-	size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
+	size_t min_sz = ARM_SPE_AUXTRACE_V1_PRIV_SIZE;
 	struct perf_record_time_conv *tc = &session->time_conv;
 	const char *cpuid = perf_env__cpuid(session->evlist->env);
 	u64 midr = strtol(cpuid, NULL, 16);
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index e1327e1b3fec..5d587a8e3cf8 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -12,10 +12,42 @@
 enum {
 	ARM_SPE_PMU_TYPE,
 	ARM_SPE_PER_CPU_MMAPS,
+	/*
+	 * The initial version doesn't have version number, so version 1 is
+	 * reserved and the header version starts from 2.
+	 */
+	ARM_SPE_HEADER_VERSION,
+	ARM_SPE_CPU_NUM,
 	ARM_SPE_AUXTRACE_PRIV_MAX,
 };
 
-#define ARM_SPE_AUXTRACE_PRIV_SIZE (ARM_SPE_AUXTRACE_PRIV_MAX * sizeof(u64))
+enum {
+	ARM_SPE_CPU,
+	ARM_SPE_CPU_MIDR,
+	ARM_SPE_CPU_PMU_TYPE,
+	ARM_SPE_CAP_MIN_IVAL,
+	ARM_SPE_CAP_LDS,
+	ARM_SPE_PER_CPU_PRIV_MAX,
+};
+
+#define ARM_SPE_HEADER_CURRENT_VERSION	2
+
+#define ARM_SPE_METADATA_SIZE(cnt)	((cnt) * sizeof(u64))
+
+#define ARM_SPE_AUXTRACE_V1_PRIV_MAX		\
+	(ARM_SPE_PER_CPU_MMAPS + 1)
+#define ARM_SPE_AUXTRACE_V1_PRIV_SIZE		\
+	ARM_SPE_METADATA_SIZE(ARM_SPE_AUXTRACE_V1_PRIV_MAX)
+
+#define ARM_SPE_AUXTRACE_V2_PRIV_MAX		\
+	(ARM_SPE_CPU_NUM + 1)
+#define ARM_SPE_AUXTRACE_V2_PRIV_SIZE		\
+	ARM_SPE_METADATA_SIZE(ARM_SPE_AUXTRACE_V2_PRIV_MAX)
+
+#define ARM_SPE_AUXTRACE_V2_PRIV_PER_CPU_MAX	\
+	(ARM_SPE_CAP_LDS + 1)
+#define ARM_SPE_AUXTRACE_V2_PER_CPU_SIZE	\
+	ARM_SPE_METADATA_SIZE(ARM_SPE_AUXTRACE_V2_PRIV_PER_CPU_MAX)
 
 union perf_event;
 struct perf_session;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 6/9] perf arm-spe: Calculate meta data size
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
                   ` (4 preceding siblings ...)
  2024-08-27 16:44 ` [PATCH v1 5/9] perf arm-spe: Extend meta data header for version 2 Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 7/9] perf arm-spe: Save per CPU information in metadata Leo Yan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

The metadata is designed to contain a header and per CPU information.

The arm_spe_find_cpus() function is introduced to identify how many CPUs
support ARM SPE. Based on the found CPU number, calculates the metadata
size.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 34 +++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 3e7f62bac2e0..f8126283dad2 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -38,11 +38,39 @@ struct arm_spe_recording {
 	bool			*wrapped;
 };
 
+static struct perf_cpu_map *arm_spe_find_cpus(struct evlist *evlist)
+{
+	struct perf_cpu_map *event_cpus = evlist->core.user_requested_cpus;
+	struct perf_cpu_map *online_cpus = perf_cpu_map__new_online_cpus();
+	struct perf_cpu_map *intersect_cpus;
+
+	/* cpu map is not "any" CPU , we have specific CPUs to work with */
+	if (!perf_cpu_map__has_any_cpu(event_cpus)) {
+		intersect_cpus = perf_cpu_map__intersect(event_cpus, online_cpus);
+		perf_cpu_map__put(online_cpus);
+	/* Event can be "any" CPU so count all CPUs. */
+	} else {
+		intersect_cpus = online_cpus;
+	}
+
+	return intersect_cpus;
+}
+
 static size_t
 arm_spe_info_priv_size(struct auxtrace_record *itr __maybe_unused,
-		       struct evlist *evlist __maybe_unused)
+		       struct evlist *evlist)
 {
-	return ARM_SPE_AUXTRACE_V1_PRIV_SIZE;
+	struct perf_cpu_map *cpu_map = arm_spe_find_cpus(evlist);
+	size_t size;
+
+	if (!cpu_map)
+		return 0;
+
+	size = ARM_SPE_AUXTRACE_V2_PRIV_SIZE +
+	       ARM_SPE_AUXTRACE_V2_PER_CPU_SIZE * perf_cpu_map__nr(cpu_map);
+
+	perf_cpu_map__put(cpu_map);
+	return size;
 }
 
 static int arm_spe_info_fill(struct auxtrace_record *itr,
@@ -54,7 +82,7 @@ static int arm_spe_info_fill(struct auxtrace_record *itr,
 			container_of(itr, struct arm_spe_recording, itr);
 	struct perf_pmu *arm_spe_pmu = sper->pmu[0];
 
-	if (priv_size != ARM_SPE_AUXTRACE_V1_PRIV_SIZE)
+	if (priv_size != arm_spe_info_priv_size(itr, session->evlist))
 		return -EINVAL;
 
 	if (!session->evlist->core.nr_mmaps)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 7/9] perf arm-spe: Save per CPU information in metadata
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
                   ` (5 preceding siblings ...)
  2024-08-27 16:44 ` [PATCH v1 6/9] perf arm-spe: Calculate meta data size Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 8/9] perf arm-spe: Support metadata version 2 Leo Yan
  2024-08-27 16:44 ` [PATCH v1 9/9] perf arm-spe: Dump metadata with " Leo Yan
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

An Arm SPE event can include all CPUs or just a subset of CPUs in a
system.

Instead of storing the Arm SPE event parameters in a single metadata
entry, save the Arm SPE information on a per-CPU basis. This approach
aligns with how Arm SPE traces are recorded per CPU, making it easier
to match metadata with trace data.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 73 ++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index f8126283dad2..d0cff3ac7c1f 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -73,14 +73,74 @@ arm_spe_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 	return size;
 }
 
+static int arm_spe_save_cpu_header(struct auxtrace_record *itr,
+				   struct perf_cpu cpu, __u64 data[])
+{
+	struct arm_spe_recording *sper =
+			container_of(itr, struct arm_spe_recording, itr);
+	struct perf_pmu *pmu = NULL;
+	struct perf_pmu tmp_pmu;
+	char cpu_id_str[16];
+	char *cpuid = NULL;
+	u64 val;
+	int i;
+
+	snprintf(cpu_id_str, sizeof(cpu_id_str), "%d", cpu.cpu);
+	tmp_pmu.cpus = perf_cpu_map__new(cpu_id_str);
+	if (!tmp_pmu.cpus)
+		return -ENOMEM;
+
+	/* Read CPU MIDR */
+	cpuid = perf_pmu__getcpuid(&tmp_pmu);
+	if (!cpuid)
+		return -ENOMEM;
+	val = strtol(cpuid, NULL, 16);
+	perf_cpu_map__put(tmp_pmu.cpus);
+
+	data[ARM_SPE_CPU] = cpu.cpu;
+	data[ARM_SPE_CPU_MIDR] = val;
+
+	/* Find the associate Arm SPE PMU for the CPU */
+	for (i = 0; i < sper->nr_pmu; i++) {
+		if (perf_cpu_map__has(sper->pmu[i]->cpus, cpu)) {
+			pmu = sper->pmu[i];
+			break;
+		}
+	}
+
+	if (!pmu) {
+		/* No Arm SPE PMU is found */
+		data[ARM_SPE_CPU_PMU_TYPE] = ULLONG_MAX;
+		data[ARM_SPE_CAP_MIN_IVAL] = 0;
+		data[ARM_SPE_CAP_LDS] = 0;
+	} else {
+		data[ARM_SPE_CPU_PMU_TYPE] = pmu->type;
+
+		if (perf_pmu__scan_file(pmu, "caps/min_interval", "%lu", &val) != 1)
+			val = 0;
+		data[ARM_SPE_CAP_MIN_IVAL] = val;
+
+		if (perf_pmu__scan_file(pmu, "caps/lds", "%lu", &val) != 1)
+			val = 0;
+		data[ARM_SPE_CAP_LDS] = val;
+	}
+
+	return ARM_SPE_PER_CPU_PRIV_MAX;
+}
+
 static int arm_spe_info_fill(struct auxtrace_record *itr,
 			     struct perf_session *session,
 			     struct perf_record_auxtrace_info *auxtrace_info,
 			     size_t priv_size)
 {
+	int i, ret;
+	size_t offset;
 	struct arm_spe_recording *sper =
 			container_of(itr, struct arm_spe_recording, itr);
 	struct perf_pmu *arm_spe_pmu = sper->pmu[0];
+	struct perf_cpu_map *cpu_map = arm_spe_find_cpus(session->evlist);
+	struct perf_cpu cpu;
+	__u64 *data;
 
 	if (priv_size != arm_spe_info_priv_size(itr, session->evlist))
 		return -EINVAL;
@@ -90,7 +150,20 @@ static int arm_spe_info_fill(struct auxtrace_record *itr,
 
 	auxtrace_info->type = PERF_AUXTRACE_ARM_SPE;
 	auxtrace_info->priv[ARM_SPE_PMU_TYPE] = arm_spe_pmu->type;
+	auxtrace_info->priv[ARM_SPE_HEADER_VERSION] = ARM_SPE_HEADER_CURRENT_VERSION;
+	auxtrace_info->priv[ARM_SPE_CPU_NUM] = perf_cpu_map__nr(cpu_map);
+
+	offset = ARM_SPE_CPU_NUM + 1;
+	perf_cpu_map__for_each_cpu(cpu, i, cpu_map) {
+		assert(offset < priv_size);
+		data = &auxtrace_info->priv[offset];
+		ret = arm_spe_save_cpu_header(itr, cpu, data);
+		if (ret < 0)
+			return ret;
+		offset += ret;
+	}
 
+	perf_cpu_map__put(cpu_map);
 	return 0;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 8/9] perf arm-spe: Support metadata version 2
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
                   ` (6 preceding siblings ...)
  2024-08-27 16:44 ` [PATCH v1 7/9] perf arm-spe: Save per CPU information in metadata Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-27 16:44 ` [PATCH v1 9/9] perf arm-spe: Dump metadata with " Leo Yan
  8 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

This commit is to support metadata version 2 and at the meantime it is
backward compatible for version 1's format.

The metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field.
By checking the header size, it distinguishes the metadata is version 1
or version 2 (and any new versions if later will have). For version 2,
it reads out CPU number and retrieves the metadata info for every CPU.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 76 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 1e87342f4bd8..87cf06db765b 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -78,6 +78,10 @@ struct arm_spe {
 
 	unsigned long			num_events;
 	u8				use_ctx_pkt_for_pid;
+
+	u64				**metadata;
+	u64				metadata_ver;
+	u64				metadata_num_cpu;
 };
 
 struct arm_spe_queue {
@@ -1044,12 +1048,16 @@ static void arm_spe_free_events(struct perf_session *session)
 
 static void arm_spe_free(struct perf_session *session)
 {
+	unsigned int i;
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
 
 	auxtrace_heap__free(&spe->heap);
 	arm_spe_free_events(session);
 	session->auxtrace = NULL;
+	for (i = 0; i < spe->metadata_num_cpu; i++)
+		zfree(&spe->metadata[i]);
+	zfree(&spe->metadata);
 	free(spe);
 }
 
@@ -1256,24 +1264,81 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	return 0;
 }
 
+static bool
+arm_spe_has_metadata_field(struct perf_record_auxtrace_info *auxtrace_info, int pos)
+{
+	return auxtrace_info->header.size >=
+		sizeof(struct perf_record_auxtrace_info) + (sizeof(u64) * (pos + 1));
+}
+
+static u64 *arm_spe__create_meta_blk(u64 *buf, int per_cpu_size)
+{
+	u64 *metadata = NULL;
+
+	metadata = zalloc(sizeof(*metadata) * per_cpu_size);
+	if (!metadata)
+		return NULL;
+
+	memcpy(metadata, buf, per_cpu_size);
+	return metadata;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
 	struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
-	size_t min_sz = ARM_SPE_AUXTRACE_V1_PRIV_SIZE;
+	size_t min_sz;
 	struct perf_record_time_conv *tc = &session->time_conv;
 	const char *cpuid = perf_env__cpuid(session->evlist->env);
 	u64 midr = strtol(cpuid, NULL, 16);
 	struct arm_spe *spe;
+	u64 *ptr = NULL;
+	u64 **metadata = NULL;
+	u64 metadata_ver;
+	int num_cpu = 0, i, per_cpu_sz;
 	int err;
 
+	/* First the global part */
+	ptr = (u64 *)auxtrace_info->priv;
+
+	/* Metadata v1 has no the ARM_SPE_HEADER_VERSION field */
+	if (!arm_spe_has_metadata_field(auxtrace_info, ARM_SPE_HEADER_VERSION)) {
+		metadata_ver = 1;
+		num_cpu = 0;
+		min_sz = ARM_SPE_AUXTRACE_V1_PRIV_SIZE;
+		per_cpu_sz = 0;
+		ptr += ARM_SPE_AUXTRACE_V1_PRIV_MAX;
+	} else {
+		metadata_ver = ptr[ARM_SPE_HEADER_VERSION];
+		num_cpu = ptr[ARM_SPE_CPU_NUM];
+		min_sz = ARM_SPE_AUXTRACE_V2_PRIV_SIZE +
+			 ARM_SPE_AUXTRACE_V2_PER_CPU_SIZE * num_cpu;
+		per_cpu_sz = ARM_SPE_AUXTRACE_V2_PER_CPU_SIZE;
+		ptr += ARM_SPE_AUXTRACE_V2_PRIV_MAX;
+	}
+
 	if (auxtrace_info->header.size < sizeof(struct perf_record_auxtrace_info) +
 					min_sz)
 		return -EINVAL;
 
+	if (num_cpu) {
+		metadata = zalloc(sizeof(*metadata) * num_cpu);
+		if (!metadata)
+			return -ENOMEM;
+
+		for (i = 0; i < num_cpu; i++) {
+			metadata[i] = arm_spe__create_meta_blk(ptr, per_cpu_sz);
+			if (!metadata[i]) {
+				err = -ENOMEM;
+				goto err_free_metadata;
+			}
+			ptr += per_cpu_sz / sizeof(u64);
+		}
+	}
+
 	spe = zalloc(sizeof(struct arm_spe));
 	if (!spe)
-		return -ENOMEM;
+		goto err_free_metadata;
 
 	err = auxtrace_queues__init(&spe->queues);
 	if (err)
@@ -1283,6 +1348,9 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->machine = &session->machines.host; /* No kvm support */
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->midr = midr;
+	spe->metadata = metadata;
+	spe->metadata_ver = metadata_ver;
+	spe->metadata_num_cpu = num_cpu;
 
 	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 
@@ -1343,5 +1411,9 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	session->auxtrace = NULL;
 err_free:
 	free(spe);
+err_free_metadata:
+	for (i = 0; i < num_cpu; i++)
+		zfree(&metadata[i]);
+	zfree(&metadata);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 9/9] perf arm-spe: Dump metadata with version 2
  2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
                   ` (7 preceding siblings ...)
  2024-08-27 16:44 ` [PATCH v1 8/9] perf arm-spe: Support metadata version 2 Leo Yan
@ 2024-08-27 16:44 ` Leo Yan
  2024-08-28 16:20   ` James Clark
  8 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, James Clark, John Garry,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Liang, Kan,
	Jonathan Cameron, Yicong Yang, linux-arm-kernel, linux-kernel,
	coresight, linux-perf-users
  Cc: Leo Yan

This commit dumps metadata with version 2. It uses two string arrays
metadata_hdr_fmts and metadata_per_cpu_fmts as string formats for the
header and per CPU data respectively, and the arm_spe_print_info()
function is enhanced to support dumping metadata with the version 2
format.

After:

  0 0 0x4a8 [0x170]: PERF_RECORD_AUXTRACE_INFO type: 4
  PMU Type           :13
  Version            :2
  Num of CPUs        :8
    CPU #            :0
    MIDR             :0x410fd801
    Bound PMU Type   :-1
    Min Interval     :0
    Load Data Source :0
    CPU #            :1
    MIDR             :0x410fd801
    Bound PMU Type   :-1
    Min Interval     :0
    Load Data Source :0
    CPU #            :2
    MIDR             :0x410fd870
    Bound PMU Type   :13
    Min Interval     :1024
    Load Data Source :1
    CPU #            :3
    MIDR             :0x410fd870
    Bound PMU Type   :13
    Min Interval     :1024
    Load Data Source :1
    CPU #            :4
    MIDR             :0x410fd870
    Bound PMU Type   :13
    Min Interval     :1024
    Load Data Source :1
    CPU #            :5
    MIDR             :0x410fd870
    Bound PMU Type   :13
    Min Interval     :1024
    Load Data Source :1
    CPU #            :6
    MIDR             :0x410fd850
    Bound PMU Type   :14
    Min Interval     :1024
    Load Data Source :1
    CPU #            :7
    MIDR             :0x410fd850
    Bound PMU Type   :14
    Min Interval     :1024
    Load Data Source :1

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/arm-spe.c | 43 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 87cf06db765b..be34d4c4306a 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1067,16 +1067,49 @@ static bool arm_spe_evsel_is_auxtrace(struct perf_session *session __maybe_unuse
 	return strstarts(evsel->name, ARM_SPE_PMU_NAME);
 }
 
-static const char * const arm_spe_info_fmts[] = {
-	[ARM_SPE_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+static const char * const metadata_hdr_fmts[] = {
+	[ARM_SPE_PMU_TYPE]		= "  PMU Type           :%"PRId64"\n",
+	[ARM_SPE_HEADER_VERSION]	= "  Version            :%"PRId64"\n",
+	[ARM_SPE_CPU_NUM]		= "  Num of CPUs        :%"PRId64"\n",
 };
 
-static void arm_spe_print_info(__u64 *arr)
+static const char * const metadata_per_cpu_fmts[] = {
+	[ARM_SPE_CPU]			= "    CPU #            :%"PRId64"\n",
+	[ARM_SPE_CPU_MIDR]		= "    MIDR             :0x%"PRIx64"\n",
+	[ARM_SPE_CPU_PMU_TYPE]		= "    Bound PMU Type   :%"PRId64"\n",
+	[ARM_SPE_CAP_MIN_IVAL]		= "    Min Interval     :%"PRId64"\n",
+	[ARM_SPE_CAP_LDS]		= "    Load Data Source :%"PRId64"\n",
+};
+
+static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
 {
+	unsigned int i, cpu, header_size, cpu_num, per_cpu_size;
+
 	if (!dump_trace)
 		return;
 
-	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
+	if (spe->metadata_ver == 1) {
+		cpu_num = 0;
+		header_size = ARM_SPE_AUXTRACE_V1_PRIV_MAX;
+		per_cpu_size = 0;
+	} else if (spe->metadata_ver == 2) {
+		cpu_num = arr[ARM_SPE_CPU_NUM];
+		header_size = ARM_SPE_AUXTRACE_V2_PRIV_MAX;
+		per_cpu_size = ARM_SPE_AUXTRACE_V2_PRIV_PER_CPU_MAX;
+	} else {
+		pr_err("Cannot support metadata ver: %ld\n", spe->metadata_ver);
+		return;
+	}
+
+	for (i = 0; i < header_size; i++)
+		fprintf(stdout, metadata_hdr_fmts[i], arr[i]);
+
+	arr += header_size;
+	for (cpu = 0; cpu < cpu_num; cpu++) {
+		for (i = 0; i < per_cpu_size; i++)
+			fprintf(stdout, metadata_per_cpu_fmts[i], arr[i]);
+		arr += per_cpu_size;
+	}
 }
 
 static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
@@ -1383,7 +1416,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace.evsel_is_auxtrace = arm_spe_evsel_is_auxtrace;
 	session->auxtrace = &spe->auxtrace;
 
-	arm_spe_print_info(&auxtrace_info->priv[0]);
+	arm_spe_print_info(spe, &auxtrace_info->priv[0]);
 
 	if (dump_trace)
 		return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 9/9] perf arm-spe: Dump metadata with version 2
  2024-08-27 16:44 ` [PATCH v1 9/9] perf arm-spe: Dump metadata with " Leo Yan
@ 2024-08-28 16:20   ` James Clark
  2024-08-30  7:54     ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2024-08-28 16:20 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, John Garry, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Liang, Kan, Jonathan Cameron,
	Yicong Yang, linux-arm-kernel, linux-kernel, coresight,
	linux-perf-users



On 27/08/2024 5:44 pm, Leo Yan wrote:
> This commit dumps metadata with version 2. It uses two string arrays
> metadata_hdr_fmts and metadata_per_cpu_fmts as string formats for the
> header and per CPU data respectively, and the arm_spe_print_info()
> function is enhanced to support dumping metadata with the version 2
> format.
> 
> After:
> 
>    0 0 0x4a8 [0x170]: PERF_RECORD_AUXTRACE_INFO type: 4
>    PMU Type           :13
>    Version            :2
>    Num of CPUs        :8
>      CPU #            :0
>      MIDR             :0x410fd801
>      Bound PMU Type   :-1
>      Min Interval     :0
>      Load Data Source :0
>      CPU #            :1
>      MIDR             :0x410fd801
>      Bound PMU Type   :-1
>      Min Interval     :0
>      Load Data Source :0
>      CPU #            :2
>      MIDR             :0x410fd870
>      Bound PMU Type   :13
>      Min Interval     :1024
>      Load Data Source :1
>      CPU #            :3
>      MIDR             :0x410fd870
>      Bound PMU Type   :13
>      Min Interval     :1024
>      Load Data Source :1
>      CPU #            :4
>      MIDR             :0x410fd870
>      Bound PMU Type   :13
>      Min Interval     :1024
>      Load Data Source :1
>      CPU #            :5
>      MIDR             :0x410fd870
>      Bound PMU Type   :13
>      Min Interval     :1024
>      Load Data Source :1
>      CPU #            :6
>      MIDR             :0x410fd850
>      Bound PMU Type   :14
>      Min Interval     :1024
>      Load Data Source :1
>      CPU #            :7
>      MIDR             :0x410fd850
>      Bound PMU Type   :14
>      Min Interval     :1024
>      Load Data Source :1
> 
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
>   tools/perf/util/arm-spe.c | 43 ++++++++++++++++++++++++++++++++++-----
>   1 file changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 87cf06db765b..be34d4c4306a 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -1067,16 +1067,49 @@ static bool arm_spe_evsel_is_auxtrace(struct perf_session *session __maybe_unuse
>   	return strstarts(evsel->name, ARM_SPE_PMU_NAME);
>   }
>   
> -static const char * const arm_spe_info_fmts[] = {
> -	[ARM_SPE_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
> +static const char * const metadata_hdr_fmts[] = {
> +	[ARM_SPE_PMU_TYPE]		= "  PMU Type           :%"PRId64"\n",
> +	[ARM_SPE_HEADER_VERSION]	= "  Version            :%"PRId64"\n",
> +	[ARM_SPE_CPU_NUM]		= "  Num of CPUs        :%"PRId64"\n",
>   };
>   
> -static void arm_spe_print_info(__u64 *arr)
> +static const char * const metadata_per_cpu_fmts[] = {
> +	[ARM_SPE_CPU]			= "    CPU #            :%"PRId64"\n",
> +	[ARM_SPE_CPU_MIDR]		= "    MIDR             :0x%"PRIx64"\n",
> +	[ARM_SPE_CPU_PMU_TYPE]		= "    Bound PMU Type   :%"PRId64"\n",
> +	[ARM_SPE_CAP_MIN_IVAL]		= "    Min Interval     :%"PRId64"\n",
> +	[ARM_SPE_CAP_LDS]		= "    Load Data Source :%"PRId64"\n",
> +};
> +
> +static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
>   {
> +	unsigned int i, cpu, header_size, cpu_num, per_cpu_size;
> +
>   	if (!dump_trace)
>   		return;
>   
> -	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
> +	if (spe->metadata_ver == 1) {
> +		cpu_num = 0;
> +		header_size = ARM_SPE_AUXTRACE_V1_PRIV_MAX;
> +		per_cpu_size = 0;
> +	} else if (spe->metadata_ver == 2) {

Assuming future version updates are backwards compatible and only add 
new info this should be spe->metadata_ver >= 2, otherwise version bumps 
end up causing errors when files get passed around.

I know there are arguments about what should and shouldn't be supported 
when opening new files on old perfs, but in this case it's easy to only 
add new info to the aux header and leave the old stuff intact.

> +		cpu_num = arr[ARM_SPE_CPU_NUM];
> +		header_size = ARM_SPE_AUXTRACE_V2_PRIV_MAX;
> +		per_cpu_size = ARM_SPE_AUXTRACE_V2_PRIV_PER_CPU_MAX;

I think for coresight we also save the size of each per-cpu block rather 
than use a constant, that way new items can be appended without breaking 
readers.

That kind of leads to another point that this mechanism is mostly 
duplicated from coresight. It saves a main header version, then per-cpu 
groups of variable size with named elements. I'm not saying we should 
definitely try to share the code, but it's worth keeping in mind.

> +	} else {
> +		pr_err("Cannot support metadata ver: %ld\n", spe->metadata_ver);
> +		return;
> +	}
> +
> +	for (i = 0; i < header_size; i++)
> +		fprintf(stdout, metadata_hdr_fmts[i], arr[i]);
> +
> +	arr += header_size;
> +	for (cpu = 0; cpu < cpu_num; cpu++) {
> +		for (i = 0; i < per_cpu_size; i++)
> +			fprintf(stdout, metadata_per_cpu_fmts[i], arr[i]);
> +		arr += per_cpu_size;
> +	}
>   }
>   
>   static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
> @@ -1383,7 +1416,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>   	spe->auxtrace.evsel_is_auxtrace = arm_spe_evsel_is_auxtrace;
>   	session->auxtrace = &spe->auxtrace;
>   
> -	arm_spe_print_info(&auxtrace_info->priv[0]);
> +	arm_spe_print_info(spe, &auxtrace_info->priv[0]);
>   
>   	if (dump_trace)
>   		return 0;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 9/9] perf arm-spe: Dump metadata with version 2
  2024-08-28 16:20   ` James Clark
@ 2024-08-30  7:54     ` Leo Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-08-30  7:54 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Suzuki K Poulose, Mike Leach, John Garry, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Liang, Kan, Jonathan Cameron,
	Yicong Yang, linux-arm-kernel, linux-kernel, coresight,
	linux-perf-users

On 8/28/24 17:20, James Clark wrote:
>> +static void arm_spe_print_info(struct arm_spe *spe, __u64 *arr)
>>   {
>> +     unsigned int i, cpu, header_size, cpu_num, per_cpu_size;
>> +
>>       if (!dump_trace)
>>               return;
>>
>> -     fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
>> +     if (spe->metadata_ver == 1) {
>> +             cpu_num = 0;
>> +             header_size = ARM_SPE_AUXTRACE_V1_PRIV_MAX;
>> +             per_cpu_size = 0;
>> +     } else if (spe->metadata_ver == 2) {
> 
> Assuming future version updates are backwards compatible and only add
> new info this should be spe->metadata_ver >= 2, otherwise version bumps
> end up causing errors when files get passed around.
> 
> I know there are arguments about what should and shouldn't be supported
> when opening new files on old perfs, but in this case it's easy to only
> add new info to the aux header and leave the old stuff intact.
> 
>> +             cpu_num = arr[ARM_SPE_CPU_NUM];
>> +             header_size = ARM_SPE_AUXTRACE_V2_PRIV_MAX;
>> +             per_cpu_size = ARM_SPE_AUXTRACE_V2_PRIV_PER_CPU_MAX;
> 
> I think for coresight we also save the size of each per-cpu block rather
> than use a constant, that way new items can be appended without breaking
> readers.

Good point for adding a 'size' field for each per-cpu block.

My understanding is we need to make the metadata format to be self-described.
E.g. the metadata header contains the size for itself, and every per CPU
metadata block also contains a 'size' field.  Based on this, a general code
can be used to processing different metadata versions.
  
> That kind of leads to another point that this mechanism is mostly
> duplicated from coresight. It saves a main header version, then per-cpu
> groups of variable size with named elements. I'm not saying we should
> definitely try to share the code, but it's worth keeping in mind.

Agreed. I will refine a bit, for better matching this direction.

Thanks for suggestion.

Leo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-08-27 16:44 ` [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity Leo Yan
@ 2024-08-30 10:38   ` Will Deacon
  2024-08-30 12:12     ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2024-08-30 10:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Mike Leach, James Clark, John Garry, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Liang, Kan, Jonathan Cameron, Yicong Yang,
	linux-arm-kernel, linux-kernel, coresight, linux-perf-users

On Tue, Aug 27, 2024 at 05:44:09PM +0100, Leo Yan wrote:
> This commit adds a new entry 'lds' in the capacity folder. 'lds' stands
> for "loaded data source". When its value is 1, it indicates the data
> source implemented, and data source packets will be recorded in the
> trace data.
> 
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
>  drivers/perf/arm_spe_pmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 9100d82bfabc..81c1e7627721 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action {
>  /* This sysfs gunk was really good fun to write. */
>  enum arm_spe_pmu_capabilities {
>  	SPE_PMU_CAP_ARCH_INST = 0,
> +	SPE_PMU_CAP_LDS,
>  	SPE_PMU_CAP_ERND,
>  	SPE_PMU_CAP_FEAT_MAX,
>  	SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
> @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities {
>  
>  static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
>  	[SPE_PMU_CAP_ARCH_INST]	= SPE_PMU_FEAT_ARCH_INST,
> +	[SPE_PMU_CAP_LDS]	= SPE_PMU_FEAT_LDS,
>  	[SPE_PMU_CAP_ERND]	= SPE_PMU_FEAT_ERND,
>  };
>  
> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
>  
>  static struct attribute *arm_spe_pmu_cap_attr[] = {
>  	SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
> +	SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS),
>  	SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
>  	SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
>  	SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),

What will userspace do with this? I don't think you can turn LDS on/off,
so either you'll get the data source packet or you won't.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-08-30 10:38   ` Will Deacon
@ 2024-08-30 12:12     ` Leo Yan
  2024-08-30 13:09       ` Will Deacon
  0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2024-08-30 12:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Mike Leach, James Clark, John Garry, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Liang, Kan, Jonathan Cameron, Yicong Yang,
	linux-arm-kernel, linux-kernel, coresight, linux-perf-users

On 8/30/24 11:38, Will Deacon wrote:
> On Tue, Aug 27, 2024 at 05:44:09PM +0100, Leo Yan wrote:
>> This commit adds a new entry 'lds' in the capacity folder. 'lds' stands
>> for "loaded data source". When its value is 1, it indicates the data
>> source implemented, and data source packets will be recorded in the
>> trace data.
>>
>> Signed-off-by: Leo Yan <leo.yan@arm.com>
>> ---
>>   drivers/perf/arm_spe_pmu.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> index 9100d82bfabc..81c1e7627721 100644
>> --- a/drivers/perf/arm_spe_pmu.c
>> +++ b/drivers/perf/arm_spe_pmu.c
>> @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action {
>>   /* This sysfs gunk was really good fun to write. */
>>   enum arm_spe_pmu_capabilities {
>>        SPE_PMU_CAP_ARCH_INST = 0,
>> +     SPE_PMU_CAP_LDS,
>>        SPE_PMU_CAP_ERND,
>>        SPE_PMU_CAP_FEAT_MAX,
>>        SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
>> @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities {
>>
>>   static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
>>        [SPE_PMU_CAP_ARCH_INST] = SPE_PMU_FEAT_ARCH_INST,
>> +     [SPE_PMU_CAP_LDS]       = SPE_PMU_FEAT_LDS,
>>        [SPE_PMU_CAP_ERND]      = SPE_PMU_FEAT_ERND,
>>   };
>>
>> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
>>
>>   static struct attribute *arm_spe_pmu_cap_attr[] = {
>>        SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
>> +     SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS),
>>        SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
>>        SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
>>        SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
> 
> What will userspace do with this? I don't think you can turn LDS on/off,
> so either you'll get the data source packet or you won't.

Yes, LDS bit does not work as a switch.

The tool in the userspace will record the LDS bit into the metadata. During
decoding phase, it reads out the LDS from metadata. Based on it, the perf
tool can know if the data source is supported or not, if yes then decode the
data source packet.

Another point is how to decide the data source packet format. Now we maintain
a CPU list for tracking CPU variants which support data source trace. For long
term, I would like the tool can based on hardware feature (e.g. a ID register
in Arm SPE) to decide the data source format, so far it is absent. This is why
LDS bit + CPU list is a more reliable way. See some discussion [1].

Thanks,
Leo

[1] https://lore.kernel.org/linux-perf-users/Zl9jLtiFagBcH7oH@J2N7QTR9R3/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-08-30 12:12     ` Leo Yan
@ 2024-08-30 13:09       ` Will Deacon
  2024-08-31 11:37         ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2024-08-30 13:09 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Mike Leach, James Clark, John Garry, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Liang, Kan, Jonathan Cameron, Yicong Yang,
	linux-arm-kernel, linux-kernel, coresight, linux-perf-users

On Fri, Aug 30, 2024 at 01:12:33PM +0100, Leo Yan wrote:
> On 8/30/24 11:38, Will Deacon wrote:
> > On Tue, Aug 27, 2024 at 05:44:09PM +0100, Leo Yan wrote:
> > > This commit adds a new entry 'lds' in the capacity folder. 'lds' stands
> > > for "loaded data source". When its value is 1, it indicates the data
> > > source implemented, and data source packets will be recorded in the
> > > trace data.
> > > 
> > > Signed-off-by: Leo Yan <leo.yan@arm.com>
> > > ---
> > >   drivers/perf/arm_spe_pmu.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> > > index 9100d82bfabc..81c1e7627721 100644
> > > --- a/drivers/perf/arm_spe_pmu.c
> > > +++ b/drivers/perf/arm_spe_pmu.c
> > > @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action {
> > >   /* This sysfs gunk was really good fun to write. */
> > >   enum arm_spe_pmu_capabilities {
> > >        SPE_PMU_CAP_ARCH_INST = 0,
> > > +     SPE_PMU_CAP_LDS,
> > >        SPE_PMU_CAP_ERND,
> > >        SPE_PMU_CAP_FEAT_MAX,
> > >        SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX,
> > > @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities {
> > > 
> > >   static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = {
> > >        [SPE_PMU_CAP_ARCH_INST] = SPE_PMU_FEAT_ARCH_INST,
> > > +     [SPE_PMU_CAP_LDS]       = SPE_PMU_FEAT_LDS,
> > >        [SPE_PMU_CAP_ERND]      = SPE_PMU_FEAT_ERND,
> > >   };
> > > 
> > > @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
> > > 
> > >   static struct attribute *arm_spe_pmu_cap_attr[] = {
> > >        SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
> > > +     SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS),
> > >        SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
> > >        SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
> > >        SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
> > 
> > What will userspace do with this? I don't think you can turn LDS on/off,
> > so either you'll get the data source packet or you won't.
> 
> Yes, LDS bit does not work as a switch.
> 
> The tool in the userspace will record the LDS bit into the metadata. During
> decoding phase, it reads out the LDS from metadata. Based on it, the perf
> tool can know if the data source is supported or not, if yes then decode the
> data source packet.

Why not just decode a data source packet when you see it? i.e. assume LDS
is always set.

> Another point is how to decide the data source packet format. Now we maintain
> a CPU list for tracking CPU variants which support data source trace. For long
> term, I would like the tool can based on hardware feature (e.g. a ID register
> in Arm SPE) to decide the data source format, so far it is absent. This is why
> LDS bit + CPU list is a more reliable way. See some discussion [1].

Huh. Why would you have a CPU in the list if it _doesn't_ have LDS? If
we have to resort to per-CPU decoding, then that's even more of a reason
not to have the LDS cap imo.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-08-30 13:09       ` Will Deacon
@ 2024-08-31 11:37         ` Leo Yan
  2024-09-04 12:35           ` Will Deacon
  0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2024-08-31 11:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Mike Leach, James Clark, John Garry, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Liang, Kan, Jonathan Cameron, Yicong Yang,
	linux-arm-kernel, linux-kernel, coresight, linux-perf-users

On 8/30/2024 2:09 PM, Will Deacon wrote:

[...]

>>>> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
>>>>
>>>>   static struct attribute *arm_spe_pmu_cap_attr[] = {
>>>>        SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
>>>> +     SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS),
>>>>        SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
>>>>        SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
>>>>        SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
>>>
>>> What will userspace do with this? I don't think you can turn LDS on/off,
>>> so either you'll get the data source packet or you won't.
>>
>> Yes, LDS bit does not work as a switch.
>>
>> The tool in the userspace will record the LDS bit into the metadata. During
>> decoding phase, it reads out the LDS from metadata. Based on it, the perf
>> tool can know if the data source is supported or not, if yes then decode the
>> data source packet.
> 
> Why not just decode a data source packet when you see it? i.e. assume LDS
> is always set.

The current tool works this way to directly decode a data source packet.

However, as Arm ARM section D17.2.4 "Data Source packet" describes, the loaded
data source is implementation dependent, the data source payload format also
is implementation defined.

We are halfway here in using the LDS bit to determine if the data source is
implemented. However, we lack information on the data source format
implementation. As a first step, we can use the LDS bit for sanity checking in
the tool to detect any potential silicon implementation issues. Once we have
an architectural definition for the data source format, we can extend the tool
accordingly.

>> Another point is how to decide the data source packet format. Now we maintain
>> a CPU list for tracking CPU variants which support data source trace. For long
>> term, I would like the tool can based on hardware feature (e.g. a ID register
>> in Arm SPE) to decide the data source format, so far it is absent. This is why
>> LDS bit + CPU list is a more reliable way. See some discussion [1].
> 
> Huh. Why would you have a CPU in the list if it _doesn't_ have LDS?

Yeah, this is what we don't expect - we can verify the implementation based on
LDS bit.

E.g. if users ask data source related questions, we can use LDS bit (saved in
the perf metadata) to confirm the feature has been implemented in a silicon.

> If we have to resort to per-CPU decoding, then that's even more of a reason>
not to have the LDS cap imo.

This series converts the Arm SPE information into per-CPU metadata, including
the LDS bit. Consequently, the decoding process retrieves CPU metadata for
per-CPU decoding, making it easy to determine if a CPU supports the data source.

We have platforms that not all CPUs support Arm SPE, for example, the CPU0 and
CPU1 don't support Arm SPE, CPU2~CPU5 share a Arm SPE PMU event, CPU6~CPU7
share another Arm SPE PMU event. In this case, per CPU metadata can be easily
for checking hardware capacity (include LDS bit) in the decoding.

Thanks,
Leo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-08-31 11:37         ` Leo Yan
@ 2024-09-04 12:35           ` Will Deacon
  2024-09-04 14:02             ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2024-09-04 12:35 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Mike Leach, James Clark, John Garry, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Liang, Kan, Jonathan Cameron, Yicong Yang,
	linux-arm-kernel, linux-kernel, coresight, linux-perf-users

On Sat, Aug 31, 2024 at 12:37:29PM +0100, Leo Yan wrote:
> On 8/30/2024 2:09 PM, Will Deacon wrote:
> 
> [...]
> 
> >>>> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev,
> >>>>
> >>>>   static struct attribute *arm_spe_pmu_cap_attr[] = {
> >>>>        SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST),
> >>>> +     SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS),
> >>>>        SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND),
> >>>>        SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ),
> >>>>        SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
> >>>
> >>> What will userspace do with this? I don't think you can turn LDS on/off,
> >>> so either you'll get the data source packet or you won't.
> >>
> >> Yes, LDS bit does not work as a switch.
> >>
> >> The tool in the userspace will record the LDS bit into the metadata. During
> >> decoding phase, it reads out the LDS from metadata. Based on it, the perf
> >> tool can know if the data source is supported or not, if yes then decode the
> >> data source packet.
> > 
> > Why not just decode a data source packet when you see it? i.e. assume LDS
> > is always set.
> 
> The current tool works this way to directly decode a data source packet.
> 
> However, as Arm ARM section D17.2.4 "Data Source packet" describes, the loaded
> data source is implementation dependent, the data source payload format also
> is implementation defined.
> 
> We are halfway here in using the LDS bit to determine if the data source is
> implemented. However, we lack information on the data source format
> implementation. As a first step, we can use the LDS bit for sanity checking in
> the tool to detect any potential silicon implementation issues. Once we have
> an architectural definition for the data source format, we can extend the tool
> accordingly.

I don't think we shyould expose UAPI from the driver to detect potential
hardware bugs. Let's add it when we know it's useful for something instead.

> 
> >> Another point is how to decide the data source packet format. Now we maintain
> >> a CPU list for tracking CPU variants which support data source trace. For long
> >> term, I would like the tool can based on hardware feature (e.g. a ID register
> >> in Arm SPE) to decide the data source format, so far it is absent. This is why
> >> LDS bit + CPU list is a more reliable way. See some discussion [1].
> > 
> > Huh. Why would you have a CPU in the list if it _doesn't_ have LDS?
> 
> Yeah, this is what we don't expect - we can verify the implementation based on
> LDS bit.
> 
> E.g. if users ask data source related questions, we can use LDS bit (saved in
> the perf metadata) to confirm the feature has been implemented in a silicon.

What exactly do you mean by this?

As far as I can tell:

  - Data source packets are either present or absent depending on LDS
  - You need CPU-specific information to decode them it they are present

So it's neither necessary nor sufficient to expose the LDS bit to
userspace.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity
  2024-09-04 12:35           ` Will Deacon
@ 2024-09-04 14:02             ` Leo Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2024-09-04 14:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Mike Leach, James Clark, John Garry, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Liang, Kan, Jonathan Cameron, Yicong Yang,
	linux-arm-kernel, linux-kernel, coresight, linux-perf-users

On 9/4/2024 1:35 PM, Will Deacon wrote:

>>> Why not just decode a data source packet when you see it? i.e. assume LDS
>>> is always set.
>>
>> The current tool works this way to directly decode a data source packet.
>>
>> However, as Arm ARM section D17.2.4 "Data Source packet" describes, the loaded
>> data source is implementation dependent, the data source payload format also
>> is implementation defined.
>>
>> We are halfway here in using the LDS bit to determine if the data source is
>> implemented. However, we lack information on the data source format
>> implementation. As a first step, we can use the LDS bit for sanity checking in
>> the tool to detect any potential silicon implementation issues. Once we have
>> an architectural definition for the data source format, we can extend the tool
>> accordingly.
> 
> I don't think we shyould expose UAPI from the driver to detect potential
> hardware bugs. Let's add it when we know it's useful for something instead.

I understand your concern.

>>>> Another point is how to decide the data source packet format. Now we maintain
>>>> a CPU list for tracking CPU variants which support data source trace. For long
>>>> term, I would like the tool can based on hardware feature (e.g. a ID register
>>>> in Arm SPE) to decide the data source format, so far it is absent. This is why
>>>> LDS bit + CPU list is a more reliable way. See some discussion [1].
>>>
>>> Huh. Why would you have a CPU in the list if it _doesn't_ have LDS?
>>
>> Yeah, this is what we don't expect - we can verify the implementation based on
>> LDS bit.
>>
>> E.g. if users ask data source related questions, we can use LDS bit (saved in
>> the perf metadata) to confirm the feature has been implemented in a silicon.
> 
> What exactly do you mean by this?

Sometimes, users might ask if the data source is supported in Arm SPE. With exposing
the LDS bit, it would be helpful for us to check if the feature is supported. For
example, when a user uses the perf tool to record Arm SPE data, the LDS bit is stored
in the perf metadata, and we can check its value during post-analysis.
 
> As far as I can tell:
> 
>   - Data source packets are either present or absent depending on LDS
>   - You need CPU-specific information to decode them it they are present
> 
> So it's neither necessary nor sufficient to expose the LDS bit to
> userspace.

We can live without exposing LDS bit currently. I will drop this change in next spin.

Just head up, later I think we might still need to expose capacity bits (or the
PMSIDR_EL1 register) for new features. As you said, we can do it when it is necessary.

Thanks for suggestion!

Leo 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-09-04 14:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-27 16:44 [PATCH v1 0/9] perf arm-spe: Introduce metadata version 2 Leo Yan
2024-08-27 16:44 ` [PATCH v1 1/9] perf: arm_spe: Introduce 'lds' capacity Leo Yan
2024-08-30 10:38   ` Will Deacon
2024-08-30 12:12     ` Leo Yan
2024-08-30 13:09       ` Will Deacon
2024-08-31 11:37         ` Leo Yan
2024-09-04 12:35           ` Will Deacon
2024-09-04 14:02             ` Leo Yan
2024-08-27 16:44 ` [PATCH v1 2/9] perf auxtrace arm: Refactor error handling Leo Yan
2024-08-27 16:44 ` [PATCH v1 3/9] perf auxtrace arm: Introduce find_auxtrace_pmus_by_name() Leo Yan
2024-08-27 16:44 ` [PATCH v1 4/9] perf: arm-spe: Record multiple PMUs Leo Yan
2024-08-27 16:44 ` [PATCH v1 5/9] perf arm-spe: Extend meta data header for version 2 Leo Yan
2024-08-27 16:44 ` [PATCH v1 6/9] perf arm-spe: Calculate meta data size Leo Yan
2024-08-27 16:44 ` [PATCH v1 7/9] perf arm-spe: Save per CPU information in metadata Leo Yan
2024-08-27 16:44 ` [PATCH v1 8/9] perf arm-spe: Support metadata version 2 Leo Yan
2024-08-27 16:44 ` [PATCH v1 9/9] perf arm-spe: Dump metadata with " Leo Yan
2024-08-28 16:20   ` James Clark
2024-08-30  7:54     ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).