linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
@ 2024-08-13 13:23 James Clark
  2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
	Ze Gao, linux-kernel, linux-arm-kernel

The important patches are 3 and 5, the rest are tidyups and tests.

I don't think there is any interaction with the other open issues
about the uncore DSU cycles event or JSON/legacy hw event priorities
because only hw events on core PMUs are used for the default
stat command. And also just sharing the existing x86 code works so
no big changes are required.

For patch 3 the weak arch specific symbol has to continue to be used
rather than picking the implementation based on 
perf_pmus__supports_extended_type() like in patch 5. This is because
that function ends up calling evsel__hw_name() itself which results
in recursion. But at least one weak arch_* construct has been removed,
so it's better than nothing.


James Clark (7):
  perf stat: Initialize instead of overwriting clock event
  perf stat: Remove unused default_null_attrs
  perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
  perf evsel: Remove duplicated __evsel__hw_name() code
  perf evlist: Use hybrid default attrs whenever extended type is
    supported
  perf test: Make stat test work on DT devices
  perf test: Add a test for default perf stat command

 tools/perf/arch/arm64/util/Build   |  1 +
 tools/perf/arch/arm64/util/evsel.c |  7 ++++
 tools/perf/arch/x86/util/evlist.c  | 65 ------------------------------
 tools/perf/arch/x86/util/evsel.c   | 17 +-------
 tools/perf/builtin-stat.c          | 12 ++----
 tools/perf/tests/shell/stat.sh     | 33 ++++++++++++---
 tools/perf/util/evlist.c           | 65 ++++++++++++++++++++++++++----
 tools/perf/util/evlist.h           |  6 +--
 tools/perf/util/evsel.c            | 19 +++++++++
 tools/perf/util/evsel.h            |  2 +-
 10 files changed, 119 insertions(+), 108 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/evsel.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 14:28   ` Ian Rogers
  2024-08-13 13:23 ` [PATCH 2/7] perf stat: Remove unused default_null_attrs James Clark
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel

This overwrite relies on the clock event remaining at index 0 and is
quite a way down from where the array is initialized, making it easy to
miss. Just initialize it with the correct clock event to begin with.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/builtin-stat.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1f92445f7480..a65f58f8783f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
 {
 	struct perf_event_attr default_attrs0[] = {
 
-  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
+  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
+						PERF_COUNT_SW_CPU_CLOCK :
+						PERF_COUNT_SW_TASK_CLOCK	},
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES	},
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS		},
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS		},
@@ -2030,9 +2032,6 @@ static int add_default_attributes(void)
 
 	if (!evsel_list->core.nr_entries) {
 		/* No events so add defaults. */
-		if (target__has_cpu(&target))
-			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
-
 		if (evlist__add_default_attrs(evsel_list, default_attrs0) < 0)
 			return -1;
 		if (perf_pmus__have_event("cpu", "stalled-cycles-frontend")) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/7] perf stat: Remove unused default_null_attrs
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
  2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 13:23 ` [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86 James Clark
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel

All current implementations of arch_evlist__add_default_attrs() do
nothing if nr_attrs is 0. To avoid confusion that it's used, remove it.

It appears that it used to be used as a mechanism to add topdown events,
but this is now done separately.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/builtin-stat.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a65f58f8783f..cf6a1923811b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1930,7 +1930,6 @@ static int add_default_attributes(void)
 	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16)				},
 };
 
-	struct perf_event_attr default_null_attrs[] = {};
 	const char *pmu = parse_events_option_args.pmu_filter ?: "all";
 
 	/* Set attrs if no event is selected and !null_run: */
@@ -2072,10 +2071,6 @@ static int add_default_attributes(void)
 			evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
 			evlist__delete(metric_evlist);
 		}
-
-		/* Platform specific attrs */
-		if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
-			return -1;
 	}
 
 	/* Detailed events get appended to the event list: */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
  2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
  2024-08-13 13:23 ` [PATCH 2/7] perf stat: Remove unused default_null_attrs James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 13:23 ` [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code James Clark
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
	Ze Gao, linux-kernel, linux-arm-kernel

Both of these platforms share the same extended type ID mechanism, so
share the implementation of the evsel name lookup.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/arch/arm64/util/Build   |  1 +
 tools/perf/arch/arm64/util/evsel.c |  7 +++++++
 tools/perf/arch/x86/util/evsel.c   | 17 +----------------
 tools/perf/util/evsel.c            | 24 ++++++++++++++++++++++++
 tools/perf/util/evsel.h            |  2 +-
 5 files changed, 34 insertions(+), 17 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/evsel.c

diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 343ef7589a77..45c77f747d54 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -1,3 +1,4 @@
+perf-util-y += evsel.o
 perf-util-y += header.o
 perf-util-y += machine.o
 perf-util-y += perf_regs.o
diff --git a/tools/perf/arch/arm64/util/evsel.c b/tools/perf/arch/arm64/util/evsel.c
new file mode 100644
index 000000000000..6974f72ee37f
--- /dev/null
+++ b/tools/perf/arch/arm64/util/evsel.c
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "util/evsel.h"
+
+int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
+{
+	return evsel__hw_name_ext_type_id(evsel, bf, size);
+}
diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 090d0f371891..771b35fef61a 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -49,22 +49,7 @@ bool arch_evsel__must_be_in_group(const struct evsel *evsel)
 
 int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
 {
-	u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
-	u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
-	const char *event_name;
-
-	if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
-		event_name = evsel__hw_names[event];
-	else
-		event_name = "unknown-hardware";
-
-	/* The PMU type is not required for the non-hybrid platform. */
-	if (!pmu)
-		return  scnprintf(bf, size, "%s", event_name);
-
-	return scnprintf(bf, size, "%s/%s/",
-			 evsel->pmu_name ? evsel->pmu_name : "cpu",
-			 event_name);
+	return evsel__hw_name_ext_type_id(evsel, bf, size);
 }
 
 static void ibs_l3miss_warn(void)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc603193c477..19f4d0e71733 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -584,6 +584,30 @@ int __weak arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
 	return scnprintf(bf, size, "%s", __evsel__hw_name(evsel->core.attr.config));
 }
 
+/*
+ * Version of evsel__hw_name() used on platforms where perf_pmus__supports_extended_type()
+ * may be true and the type needs to be extracted and masked.
+ */
+int evsel__hw_name_ext_type_id(struct evsel *evsel, char *bf, size_t size)
+{
+	u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
+	u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
+	const char *event_name;
+
+	if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
+		event_name = evsel__hw_names[event];
+	else
+		event_name = "unknown-hardware";
+
+	/* The PMU type is not required for the non-hybrid platform. */
+	if (!pmu)
+		return  scnprintf(bf, size, "%s", event_name);
+
+	return scnprintf(bf, size, "%s/%s/",
+			 evsel->pmu_name ? evsel->pmu_name : "cpu",
+			 event_name);
+}
+
 static int evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
 {
 	int r = arch_evsel__hw_name(evsel, bf, size);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 80b5f6dd868e..f95fc2919462 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -299,7 +299,7 @@ extern const char *const evsel__sw_names[PERF_COUNT_SW_MAX];
 extern char *evsel__bpf_counter_events;
 bool evsel__match_bpf_counter_events(const char *name);
 int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size);
-
+int evsel__hw_name_ext_type_id(struct evsel *evsel, char *bf, size_t size);
 int __evsel__hw_cache_type_op_res_name(u8 type, u8 op, u8 result, char *bf, size_t size);
 const char *evsel__name(struct evsel *evsel);
 bool evsel__name_is(struct evsel *evsel, const char *name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
                   ` (2 preceding siblings ...)
  2024-08-13 13:23 ` [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86 James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel

__evsel__hw_name() does the same thing so use it instead.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/util/evsel.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 19f4d0e71733..ba960a39f104 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -592,12 +592,7 @@ int evsel__hw_name_ext_type_id(struct evsel *evsel, char *bf, size_t size)
 {
 	u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
 	u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
-	const char *event_name;
-
-	if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
-		event_name = evsel__hw_names[event];
-	else
-		event_name = "unknown-hardware";
+	const char *event_name = __evsel__hw_name(event);
 
 	/* The PMU type is not required for the non-hybrid platform. */
 	if (!pmu)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
                   ` (3 preceding siblings ...)
  2024-08-13 13:23 ` [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 13:33   ` James Clark
  2024-08-13 13:23 ` [PATCH 6/7] perf test: Make stat test work on DT devices James Clark
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
	Ze Gao, linux-kernel, linux-arm-kernel

For x86, a hybrid version of ___evlist__add_default_attrs() was added to
support default perf stat events. This can actually be used whenever
perf_pmus__supports_extended_type() is true, which now makes default
perf stat arguments work properly on Arm big.LITTLE:

  $ perf stat
  ...
        3347093940    armv8_cortex_a53/cycles/    #  0.563 GHz  (98.99%)
        3295523067    armv8_cortex_a57/cycles/    #  0.554 GHz  (67.07%)
  ...

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/arch/x86/util/evlist.c | 65 -------------------------------
 tools/perf/util/evlist.c          | 65 +++++++++++++++++++++++++++----
 tools/perf/util/evlist.h          |  6 +--
 3 files changed, 59 insertions(+), 77 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index b1ce0c52d88d..bbe6240c7f71 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -8,71 +8,6 @@
 #include "topdown.h"
 #include "evsel.h"
 
-static int ___evlist__add_default_attrs(struct evlist *evlist,
-					struct perf_event_attr *attrs,
-					size_t nr_attrs)
-{
-	LIST_HEAD(head);
-	size_t i = 0;
-
-	for (i = 0; i < nr_attrs; i++)
-		event_attr_init(attrs + i);
-
-	if (perf_pmus__num_core_pmus() == 1)
-		return evlist__add_attrs(evlist, attrs, nr_attrs);
-
-	for (i = 0; i < nr_attrs; i++) {
-		struct perf_pmu *pmu = NULL;
-
-		if (attrs[i].type == PERF_TYPE_SOFTWARE) {
-			struct evsel *evsel = evsel__new(attrs + i);
-
-			if (evsel == NULL)
-				goto out_delete_partial_list;
-			list_add_tail(&evsel->core.node, &head);
-			continue;
-		}
-
-		while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
-			struct perf_cpu_map *cpus;
-			struct evsel *evsel;
-
-			evsel = evsel__new(attrs + i);
-			if (evsel == NULL)
-				goto out_delete_partial_list;
-			evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
-			cpus = perf_cpu_map__get(pmu->cpus);
-			evsel->core.cpus = cpus;
-			evsel->core.own_cpus = perf_cpu_map__get(cpus);
-			evsel->pmu_name = strdup(pmu->name);
-			list_add_tail(&evsel->core.node, &head);
-		}
-	}
-
-	evlist__splice_list_tail(evlist, &head);
-
-	return 0;
-
-out_delete_partial_list:
-	{
-		struct evsel *evsel, *n;
-
-		__evlist__for_each_entry_safe(&head, n, evsel)
-			evsel__delete(evsel);
-	}
-	return -1;
-}
-
-int arch_evlist__add_default_attrs(struct evlist *evlist,
-				   struct perf_event_attr *attrs,
-				   size_t nr_attrs)
-{
-	if (!nr_attrs)
-		return 0;
-
-	return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
-}
-
 int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
 {
 	if (topdown_sys_has_perf_metrics() &&
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1417f9a23083..e0c31399beb6 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -5,6 +5,7 @@
  * Parts came from builtin-{top,stat,record}.c, see those files for further
  * copyright notes.
  */
+#include "pmus.h"
 #include <api/fs/fs.h>
 #include <errno.h>
 #include <inttypes.h>
@@ -338,24 +339,74 @@ int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size
 	return -1;
 }
 
-int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
+
+static int __evlist__add_default_attrs_ext_type(struct evlist *evlist,
+						struct perf_event_attr *attrs,
+						size_t nr_attrs)
 {
-	size_t i;
+	LIST_HEAD(head);
+	size_t i = 0;
 
 	for (i = 0; i < nr_attrs; i++)
 		event_attr_init(attrs + i);
 
-	return evlist__add_attrs(evlist, attrs, nr_attrs);
+	for (i = 0; i < nr_attrs; i++) {
+		struct perf_pmu *pmu = NULL;
+
+		if (attrs[i].type == PERF_TYPE_SOFTWARE) {
+			struct evsel *evsel = evsel__new(attrs + i);
+
+			if (evsel == NULL)
+				goto out_delete_partial_list;
+			list_add_tail(&evsel->core.node, &head);
+			continue;
+		}
+
+		while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
+			struct perf_cpu_map *cpus;
+			struct evsel *evsel;
+
+			evsel = evsel__new(attrs + i);
+			if (evsel == NULL)
+				goto out_delete_partial_list;
+			evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+			cpus = perf_cpu_map__get(pmu->cpus);
+			evsel->core.cpus = cpus;
+			evsel->core.own_cpus = perf_cpu_map__get(cpus);
+			evsel->pmu_name = strdup(pmu->name);
+			list_add_tail(&evsel->core.node, &head);
+		}
+	}
+
+	evlist__splice_list_tail(evlist, &head);
+
+	return 0;
+
+out_delete_partial_list:
+	{
+		struct evsel *evsel, *n;
+
+		__evlist__for_each_entry_safe(&head, n, evsel)
+			evsel__delete(evsel);
+	}
+	return -1;
 }
 
-__weak int arch_evlist__add_default_attrs(struct evlist *evlist,
-					  struct perf_event_attr *attrs,
-					  size_t nr_attrs)
+int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs,
+				size_t nr_attrs)
 {
+	size_t i;
+
 	if (!nr_attrs)
 		return 0;
 
-	return __evlist__add_default_attrs(evlist, attrs, nr_attrs);
+	if (perf_pmus__supports_extended_type())
+		return __evlist__add_default_attrs_ext_type(evlist, attrs, nr_attrs);
+
+	for (i = 0; i < nr_attrs; i++)
+		event_attr_init(attrs + i);
+
+	return evlist__add_attrs(evlist, attrs, nr_attrs);
 }
 
 struct evsel *evlist__find_tracepoint_by_id(struct evlist *evlist, int id)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cccc34da5a02..06d1eeacc0d1 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -106,12 +106,8 @@ int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size
 int __evlist__add_default_attrs(struct evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
 
-int arch_evlist__add_default_attrs(struct evlist *evlist,
-				   struct perf_event_attr *attrs,
-				   size_t nr_attrs);
-
 #define evlist__add_default_attrs(evlist, array) \
-	arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
+	__evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
 
 int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] perf test: Make stat test work on DT devices
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
                   ` (4 preceding siblings ...)
  2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 13:23 ` [PATCH 7/7] perf test: Add a test for default perf stat command James Clark
  2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
  7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Dominique Martinet,
	Ze Gao, linux-kernel, linux-arm-kernel

PMUs aren't listed in /sys/devices/ on DT devices, so change the search
directory to /sys/bus/event_source/devices which works everywhere. Also
add armv8_cortex_* as a known PMU type to search for to make the test
run on more devices.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/stat.sh | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 3f1e67795490..525d0c44fdc6 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -117,16 +117,18 @@ test_cputype() {
 
   # Find a known PMU for cputype.
   pmu=""
-  for i in cpu cpu_atom armv8_pmuv3_0
+  devs="/sys/bus/event_source/devices"
+  for i in $devs/cpu $devs/cpu_atom $devs/armv8_pmuv3_0 $devs/armv8_cortex_*
   do
-    if test -d "/sys/devices/$i"
+    i_base=$(basename "$i")
+    if test -d "$i"
     then
-      pmu="$i"
+      pmu="$i_base"
       break
     fi
-    if perf stat -e "$i/instructions/" true > /dev/null 2>&1
+    if perf stat -e "$i_base/instructions/" true > /dev/null 2>&1
     then
-      pmu="$i"
+      pmu="$i_base"
       break
     fi
   done
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] perf test: Add a test for default perf stat command
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
                   ` (5 preceding siblings ...)
  2024-08-13 13:23 ` [PATCH 6/7] perf test: Make stat test work on DT devices James Clark
@ 2024-08-13 13:23 ` James Clark
  2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
  7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Dominique Martinet,
	Ze Gao, linux-kernel, linux-arm-kernel

Test that one cycles event is opened for each core PMU when "perf stat"
is run without arguments.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/stat.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 525d0c44fdc6..7118cfc37f48 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -148,6 +148,26 @@ test_cputype() {
   echo "cputype test [Success]"
 }
 
+test_hybrid() {
+  # Test the default stat command on hybrid devices opens one cycles event for
+  # each CPU type.
+  echo "hybrid test"
+
+  # Count the number of core PMUs
+  pmus=$(ls /sys/bus/event_source/devices/*/cpus 2>/dev/null | wc -l)
+
+  # Run default Perf stat
+  cycles_events=$(perf stat -- true 2>&1 | grep "cycles" | wc -l)
+
+  if [ "$pmus" -ne "$cycles_events" ]
+  then
+    echo "hybrid test [Found $pmus PMUs but $cycles_events cycles events. Failed]"
+    err=1
+    return
+  fi
+  echo "hybrid test [Success]"
+}
+
 test_default_stat
 test_stat_record_report
 test_stat_record_script
@@ -155,4 +175,5 @@ test_stat_repeat_weak_groups
 test_topdown_groups
 test_topdown_weak_groups
 test_cputype
+test_hybrid
 exit $err
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported
  2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
@ 2024-08-13 13:33   ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:33 UTC (permalink / raw)
  To: irogers, linux-perf-users
  Cc: John Garry, Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
	Dominique Martinet, Yang Jihong, Ze Gao, linux-kernel,
	linux-arm-kernel



On 13/08/2024 2:23 pm, James Clark wrote:
> For x86, a hybrid version of ___evlist__add_default_attrs() was added to
> support default perf stat events. This can actually be used whenever
> perf_pmus__supports_extended_type() is true, which now makes default
> perf stat arguments work properly on Arm big.LITTLE:
> 
>    $ perf stat
>    ...
>          3347093940    armv8_cortex_a53/cycles/    #  0.563 GHz  (98.99%)
>          3295523067    armv8_cortex_a57/cycles/    #  0.554 GHz  (67.07%)
>    ...
> 
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>   tools/perf/arch/x86/util/evlist.c | 65 -------------------------------
>   tools/perf/util/evlist.c          | 65 +++++++++++++++++++++++++++----
>   tools/perf/util/evlist.h          |  6 +--
>   3 files changed, 59 insertions(+), 77 deletions(-)
> 

[...]

> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1417f9a23083..e0c31399beb6 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -5,6 +5,7 @@
>    * Parts came from builtin-{top,stat,record}.c, see those files for further
>    * copyright notes.
>    */
> +#include "pmus.h"

Oops, this include was auto added and isn't required.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
  2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
@ 2024-08-13 14:28   ` Ian Rogers
  2024-08-13 14:38     ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2024-08-13 14:28 UTC (permalink / raw)
  To: James Clark
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel

On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>
> This overwrite relies on the clock event remaining at index 0 and is
> quite a way down from where the array is initialized, making it easy to
> miss. Just initialize it with the correct clock event to begin with.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  tools/perf/builtin-stat.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 1f92445f7480..a65f58f8783f 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>  {
>         struct perf_event_attr default_attrs0[] = {
>
> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
> +                                               PERF_COUNT_SW_CPU_CLOCK :
> +                                               PERF_COUNT_SW_TASK_CLOCK        },

Hand crafting perf_event_attr when we have an event name to
perf_event_atttr parser doesn't make sense. Doing things this way
means we need to duplicate logic between event parsing and these
default configurations. The default configurations are also using
legacy events which of course are broken on Apple ARM M? (albeit for
hardware events, here it is software). Event and metric parsing has to
worry about things like grouping topdown events. All-in-all let's have
one way to do things, event parsing, otherwise this code is going to
end up reinventing all the workarounds the event parsing has to have.
Lots of struct perf_event_attr also contribute to binary size.

If you are worried about a cycles event being opened on arm_dsu PMUs,
there is this patch:
https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/

Thanks,
Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
  2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
                   ` (6 preceding siblings ...)
  2024-08-13 13:23 ` [PATCH 7/7] perf test: Add a test for default perf stat command James Clark
@ 2024-08-13 14:35 ` Ian Rogers
  2024-08-13 14:45   ` James Clark
  7 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2024-08-13 14:35 UTC (permalink / raw)
  To: James Clark
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
	Ze Gao, linux-kernel, linux-arm-kernel

On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>
> The important patches are 3 and 5, the rest are tidyups and tests.
>
> I don't think there is any interaction with the other open issues
> about the uncore DSU cycles event or JSON/legacy hw event priorities
> because only hw events on core PMUs are used for the default
> stat command. And also just sharing the existing x86 code works so
> no big changes are required.
>
> For patch 3 the weak arch specific symbol has to continue to be used
> rather than picking the implementation based on
> perf_pmus__supports_extended_type() like in patch 5. This is because
> that function ends up calling evsel__hw_name() itself which results
> in recursion. But at least one weak arch_* construct has been removed,
> so it's better than nothing.

Let's not do things this way. The use of strings is architecture
neutral, means we don't need to create new arch functions on things
like RISC-V, it encapsulates the complexity of things like topdown
events, Apple ARM M CPUs not supporting legacy events, etc.
Duplicating the existing x86 logic, when that was something trying to
be removed, is not the way to go. That logic was a holdover from the
hybrid tech debt we've been working to remove with a generic approach.

Thanks,
Ian

> James Clark (7):
>   perf stat: Initialize instead of overwriting clock event
>   perf stat: Remove unused default_null_attrs
>   perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
>   perf evsel: Remove duplicated __evsel__hw_name() code
>   perf evlist: Use hybrid default attrs whenever extended type is
>     supported
>   perf test: Make stat test work on DT devices
>   perf test: Add a test for default perf stat command
>
>  tools/perf/arch/arm64/util/Build   |  1 +
>  tools/perf/arch/arm64/util/evsel.c |  7 ++++
>  tools/perf/arch/x86/util/evlist.c  | 65 ------------------------------
>  tools/perf/arch/x86/util/evsel.c   | 17 +-------
>  tools/perf/builtin-stat.c          | 12 ++----
>  tools/perf/tests/shell/stat.sh     | 33 ++++++++++++---
>  tools/perf/util/evlist.c           | 65 ++++++++++++++++++++++++++----
>  tools/perf/util/evlist.h           |  6 +--
>  tools/perf/util/evsel.c            | 19 +++++++++
>  tools/perf/util/evsel.h            |  2 +-
>  10 files changed, 119 insertions(+), 108 deletions(-)
>  create mode 100644 tools/perf/arch/arm64/util/evsel.c
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
  2024-08-13 14:28   ` Ian Rogers
@ 2024-08-13 14:38     ` James Clark
  2024-08-13 14:43       ` Ian Rogers
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 14:38 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel



On 13/08/2024 3:28 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>
>> This overwrite relies on the clock event remaining at index 0 and is
>> quite a way down from where the array is initialized, making it easy to
>> miss. Just initialize it with the correct clock event to begin with.
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   tools/perf/builtin-stat.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 1f92445f7480..a65f58f8783f 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>>   {
>>          struct perf_event_attr default_attrs0[] = {
>>
>> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
>> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
>> +                                               PERF_COUNT_SW_CPU_CLOCK :
>> +                                               PERF_COUNT_SW_TASK_CLOCK        },
> 
> Hand crafting perf_event_attr when we have an event name to
> perf_event_atttr parser doesn't make sense. Doing things this way
> means we need to duplicate logic between event parsing and these
> default configurations. The default configurations are also using
> legacy events which of course are broken on Apple ARM M? (albeit for
> hardware events, here it is software). Event and metric parsing has to
> worry about things like grouping topdown events. All-in-all let's have
> one way to do things, event parsing, otherwise this code is going to
> end up reinventing all the workarounds the event parsing has to have.
> Lots of struct perf_event_attr also contribute to binary size.
> 
> If you are worried about a cycles event being opened on arm_dsu PMUs,
> there is this patch:
> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> 
> Thanks,
> Ian

Hi Ian,

Is this comment related to this patch specifically or is it more of a 
general comment?

This patch doesn't really make any actual changes other than move one 
line of code from one place to another.

James

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
  2024-08-13 14:38     ` James Clark
@ 2024-08-13 14:43       ` Ian Rogers
  2024-08-13 14:56         ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2024-08-13 14:43 UTC (permalink / raw)
  To: James Clark
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel

On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 13/08/2024 3:28 pm, Ian Rogers wrote:
> > On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> This overwrite relies on the clock event remaining at index 0 and is
> >> quite a way down from where the array is initialized, making it easy to
> >> miss. Just initialize it with the correct clock event to begin with.
> >>
> >> Signed-off-by: James Clark <james.clark@linaro.org>
> >> ---
> >>   tools/perf/builtin-stat.c | 7 +++----
> >>   1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> >> index 1f92445f7480..a65f58f8783f 100644
> >> --- a/tools/perf/builtin-stat.c
> >> +++ b/tools/perf/builtin-stat.c
> >> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
> >>   {
> >>          struct perf_event_attr default_attrs0[] = {
> >>
> >> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
> >> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
> >> +                                               PERF_COUNT_SW_CPU_CLOCK :
> >> +                                               PERF_COUNT_SW_TASK_CLOCK        },
> >
> > Hand crafting perf_event_attr when we have an event name to
> > perf_event_atttr parser doesn't make sense. Doing things this way
> > means we need to duplicate logic between event parsing and these
> > default configurations. The default configurations are also using
> > legacy events which of course are broken on Apple ARM M? (albeit for
> > hardware events, here it is software). Event and metric parsing has to
> > worry about things like grouping topdown events. All-in-all let's have
> > one way to do things, event parsing, otherwise this code is going to
> > end up reinventing all the workarounds the event parsing has to have.
> > Lots of struct perf_event_attr also contribute to binary size.
> >
> > If you are worried about a cycles event being opened on arm_dsu PMUs,
> > there is this patch:
> > https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >
> > Thanks,
> > Ian
>
> Hi Ian,
>
> Is this comment related to this patch specifically or is it more of a
> general comment?
>
> This patch doesn't really make any actual changes other than move one
> line of code from one place to another.

James, this code is removed here:
https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/

Thanks,
Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
  2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
@ 2024-08-13 14:45   ` James Clark
  2024-08-13 15:10     ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 14:45 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Dominique Martinet, Ze Gao,
	linux-kernel, linux-arm-kernel



On 13/08/2024 3:35 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>
>> The important patches are 3 and 5, the rest are tidyups and tests.
>>
>> I don't think there is any interaction with the other open issues
>> about the uncore DSU cycles event or JSON/legacy hw event priorities
>> because only hw events on core PMUs are used for the default
>> stat command. And also just sharing the existing x86 code works so
>> no big changes are required.
>>
>> For patch 3 the weak arch specific symbol has to continue to be used
>> rather than picking the implementation based on
>> perf_pmus__supports_extended_type() like in patch 5. This is because
>> that function ends up calling evsel__hw_name() itself which results
>> in recursion. But at least one weak arch_* construct has been removed,
>> so it's better than nothing.
> 
> Let's not do things this way. The use of strings is architecture
> neutral, means we don't need to create new arch functions on things
> like RISC-V, it encapsulates the complexity of things like topdown

If the new arch function is an issue that could be worked around by 
calling perf_pmus__supports_extended_type() on patch 3 as well? It just 
needs a small change to not recurse.

> events, Apple ARM M CPUs not supporting legacy events, etc.

If Apple M doesn't support the HW events does _any_ default Perf stat 
command (hybrid or not) work? I'm not really trying to fix that here, 
just make whatever already works work on big.LITTLE.

> Duplicating the existing x86 logic, when that was something trying to
> be removed, is not the way to go. That logic was a holdover from the
> hybrid tech debt we've been working to remove with a generic approach.
> 
> Thanks,
> Ian
> 

I think all of that may make sense, but in this case I haven't actually 
duplicated anything, rather shared the existing code to also be used on 
Arm.

This means we can have the default perf stat working on Arm from today, 
and if any other changes get made it will continue to work as I've also 
added a test for it.

>> James Clark (7):
>>    perf stat: Initialize instead of overwriting clock event
>>    perf stat: Remove unused default_null_attrs
>>    perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
>>    perf evsel: Remove duplicated __evsel__hw_name() code
>>    perf evlist: Use hybrid default attrs whenever extended type is
>>      supported
>>    perf test: Make stat test work on DT devices
>>    perf test: Add a test for default perf stat command
>>
>>   tools/perf/arch/arm64/util/Build   |  1 +
>>   tools/perf/arch/arm64/util/evsel.c |  7 ++++
>>   tools/perf/arch/x86/util/evlist.c  | 65 ------------------------------
>>   tools/perf/arch/x86/util/evsel.c   | 17 +-------
>>   tools/perf/builtin-stat.c          | 12 ++----
>>   tools/perf/tests/shell/stat.sh     | 33 ++++++++++++---
>>   tools/perf/util/evlist.c           | 65 ++++++++++++++++++++++++++----
>>   tools/perf/util/evlist.h           |  6 +--
>>   tools/perf/util/evsel.c            | 19 +++++++++
>>   tools/perf/util/evsel.h            |  2 +-
>>   10 files changed, 119 insertions(+), 108 deletions(-)
>>   create mode 100644 tools/perf/arch/arm64/util/evsel.c
>>
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
  2024-08-13 14:43       ` Ian Rogers
@ 2024-08-13 14:56         ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 14:56 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
	Dominique Martinet, linux-kernel, linux-arm-kernel



On 13/08/2024 3:43 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 13/08/2024 3:28 pm, Ian Rogers wrote:
>>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> This overwrite relies on the clock event remaining at index 0 and is
>>>> quite a way down from where the array is initialized, making it easy to
>>>> miss. Just initialize it with the correct clock event to begin with.
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>>    tools/perf/builtin-stat.c | 7 +++----
>>>>    1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>> index 1f92445f7480..a65f58f8783f 100644
>>>> --- a/tools/perf/builtin-stat.c
>>>> +++ b/tools/perf/builtin-stat.c
>>>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>>>>    {
>>>>           struct perf_event_attr default_attrs0[] = {
>>>>
>>>> -  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK             },
>>>> +  { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
>>>> +                                               PERF_COUNT_SW_CPU_CLOCK :
>>>> +                                               PERF_COUNT_SW_TASK_CLOCK        },
>>>
>>> Hand crafting perf_event_attr when we have an event name to
>>> perf_event_atttr parser doesn't make sense. Doing things this way
>>> means we need to duplicate logic between event parsing and these
>>> default configurations. The default configurations are also using
>>> legacy events which of course are broken on Apple ARM M? (albeit for
>>> hardware events, here it is software). Event and metric parsing has to
>>> worry about things like grouping topdown events. All-in-all let's have
>>> one way to do things, event parsing, otherwise this code is going to
>>> end up reinventing all the workarounds the event parsing has to have.
>>> Lots of struct perf_event_attr also contribute to binary size.
>>>
>>> If you are worried about a cycles event being opened on arm_dsu PMUs,
>>> there is this patch:
>>> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>
>>> Thanks,
>>> Ian
>>
>> Hi Ian,
>>
>> Is this comment related to this patch specifically or is it more of a
>> general comment?
>>
>> This patch doesn't really make any actual changes other than move one
>> line of code from one place to another.
> 
> James, this code is removed here:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> 
> Thanks,
> Ian

Oh I see yeah. We can still work on that change, merging this one 
doesn't necessarily have to block that one, it just makes this one a bit 
redundant when the other one gets done.

If I remember correctly in one of the last related discussions we 
thought that opening the cycles event as a sampling event should be a 
softer warning?

Actually it seems like the DSU cycles event is a non-issue specifically 
for perf stat because it will open successfully anyway? It was just in 
perf record where it was the issue?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
  2024-08-13 14:45   ` James Clark
@ 2024-08-13 15:10     ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 15:10 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Liang, Kan, Dominique Martinet, Ze Gao,
	linux-kernel, linux-arm-kernel



On 13/08/2024 3:45 pm, James Clark wrote:
> 
> 
> On 13/08/2024 3:35 pm, Ian Rogers wrote:
>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> 
>> wrote:
>>>
>>> The important patches are 3 and 5, the rest are tidyups and tests.
>>>
>>> I don't think there is any interaction with the other open issues
>>> about the uncore DSU cycles event or JSON/legacy hw event priorities
>>> because only hw events on core PMUs are used for the default
>>> stat command. And also just sharing the existing x86 code works so
>>> no big changes are required.
>>>
>>> For patch 3 the weak arch specific symbol has to continue to be used
>>> rather than picking the implementation based on
>>> perf_pmus__supports_extended_type() like in patch 5. This is because
>>> that function ends up calling evsel__hw_name() itself which results
>>> in recursion. But at least one weak arch_* construct has been removed,
>>> so it's better than nothing.
>>
>> Let's not do things this way. The use of strings is architecture
>> neutral, means we don't need to create new arch functions on things
>> like RISC-V, it encapsulates the complexity of things like topdown
> 
> If the new arch function is an issue that could be worked around by 
> calling perf_pmus__supports_extended_type() on patch 3 as well? It just 
> needs a small change to not recurse.
> 
>> events, Apple ARM M CPUs not supporting legacy events, etc.
> 
> If Apple M doesn't support the HW events does _any_ default Perf stat 
> command (hybrid or not) work? I'm not really trying to fix that here, 
> just make whatever already works work on big.LITTLE.
> 
>> Duplicating the existing x86 logic, when that was something trying to
>> be removed, is not the way to go. That logic was a holdover from the
>> hybrid tech debt we've been working to remove with a generic approach.
>>
>> Thanks,
>> Ian
>>
> 
> I think all of that may make sense, but in this case I haven't actually 
> duplicated anything, rather shared the existing code to also be used on 
> Arm.
> 
> This means we can have the default perf stat working on Arm from today, 
> and if any other changes get made it will continue to work as I've also 
> added a test for it.
> 

I would also like to note that (not including the new test) this 
patchset _removes_ more code than it adds, so to say it duplicates code 
is a bit unfair.

Of course this touches some similar areas to your other change, but that 
doesn't mean I think we shouldn't continue with that one. I'm still 
happy to review and test or contribute to that one if you like. I think 
it just helped me to do it in this order and get this thing in a working 
state first before the next bigger step.

>>> James Clark (7):
>>>    perf stat: Initialize instead of overwriting clock event
>>>    perf stat: Remove unused default_null_attrs
>>>    perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
>>>    perf evsel: Remove duplicated __evsel__hw_name() code
>>>    perf evlist: Use hybrid default attrs whenever extended type is
>>>      supported
>>>    perf test: Make stat test work on DT devices
>>>    perf test: Add a test for default perf stat command
>>>
>>>   tools/perf/arch/arm64/util/Build   |  1 +
>>>   tools/perf/arch/arm64/util/evsel.c |  7 ++++
>>>   tools/perf/arch/x86/util/evlist.c  | 65 ------------------------------
>>>   tools/perf/arch/x86/util/evsel.c   | 17 +-------
>>>   tools/perf/builtin-stat.c          | 12 ++----
>>>   tools/perf/tests/shell/stat.sh     | 33 ++++++++++++---
>>>   tools/perf/util/evlist.c           | 65 ++++++++++++++++++++++++++----
>>>   tools/perf/util/evlist.h           |  6 +--
>>>   tools/perf/util/evsel.c            | 19 +++++++++
>>>   tools/perf/util/evsel.h            |  2 +-
>>>   10 files changed, 119 insertions(+), 108 deletions(-)
>>>   create mode 100644 tools/perf/arch/arm64/util/evsel.c
>>>
>>> -- 
>>> 2.34.1
>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-08-13 15:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
2024-08-13 14:28   ` Ian Rogers
2024-08-13 14:38     ` James Clark
2024-08-13 14:43       ` Ian Rogers
2024-08-13 14:56         ` James Clark
2024-08-13 13:23 ` [PATCH 2/7] perf stat: Remove unused default_null_attrs James Clark
2024-08-13 13:23 ` [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86 James Clark
2024-08-13 13:23 ` [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code James Clark
2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
2024-08-13 13:33   ` James Clark
2024-08-13 13:23 ` [PATCH 6/7] perf test: Make stat test work on DT devices James Clark
2024-08-13 13:23 ` [PATCH 7/7] perf test: Add a test for default perf stat command James Clark
2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
2024-08-13 14:45   ` James Clark
2024-08-13 15:10     ` James Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).