* [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
@ 2024-08-13 13:23 James Clark
2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
Ze Gao, linux-kernel, linux-arm-kernel
The important patches are 3 and 5, the rest are tidyups and tests.
I don't think there is any interaction with the other open issues
about the uncore DSU cycles event or JSON/legacy hw event priorities
because only hw events on core PMUs are used for the default
stat command. And also just sharing the existing x86 code works so
no big changes are required.
For patch 3 the weak arch specific symbol has to continue to be used
rather than picking the implementation based on
perf_pmus__supports_extended_type() like in patch 5. This is because
that function ends up calling evsel__hw_name() itself which results
in recursion. But at least one weak arch_* construct has been removed,
so it's better than nothing.
James Clark (7):
perf stat: Initialize instead of overwriting clock event
perf stat: Remove unused default_null_attrs
perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
perf evsel: Remove duplicated __evsel__hw_name() code
perf evlist: Use hybrid default attrs whenever extended type is
supported
perf test: Make stat test work on DT devices
perf test: Add a test for default perf stat command
tools/perf/arch/arm64/util/Build | 1 +
tools/perf/arch/arm64/util/evsel.c | 7 ++++
tools/perf/arch/x86/util/evlist.c | 65 ------------------------------
tools/perf/arch/x86/util/evsel.c | 17 +-------
tools/perf/builtin-stat.c | 12 ++----
tools/perf/tests/shell/stat.sh | 33 ++++++++++++---
tools/perf/util/evlist.c | 65 ++++++++++++++++++++++++++----
tools/perf/util/evlist.h | 6 +--
tools/perf/util/evsel.c | 19 +++++++++
tools/perf/util/evsel.h | 2 +-
10 files changed, 119 insertions(+), 108 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/evsel.c
--
2.34.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 14:28 ` Ian Rogers
2024-08-13 13:23 ` [PATCH 2/7] perf stat: Remove unused default_null_attrs James Clark
` (6 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
This overwrite relies on the clock event remaining at index 0 and is
quite a way down from where the array is initialized, making it easy to
miss. Just initialize it with the correct clock event to begin with.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/builtin-stat.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1f92445f7480..a65f58f8783f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
{
struct perf_event_attr default_attrs0[] = {
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
+ { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
+ PERF_COUNT_SW_CPU_CLOCK :
+ PERF_COUNT_SW_TASK_CLOCK },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
@@ -2030,9 +2032,6 @@ static int add_default_attributes(void)
if (!evsel_list->core.nr_entries) {
/* No events so add defaults. */
- if (target__has_cpu(&target))
- default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
-
if (evlist__add_default_attrs(evsel_list, default_attrs0) < 0)
return -1;
if (perf_pmus__have_event("cpu", "stalled-cycles-frontend")) {
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/7] perf stat: Remove unused default_null_attrs
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 13:23 ` [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86 James Clark
` (5 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
All current implementations of arch_evlist__add_default_attrs() do
nothing if nr_attrs is 0. To avoid confusion that it's used, remove it.
It appears that it used to be used as a mechanism to add topdown events,
but this is now done separately.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/builtin-stat.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a65f58f8783f..cf6a1923811b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1930,7 +1930,6 @@ static int add_default_attributes(void)
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
};
- struct perf_event_attr default_null_attrs[] = {};
const char *pmu = parse_events_option_args.pmu_filter ?: "all";
/* Set attrs if no event is selected and !null_run: */
@@ -2072,10 +2071,6 @@ static int add_default_attributes(void)
evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
evlist__delete(metric_evlist);
}
-
- /* Platform specific attrs */
- if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
- return -1;
}
/* Detailed events get appended to the event list: */
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
2024-08-13 13:23 ` [PATCH 2/7] perf stat: Remove unused default_null_attrs James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 13:23 ` [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code James Clark
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
Ze Gao, linux-kernel, linux-arm-kernel
Both of these platforms share the same extended type ID mechanism, so
share the implementation of the evsel name lookup.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/arch/arm64/util/Build | 1 +
tools/perf/arch/arm64/util/evsel.c | 7 +++++++
tools/perf/arch/x86/util/evsel.c | 17 +----------------
tools/perf/util/evsel.c | 24 ++++++++++++++++++++++++
tools/perf/util/evsel.h | 2 +-
5 files changed, 34 insertions(+), 17 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/evsel.c
diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 343ef7589a77..45c77f747d54 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -1,3 +1,4 @@
+perf-util-y += evsel.o
perf-util-y += header.o
perf-util-y += machine.o
perf-util-y += perf_regs.o
diff --git a/tools/perf/arch/arm64/util/evsel.c b/tools/perf/arch/arm64/util/evsel.c
new file mode 100644
index 000000000000..6974f72ee37f
--- /dev/null
+++ b/tools/perf/arch/arm64/util/evsel.c
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "util/evsel.h"
+
+int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
+{
+ return evsel__hw_name_ext_type_id(evsel, bf, size);
+}
diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 090d0f371891..771b35fef61a 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -49,22 +49,7 @@ bool arch_evsel__must_be_in_group(const struct evsel *evsel)
int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
{
- u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
- u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
- const char *event_name;
-
- if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
- event_name = evsel__hw_names[event];
- else
- event_name = "unknown-hardware";
-
- /* The PMU type is not required for the non-hybrid platform. */
- if (!pmu)
- return scnprintf(bf, size, "%s", event_name);
-
- return scnprintf(bf, size, "%s/%s/",
- evsel->pmu_name ? evsel->pmu_name : "cpu",
- event_name);
+ return evsel__hw_name_ext_type_id(evsel, bf, size);
}
static void ibs_l3miss_warn(void)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc603193c477..19f4d0e71733 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -584,6 +584,30 @@ int __weak arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
return scnprintf(bf, size, "%s", __evsel__hw_name(evsel->core.attr.config));
}
+/*
+ * Version of evsel__hw_name() used on platforms where perf_pmus__supports_extended_type()
+ * may be true and the type needs to be extracted and masked.
+ */
+int evsel__hw_name_ext_type_id(struct evsel *evsel, char *bf, size_t size)
+{
+ u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
+ u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
+ const char *event_name;
+
+ if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
+ event_name = evsel__hw_names[event];
+ else
+ event_name = "unknown-hardware";
+
+ /* The PMU type is not required for the non-hybrid platform. */
+ if (!pmu)
+ return scnprintf(bf, size, "%s", event_name);
+
+ return scnprintf(bf, size, "%s/%s/",
+ evsel->pmu_name ? evsel->pmu_name : "cpu",
+ event_name);
+}
+
static int evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
{
int r = arch_evsel__hw_name(evsel, bf, size);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 80b5f6dd868e..f95fc2919462 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -299,7 +299,7 @@ extern const char *const evsel__sw_names[PERF_COUNT_SW_MAX];
extern char *evsel__bpf_counter_events;
bool evsel__match_bpf_counter_events(const char *name);
int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size);
-
+int evsel__hw_name_ext_type_id(struct evsel *evsel, char *bf, size_t size);
int __evsel__hw_cache_type_op_res_name(u8 type, u8 op, u8 result, char *bf, size_t size);
const char *evsel__name(struct evsel *evsel);
bool evsel__name_is(struct evsel *evsel, const char *name);
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
` (2 preceding siblings ...)
2024-08-13 13:23 ` [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86 James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
__evsel__hw_name() does the same thing so use it instead.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/util/evsel.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 19f4d0e71733..ba960a39f104 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -592,12 +592,7 @@ int evsel__hw_name_ext_type_id(struct evsel *evsel, char *bf, size_t size)
{
u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
- const char *event_name;
-
- if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
- event_name = evsel__hw_names[event];
- else
- event_name = "unknown-hardware";
+ const char *event_name = __evsel__hw_name(event);
/* The PMU type is not required for the non-hybrid platform. */
if (!pmu)
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
` (3 preceding siblings ...)
2024-08-13 13:23 ` [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 13:33 ` James Clark
2024-08-13 13:23 ` [PATCH 6/7] perf test: Make stat test work on DT devices James Clark
` (2 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
Ze Gao, linux-kernel, linux-arm-kernel
For x86, a hybrid version of ___evlist__add_default_attrs() was added to
support default perf stat events. This can actually be used whenever
perf_pmus__supports_extended_type() is true, which now makes default
perf stat arguments work properly on Arm big.LITTLE:
$ perf stat
...
3347093940 armv8_cortex_a53/cycles/ # 0.563 GHz (98.99%)
3295523067 armv8_cortex_a57/cycles/ # 0.554 GHz (67.07%)
...
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/arch/x86/util/evlist.c | 65 -------------------------------
tools/perf/util/evlist.c | 65 +++++++++++++++++++++++++++----
tools/perf/util/evlist.h | 6 +--
3 files changed, 59 insertions(+), 77 deletions(-)
diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index b1ce0c52d88d..bbe6240c7f71 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -8,71 +8,6 @@
#include "topdown.h"
#include "evsel.h"
-static int ___evlist__add_default_attrs(struct evlist *evlist,
- struct perf_event_attr *attrs,
- size_t nr_attrs)
-{
- LIST_HEAD(head);
- size_t i = 0;
-
- for (i = 0; i < nr_attrs; i++)
- event_attr_init(attrs + i);
-
- if (perf_pmus__num_core_pmus() == 1)
- return evlist__add_attrs(evlist, attrs, nr_attrs);
-
- for (i = 0; i < nr_attrs; i++) {
- struct perf_pmu *pmu = NULL;
-
- if (attrs[i].type == PERF_TYPE_SOFTWARE) {
- struct evsel *evsel = evsel__new(attrs + i);
-
- if (evsel == NULL)
- goto out_delete_partial_list;
- list_add_tail(&evsel->core.node, &head);
- continue;
- }
-
- while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
- struct perf_cpu_map *cpus;
- struct evsel *evsel;
-
- evsel = evsel__new(attrs + i);
- if (evsel == NULL)
- goto out_delete_partial_list;
- evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
- cpus = perf_cpu_map__get(pmu->cpus);
- evsel->core.cpus = cpus;
- evsel->core.own_cpus = perf_cpu_map__get(cpus);
- evsel->pmu_name = strdup(pmu->name);
- list_add_tail(&evsel->core.node, &head);
- }
- }
-
- evlist__splice_list_tail(evlist, &head);
-
- return 0;
-
-out_delete_partial_list:
- {
- struct evsel *evsel, *n;
-
- __evlist__for_each_entry_safe(&head, n, evsel)
- evsel__delete(evsel);
- }
- return -1;
-}
-
-int arch_evlist__add_default_attrs(struct evlist *evlist,
- struct perf_event_attr *attrs,
- size_t nr_attrs)
-{
- if (!nr_attrs)
- return 0;
-
- return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
-}
-
int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
{
if (topdown_sys_has_perf_metrics() &&
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1417f9a23083..e0c31399beb6 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -5,6 +5,7 @@
* Parts came from builtin-{top,stat,record}.c, see those files for further
* copyright notes.
*/
+#include "pmus.h"
#include <api/fs/fs.h>
#include <errno.h>
#include <inttypes.h>
@@ -338,24 +339,74 @@ int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size
return -1;
}
-int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
+
+static int __evlist__add_default_attrs_ext_type(struct evlist *evlist,
+ struct perf_event_attr *attrs,
+ size_t nr_attrs)
{
- size_t i;
+ LIST_HEAD(head);
+ size_t i = 0;
for (i = 0; i < nr_attrs; i++)
event_attr_init(attrs + i);
- return evlist__add_attrs(evlist, attrs, nr_attrs);
+ for (i = 0; i < nr_attrs; i++) {
+ struct perf_pmu *pmu = NULL;
+
+ if (attrs[i].type == PERF_TYPE_SOFTWARE) {
+ struct evsel *evsel = evsel__new(attrs + i);
+
+ if (evsel == NULL)
+ goto out_delete_partial_list;
+ list_add_tail(&evsel->core.node, &head);
+ continue;
+ }
+
+ while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
+ struct perf_cpu_map *cpus;
+ struct evsel *evsel;
+
+ evsel = evsel__new(attrs + i);
+ if (evsel == NULL)
+ goto out_delete_partial_list;
+ evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+ cpus = perf_cpu_map__get(pmu->cpus);
+ evsel->core.cpus = cpus;
+ evsel->core.own_cpus = perf_cpu_map__get(cpus);
+ evsel->pmu_name = strdup(pmu->name);
+ list_add_tail(&evsel->core.node, &head);
+ }
+ }
+
+ evlist__splice_list_tail(evlist, &head);
+
+ return 0;
+
+out_delete_partial_list:
+ {
+ struct evsel *evsel, *n;
+
+ __evlist__for_each_entry_safe(&head, n, evsel)
+ evsel__delete(evsel);
+ }
+ return -1;
}
-__weak int arch_evlist__add_default_attrs(struct evlist *evlist,
- struct perf_event_attr *attrs,
- size_t nr_attrs)
+int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs,
+ size_t nr_attrs)
{
+ size_t i;
+
if (!nr_attrs)
return 0;
- return __evlist__add_default_attrs(evlist, attrs, nr_attrs);
+ if (perf_pmus__supports_extended_type())
+ return __evlist__add_default_attrs_ext_type(evlist, attrs, nr_attrs);
+
+ for (i = 0; i < nr_attrs; i++)
+ event_attr_init(attrs + i);
+
+ return evlist__add_attrs(evlist, attrs, nr_attrs);
}
struct evsel *evlist__find_tracepoint_by_id(struct evlist *evlist, int id)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cccc34da5a02..06d1eeacc0d1 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -106,12 +106,8 @@ int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size
int __evlist__add_default_attrs(struct evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs);
-int arch_evlist__add_default_attrs(struct evlist *evlist,
- struct perf_event_attr *attrs,
- size_t nr_attrs);
-
#define evlist__add_default_attrs(evlist, array) \
- arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
+ __evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs);
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/7] perf test: Make stat test work on DT devices
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
` (4 preceding siblings ...)
2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 13:23 ` [PATCH 7/7] perf test: Add a test for default perf stat command James Clark
2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Dominique Martinet,
Ze Gao, linux-kernel, linux-arm-kernel
PMUs aren't listed in /sys/devices/ on DT devices, so change the search
directory to /sys/bus/event_source/devices which works everywhere. Also
add armv8_cortex_* as a known PMU type to search for to make the test
run on more devices.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/stat.sh | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 3f1e67795490..525d0c44fdc6 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -117,16 +117,18 @@ test_cputype() {
# Find a known PMU for cputype.
pmu=""
- for i in cpu cpu_atom armv8_pmuv3_0
+ devs="/sys/bus/event_source/devices"
+ for i in $devs/cpu $devs/cpu_atom $devs/armv8_pmuv3_0 $devs/armv8_cortex_*
do
- if test -d "/sys/devices/$i"
+ i_base=$(basename "$i")
+ if test -d "$i"
then
- pmu="$i"
+ pmu="$i_base"
break
fi
- if perf stat -e "$i/instructions/" true > /dev/null 2>&1
+ if perf stat -e "$i_base/instructions/" true > /dev/null 2>&1
then
- pmu="$i"
+ pmu="$i_base"
break
fi
done
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/7] perf test: Add a test for default perf stat command
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
` (5 preceding siblings ...)
2024-08-13 13:23 ` [PATCH 6/7] perf test: Make stat test work on DT devices James Clark
@ 2024-08-13 13:23 ` James Clark
2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
7 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:23 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: James Clark, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Dominique Martinet,
Ze Gao, linux-kernel, linux-arm-kernel
Test that one cycles event is opened for each core PMU when "perf stat"
is run without arguments.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/stat.sh | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 525d0c44fdc6..7118cfc37f48 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -148,6 +148,26 @@ test_cputype() {
echo "cputype test [Success]"
}
+test_hybrid() {
+ # Test the default stat command on hybrid devices opens one cycles event for
+ # each CPU type.
+ echo "hybrid test"
+
+ # Count the number of core PMUs
+ pmus=$(ls /sys/bus/event_source/devices/*/cpus 2>/dev/null | wc -l)
+
+ # Run default Perf stat
+ cycles_events=$(perf stat -- true 2>&1 | grep "cycles" | wc -l)
+
+ if [ "$pmus" -ne "$cycles_events" ]
+ then
+ echo "hybrid test [Found $pmus PMUs but $cycles_events cycles events. Failed]"
+ err=1
+ return
+ fi
+ echo "hybrid test [Success]"
+}
+
test_default_stat
test_stat_record_report
test_stat_record_script
@@ -155,4 +175,5 @@ test_stat_repeat_weak_groups
test_topdown_groups
test_topdown_weak_groups
test_cputype
+test_hybrid
exit $err
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported
2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
@ 2024-08-13 13:33 ` James Clark
0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 13:33 UTC (permalink / raw)
To: irogers, linux-perf-users
Cc: John Garry, Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang, Kan,
Dominique Martinet, Yang Jihong, Ze Gao, linux-kernel,
linux-arm-kernel
On 13/08/2024 2:23 pm, James Clark wrote:
> For x86, a hybrid version of ___evlist__add_default_attrs() was added to
> support default perf stat events. This can actually be used whenever
> perf_pmus__supports_extended_type() is true, which now makes default
> perf stat arguments work properly on Arm big.LITTLE:
>
> $ perf stat
> ...
> 3347093940 armv8_cortex_a53/cycles/ # 0.563 GHz (98.99%)
> 3295523067 armv8_cortex_a57/cycles/ # 0.554 GHz (67.07%)
> ...
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/arch/x86/util/evlist.c | 65 -------------------------------
> tools/perf/util/evlist.c | 65 +++++++++++++++++++++++++++----
> tools/perf/util/evlist.h | 6 +--
> 3 files changed, 59 insertions(+), 77 deletions(-)
>
[...]
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1417f9a23083..e0c31399beb6 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -5,6 +5,7 @@
> * Parts came from builtin-{top,stat,record}.c, see those files for further
> * copyright notes.
> */
> +#include "pmus.h"
Oops, this include was auto added and isn't required.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
@ 2024-08-13 14:28 ` Ian Rogers
2024-08-13 14:38 ` James Clark
0 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2024-08-13 14:28 UTC (permalink / raw)
To: James Clark
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>
> This overwrite relies on the clock event remaining at index 0 and is
> quite a way down from where the array is initialized, making it easy to
> miss. Just initialize it with the correct clock event to begin with.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/builtin-stat.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 1f92445f7480..a65f58f8783f 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
> {
> struct perf_event_attr default_attrs0[] = {
>
> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
> + PERF_COUNT_SW_CPU_CLOCK :
> + PERF_COUNT_SW_TASK_CLOCK },
Hand crafting perf_event_attr when we have an event name to
perf_event_atttr parser doesn't make sense. Doing things this way
means we need to duplicate logic between event parsing and these
default configurations. The default configurations are also using
legacy events which of course are broken on Apple ARM M? (albeit for
hardware events, here it is software). Event and metric parsing has to
worry about things like grouping topdown events. All-in-all let's have
one way to do things, event parsing, otherwise this code is going to
end up reinventing all the workarounds the event parsing has to have.
Lots of struct perf_event_attr also contribute to binary size.
If you are worried about a cycles event being opened on arm_dsu PMUs,
there is this patch:
https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
Thanks,
Ian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
` (6 preceding siblings ...)
2024-08-13 13:23 ` [PATCH 7/7] perf test: Add a test for default perf stat command James Clark
@ 2024-08-13 14:35 ` Ian Rogers
2024-08-13 14:45 ` James Clark
7 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2024-08-13 14:35 UTC (permalink / raw)
To: James Clark
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Dominique Martinet, Yang Jihong,
Ze Gao, linux-kernel, linux-arm-kernel
On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>
> The important patches are 3 and 5, the rest are tidyups and tests.
>
> I don't think there is any interaction with the other open issues
> about the uncore DSU cycles event or JSON/legacy hw event priorities
> because only hw events on core PMUs are used for the default
> stat command. And also just sharing the existing x86 code works so
> no big changes are required.
>
> For patch 3 the weak arch specific symbol has to continue to be used
> rather than picking the implementation based on
> perf_pmus__supports_extended_type() like in patch 5. This is because
> that function ends up calling evsel__hw_name() itself which results
> in recursion. But at least one weak arch_* construct has been removed,
> so it's better than nothing.
Let's not do things this way. The use of strings is architecture
neutral, means we don't need to create new arch functions on things
like RISC-V, it encapsulates the complexity of things like topdown
events, Apple ARM M CPUs not supporting legacy events, etc.
Duplicating the existing x86 logic, when that was something trying to
be removed, is not the way to go. That logic was a holdover from the
hybrid tech debt we've been working to remove with a generic approach.
Thanks,
Ian
> James Clark (7):
> perf stat: Initialize instead of overwriting clock event
> perf stat: Remove unused default_null_attrs
> perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
> perf evsel: Remove duplicated __evsel__hw_name() code
> perf evlist: Use hybrid default attrs whenever extended type is
> supported
> perf test: Make stat test work on DT devices
> perf test: Add a test for default perf stat command
>
> tools/perf/arch/arm64/util/Build | 1 +
> tools/perf/arch/arm64/util/evsel.c | 7 ++++
> tools/perf/arch/x86/util/evlist.c | 65 ------------------------------
> tools/perf/arch/x86/util/evsel.c | 17 +-------
> tools/perf/builtin-stat.c | 12 ++----
> tools/perf/tests/shell/stat.sh | 33 ++++++++++++---
> tools/perf/util/evlist.c | 65 ++++++++++++++++++++++++++----
> tools/perf/util/evlist.h | 6 +--
> tools/perf/util/evsel.c | 19 +++++++++
> tools/perf/util/evsel.h | 2 +-
> 10 files changed, 119 insertions(+), 108 deletions(-)
> create mode 100644 tools/perf/arch/arm64/util/evsel.c
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
2024-08-13 14:28 ` Ian Rogers
@ 2024-08-13 14:38 ` James Clark
2024-08-13 14:43 ` Ian Rogers
0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 14:38 UTC (permalink / raw)
To: Ian Rogers
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
On 13/08/2024 3:28 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>
>> This overwrite relies on the clock event remaining at index 0 and is
>> quite a way down from where the array is initialized, making it easy to
>> miss. Just initialize it with the correct clock event to begin with.
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> tools/perf/builtin-stat.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 1f92445f7480..a65f58f8783f 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>> {
>> struct perf_event_attr default_attrs0[] = {
>>
>> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
>> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
>> + PERF_COUNT_SW_CPU_CLOCK :
>> + PERF_COUNT_SW_TASK_CLOCK },
>
> Hand crafting perf_event_attr when we have an event name to
> perf_event_atttr parser doesn't make sense. Doing things this way
> means we need to duplicate logic between event parsing and these
> default configurations. The default configurations are also using
> legacy events which of course are broken on Apple ARM M? (albeit for
> hardware events, here it is software). Event and metric parsing has to
> worry about things like grouping topdown events. All-in-all let's have
> one way to do things, event parsing, otherwise this code is going to
> end up reinventing all the workarounds the event parsing has to have.
> Lots of struct perf_event_attr also contribute to binary size.
>
> If you are worried about a cycles event being opened on arm_dsu PMUs,
> there is this patch:
> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>
> Thanks,
> Ian
Hi Ian,
Is this comment related to this patch specifically or is it more of a
general comment?
This patch doesn't really make any actual changes other than move one
line of code from one place to another.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
2024-08-13 14:38 ` James Clark
@ 2024-08-13 14:43 ` Ian Rogers
2024-08-13 14:56 ` James Clark
0 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2024-08-13 14:43 UTC (permalink / raw)
To: James Clark
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 13/08/2024 3:28 pm, Ian Rogers wrote:
> > On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> This overwrite relies on the clock event remaining at index 0 and is
> >> quite a way down from where the array is initialized, making it easy to
> >> miss. Just initialize it with the correct clock event to begin with.
> >>
> >> Signed-off-by: James Clark <james.clark@linaro.org>
> >> ---
> >> tools/perf/builtin-stat.c | 7 +++----
> >> 1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> >> index 1f92445f7480..a65f58f8783f 100644
> >> --- a/tools/perf/builtin-stat.c
> >> +++ b/tools/perf/builtin-stat.c
> >> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
> >> {
> >> struct perf_event_attr default_attrs0[] = {
> >>
> >> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
> >> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
> >> + PERF_COUNT_SW_CPU_CLOCK :
> >> + PERF_COUNT_SW_TASK_CLOCK },
> >
> > Hand crafting perf_event_attr when we have an event name to
> > perf_event_atttr parser doesn't make sense. Doing things this way
> > means we need to duplicate logic between event parsing and these
> > default configurations. The default configurations are also using
> > legacy events which of course are broken on Apple ARM M? (albeit for
> > hardware events, here it is software). Event and metric parsing has to
> > worry about things like grouping topdown events. All-in-all let's have
> > one way to do things, event parsing, otherwise this code is going to
> > end up reinventing all the workarounds the event parsing has to have.
> > Lots of struct perf_event_attr also contribute to binary size.
> >
> > If you are worried about a cycles event being opened on arm_dsu PMUs,
> > there is this patch:
> > https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >
> > Thanks,
> > Ian
>
> Hi Ian,
>
> Is this comment related to this patch specifically or is it more of a
> general comment?
>
> This patch doesn't really make any actual changes other than move one
> line of code from one place to another.
James, this code is removed here:
https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
Thanks,
Ian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
@ 2024-08-13 14:45 ` James Clark
2024-08-13 15:10 ` James Clark
0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2024-08-13 14:45 UTC (permalink / raw)
To: Ian Rogers
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Dominique Martinet, Ze Gao,
linux-kernel, linux-arm-kernel
On 13/08/2024 3:35 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>
>> The important patches are 3 and 5, the rest are tidyups and tests.
>>
>> I don't think there is any interaction with the other open issues
>> about the uncore DSU cycles event or JSON/legacy hw event priorities
>> because only hw events on core PMUs are used for the default
>> stat command. And also just sharing the existing x86 code works so
>> no big changes are required.
>>
>> For patch 3 the weak arch specific symbol has to continue to be used
>> rather than picking the implementation based on
>> perf_pmus__supports_extended_type() like in patch 5. This is because
>> that function ends up calling evsel__hw_name() itself which results
>> in recursion. But at least one weak arch_* construct has been removed,
>> so it's better than nothing.
>
> Let's not do things this way. The use of strings is architecture
> neutral, means we don't need to create new arch functions on things
> like RISC-V, it encapsulates the complexity of things like topdown
If the new arch function is an issue that could be worked around by
calling perf_pmus__supports_extended_type() on patch 3 as well? It just
needs a small change to not recurse.
> events, Apple ARM M CPUs not supporting legacy events, etc.
If Apple M doesn't support the HW events does _any_ default Perf stat
command (hybrid or not) work? I'm not really trying to fix that here,
just make whatever already works work on big.LITTLE.
> Duplicating the existing x86 logic, when that was something trying to
> be removed, is not the way to go. That logic was a holdover from the
> hybrid tech debt we've been working to remove with a generic approach.
>
> Thanks,
> Ian
>
I think all of that may make sense, but in this case I haven't actually
duplicated anything, rather shared the existing code to also be used on
Arm.
This means we can have the default perf stat working on Arm from today,
and if any other changes get made it will continue to work as I've also
added a test for it.
>> James Clark (7):
>> perf stat: Initialize instead of overwriting clock event
>> perf stat: Remove unused default_null_attrs
>> perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
>> perf evsel: Remove duplicated __evsel__hw_name() code
>> perf evlist: Use hybrid default attrs whenever extended type is
>> supported
>> perf test: Make stat test work on DT devices
>> perf test: Add a test for default perf stat command
>>
>> tools/perf/arch/arm64/util/Build | 1 +
>> tools/perf/arch/arm64/util/evsel.c | 7 ++++
>> tools/perf/arch/x86/util/evlist.c | 65 ------------------------------
>> tools/perf/arch/x86/util/evsel.c | 17 +-------
>> tools/perf/builtin-stat.c | 12 ++----
>> tools/perf/tests/shell/stat.sh | 33 ++++++++++++---
>> tools/perf/util/evlist.c | 65 ++++++++++++++++++++++++++----
>> tools/perf/util/evlist.h | 6 +--
>> tools/perf/util/evsel.c | 19 +++++++++
>> tools/perf/util/evsel.h | 2 +-
>> 10 files changed, 119 insertions(+), 108 deletions(-)
>> create mode 100644 tools/perf/arch/arm64/util/evsel.c
>>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/7] perf stat: Initialize instead of overwriting clock event
2024-08-13 14:43 ` Ian Rogers
@ 2024-08-13 14:56 ` James Clark
0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 14:56 UTC (permalink / raw)
To: Ian Rogers
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Yang Jihong, Ze Gao,
Dominique Martinet, linux-kernel, linux-arm-kernel
On 13/08/2024 3:43 pm, Ian Rogers wrote:
> On Tue, Aug 13, 2024 at 7:38 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 13/08/2024 3:28 pm, Ian Rogers wrote:
>>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> This overwrite relies on the clock event remaining at index 0 and is
>>>> quite a way down from where the array is initialized, making it easy to
>>>> miss. Just initialize it with the correct clock event to begin with.
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>> tools/perf/builtin-stat.c | 7 +++----
>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>> index 1f92445f7480..a65f58f8783f 100644
>>>> --- a/tools/perf/builtin-stat.c
>>>> +++ b/tools/perf/builtin-stat.c
>>>> @@ -1817,7 +1817,9 @@ static int add_default_attributes(void)
>>>> {
>>>> struct perf_event_attr default_attrs0[] = {
>>>>
>>>> - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
>>>> + { .type = PERF_TYPE_SOFTWARE, .config = target__has_cpu(&target) ?
>>>> + PERF_COUNT_SW_CPU_CLOCK :
>>>> + PERF_COUNT_SW_TASK_CLOCK },
>>>
>>> Hand crafting perf_event_attr when we have an event name to
>>> perf_event_atttr parser doesn't make sense. Doing things this way
>>> means we need to duplicate logic between event parsing and these
>>> default configurations. The default configurations are also using
>>> legacy events which of course are broken on Apple ARM M? (albeit for
>>> hardware events, here it is software). Event and metric parsing has to
>>> worry about things like grouping topdown events. All-in-all let's have
>>> one way to do things, event parsing, otherwise this code is going to
>>> end up reinventing all the workarounds the event parsing has to have.
>>> Lots of struct perf_event_attr also contribute to binary size.
>>>
>>> If you are worried about a cycles event being opened on arm_dsu PMUs,
>>> there is this patch:
>>> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
>>>
>>> Thanks,
>>> Ian
>>
>> Hi Ian,
>>
>> Is this comment related to this patch specifically or is it more of a
>> general comment?
>>
>> This patch doesn't really make any actual changes other than move one
>> line of code from one place to another.
>
> James, this code is removed here:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
>
> Thanks,
> Ian
Oh I see yeah. We can still work on that change, merging this one
doesn't necessarily have to block that one, it just makes this one a bit
redundant when the other one gets done.
If I remember correctly in one of the last related discussions we
thought that opening the cycles event as a sampling event should be a
softer warning?
Actually it seems like the DSU cycles event is a non-issue specifically
for perf stat because it will open successfully anyway? It was just in
perf record where it was the issue?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE
2024-08-13 14:45 ` James Clark
@ 2024-08-13 15:10 ` James Clark
0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2024-08-13 15:10 UTC (permalink / raw)
To: Ian Rogers
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Liang, Kan, Dominique Martinet, Ze Gao,
linux-kernel, linux-arm-kernel
On 13/08/2024 3:45 pm, James Clark wrote:
>
>
> On 13/08/2024 3:35 pm, Ian Rogers wrote:
>> On Tue, Aug 13, 2024 at 6:24 AM James Clark <james.clark@linaro.org>
>> wrote:
>>>
>>> The important patches are 3 and 5, the rest are tidyups and tests.
>>>
>>> I don't think there is any interaction with the other open issues
>>> about the uncore DSU cycles event or JSON/legacy hw event priorities
>>> because only hw events on core PMUs are used for the default
>>> stat command. And also just sharing the existing x86 code works so
>>> no big changes are required.
>>>
>>> For patch 3 the weak arch specific symbol has to continue to be used
>>> rather than picking the implementation based on
>>> perf_pmus__supports_extended_type() like in patch 5. This is because
>>> that function ends up calling evsel__hw_name() itself which results
>>> in recursion. But at least one weak arch_* construct has been removed,
>>> so it's better than nothing.
>>
>> Let's not do things this way. The use of strings is architecture
>> neutral, means we don't need to create new arch functions on things
>> like RISC-V, it encapsulates the complexity of things like topdown
>
> If the new arch function is an issue that could be worked around by
> calling perf_pmus__supports_extended_type() on patch 3 as well? It just
> needs a small change to not recurse.
>
>> events, Apple ARM M CPUs not supporting legacy events, etc.
>
> If Apple M doesn't support the HW events does _any_ default Perf stat
> command (hybrid or not) work? I'm not really trying to fix that here,
> just make whatever already works work on big.LITTLE.
>
>> Duplicating the existing x86 logic, when that was something trying to
>> be removed, is not the way to go. That logic was a holdover from the
>> hybrid tech debt we've been working to remove with a generic approach.
>>
>> Thanks,
>> Ian
>>
>
> I think all of that may make sense, but in this case I haven't actually
> duplicated anything, rather shared the existing code to also be used on
> Arm.
>
> This means we can have the default perf stat working on Arm from today,
> and if any other changes get made it will continue to work as I've also
> added a test for it.
>
I would also like to note that (not including the new test) this
patchset _removes_ more code than it adds, so to say it duplicates code
is a bit unfair.
Of course this touches some similar areas to your other change, but that
doesn't mean I think we shouldn't continue with that one. I'm still
happy to review and test or contribute to that one if you like. I think
it just helped me to do it in this order and get this thing in a working
state first before the next bigger step.
>>> James Clark (7):
>>> perf stat: Initialize instead of overwriting clock event
>>> perf stat: Remove unused default_null_attrs
>>> perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86
>>> perf evsel: Remove duplicated __evsel__hw_name() code
>>> perf evlist: Use hybrid default attrs whenever extended type is
>>> supported
>>> perf test: Make stat test work on DT devices
>>> perf test: Add a test for default perf stat command
>>>
>>> tools/perf/arch/arm64/util/Build | 1 +
>>> tools/perf/arch/arm64/util/evsel.c | 7 ++++
>>> tools/perf/arch/x86/util/evlist.c | 65 ------------------------------
>>> tools/perf/arch/x86/util/evsel.c | 17 +-------
>>> tools/perf/builtin-stat.c | 12 ++----
>>> tools/perf/tests/shell/stat.sh | 33 ++++++++++++---
>>> tools/perf/util/evlist.c | 65 ++++++++++++++++++++++++++----
>>> tools/perf/util/evlist.h | 6 +--
>>> tools/perf/util/evsel.c | 19 +++++++++
>>> tools/perf/util/evsel.h | 2 +-
>>> 10 files changed, 119 insertions(+), 108 deletions(-)
>>> create mode 100644 tools/perf/arch/arm64/util/evsel.c
>>>
>>> --
>>> 2.34.1
>>>
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-08-13 15:10 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 13:23 [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE James Clark
2024-08-13 13:23 ` [PATCH 1/7] perf stat: Initialize instead of overwriting clock event James Clark
2024-08-13 14:28 ` Ian Rogers
2024-08-13 14:38 ` James Clark
2024-08-13 14:43 ` Ian Rogers
2024-08-13 14:56 ` James Clark
2024-08-13 13:23 ` [PATCH 2/7] perf stat: Remove unused default_null_attrs James Clark
2024-08-13 13:23 ` [PATCH 3/7] perf evsel: Use the same arch_evsel__hw_name() on arm64 as x86 James Clark
2024-08-13 13:23 ` [PATCH 4/7] perf evsel: Remove duplicated __evsel__hw_name() code James Clark
2024-08-13 13:23 ` [PATCH 5/7] perf evlist: Use hybrid default attrs whenever extended type is supported James Clark
2024-08-13 13:33 ` James Clark
2024-08-13 13:23 ` [PATCH 6/7] perf test: Make stat test work on DT devices James Clark
2024-08-13 13:23 ` [PATCH 7/7] perf test: Add a test for default perf stat command James Clark
2024-08-13 14:35 ` [PATCH 0/7] perf stat: Make default perf stat command work on Arm big.LITTLE Ian Rogers
2024-08-13 14:45 ` James Clark
2024-08-13 15:10 ` James Clark
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).