* [PATCH 0/3] perf list: Collapse similar events across PMUs
@ 2025-03-04 13:49 James Clark
2025-03-04 13:49 ` [PATCH 1/3] perf list: Order events by event name before PMU name James Clark
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: James Clark @ 2025-03-04 13:49 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, Leo Yan
Cc: linux-perf-users, linux-kernel, James Clark
Some changes related to the discussion here [1] about deduplication of
PMUs. This change stands on its own right now, but it may not go far
enough. This can either be dropped for now or applied and improved
later. Extra ideas are as follows.
Treating alphanumeric suffixes as duplicate has been slightly
problematic due to marketing strings having looks-like-but-not-actually
alphanumeric suffixes. For example 'cpum_cf' and now the one digit
longer than before 'cortex-a720'. The easy fix is to increase the
minimum length considered for deduplication as suggested [1], but as
also mentioned, the current mrvl_ddr_pmu PMU names don't zero pad the
address, meaning that > 2 alphanumeric suffixes are already technically
not enough to deduplicate the same PMUs. They could have only a 2 digit
alphanumeric address suffix. Increasing the minimum digits feels a bit
like kicking the can down the road and it places awkward limitations on
marketing names which we have no control over. Also I'm not sure helps
the following much:
The problem is that arm_cmn_[n] PMUs have a numeric suffix, but they can
have different events. Even if we were adding this PMU today, keeping
the suffix rule in mind, it would be difficult to come up with a suffix
that differentiates the different ones. Flavour words might work, but
that complicates the kernel which would have to group them and come up
with flavours rather than just doing an i++. Deduplicating too
aggressively on only PMU name suffix means only arm_cmn_1's events get
listed, missing other events, and it's hard to see which events relate
to which PMU.
Therefore in addition to the changes in this patchset I'd like to look
into:
* Collapsing duplicate PMU names into ranges, for example
arm_pmu_v3_[0-4], rather than simply concatenating names as done in
this patchset
* Deduplicate uncore based on the contents of events/ rather than just
the suffix
As some background, the original commit for deduplication, commit
3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
mentions reducing the number of duplicate PMUs, and is presumably
motivated by usability. But there are also other commits mentioning
reducing openat()s, for example lazily loading formats 504026412162
("perf pmu: Make the loading of formats lazy"). Deduplicating based on
the contents of the events/ folder is somewhat in contention with this
reduction, but could be done along side some more lazy loading (like of
the terms) and hashing the result of readdir() without opening any of
the contents. JSON tables can have event name hashes calculated at build
time if we want to consider them for deduplication too.
Then with the events hash, PMU's can be sorted based on this and the
'Unit:' string can be constructed with a set of values that collapses
adjacent suffixes to display as ranges. I believe that could remove the
need for any further changes to duplication based on suffix, but still
avoids over deduplication.
[1]: https://lore.kernel.org/linux-perf-users/CAP-5=fW_Sq4iFxoWPWuixz9fMLBPyPUO0RG0KPbYa-5T0DZbTA@mail.gmail.com/
---
James Clark (3):
perf list: Order events by event name before PMU name
perf list: Collapse similar events across PMUs
perf list: Don't deduplicate core PMUs when listing events
tools/perf/builtin-list.c | 2 +
tools/perf/util/pmu.c | 5 ++-
tools/perf/util/pmu.h | 2 +
tools/perf/util/pmus.c | 95 ++++++++++++++++++++++++++++++++++--------
tools/perf/util/print-events.h | 1 +
5 files changed, 86 insertions(+), 19 deletions(-)
---
base-commit: 7788ad59d1d9617792037a83513be5b1dd14150f
change-id: 20250304-james-perf-hybrid-list-11b888cf435a
Best regards,
--
James Clark <james.clark@linaro.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/3] perf list: Order events by event name before PMU name
2025-03-04 13:49 [PATCH 0/3] perf list: Collapse similar events across PMUs James Clark
@ 2025-03-04 13:49 ` James Clark
2025-03-05 20:59 ` Ian Rogers
2025-03-04 13:49 ` [PATCH 2/3] perf list: Collapse similar events across PMUs James Clark
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2025-03-04 13:49 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, Leo Yan
Cc: linux-perf-users, linux-kernel, James Clark
In order to be able to show a single line for the same events on
different PMUs, they need to be grouped by event name. This is because
deduplication relies on similar items being adjacent in the list.
Even without the following changes this would arguably be better
grouping because it's easier to find events in a topic alphabetically
by name, rather than in separate PMU blocks.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/util/pmus.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index dd7c2ffdab38..4d60bac2d2b9 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -445,15 +445,15 @@ static int cmp_sevent(const void *a, const void *b)
if (a_iscpu != b_iscpu)
return a_iscpu ? -1 : 1;
- /* Order by PMU name. */
- if (as->pmu != bs->pmu) {
- ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
- if (ret)
- return ret;
- }
-
/* Order by event name. */
- return strcmp(as->name, bs->name);
+ ret = strcmp(as->name, bs->name);
+ if (ret)
+ return ret;
+
+ /* Order by PMU name. */
+ if (as->pmu == bs->pmu)
+ return 0;
+ return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
}
static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-04 13:49 [PATCH 0/3] perf list: Collapse similar events across PMUs James Clark
2025-03-04 13:49 ` [PATCH 1/3] perf list: Order events by event name before PMU name James Clark
@ 2025-03-04 13:49 ` James Clark
2025-03-05 9:35 ` Leo Yan
2025-03-05 21:40 ` Ian Rogers
2025-03-04 13:49 ` [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events James Clark
` (2 subsequent siblings)
4 siblings, 2 replies; 19+ messages in thread
From: James Clark @ 2025-03-04 13:49 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, Leo Yan
Cc: linux-perf-users, linux-kernel, James Clark
Instead of showing multiple line items with the same event name and
description, show a single line and concatenate all PMUs that this
event can belong to.
Don't do it for json output. Machine readable output doesn't need to be
minimized, and changing the "Unit" field to a list type would break
backwards compatibility.
Before:
$ perf list -v
...
br_indirect_spec
[Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
br_indirect_spec
[Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
After:
$ perf list -v
...
br_indirect_spec
[Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/builtin-list.c | 2 ++
tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
tools/perf/util/print-events.h | 1 +
3 files changed, 70 insertions(+), 8 deletions(-)
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index fed482adb039..aacd7beae2a0 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
.print_event = default_print_event,
.print_metric = default_print_metric,
.skip_duplicate_pmus = default_skip_duplicate_pmus,
+ .collapse_events = true
};
const char *cputype = NULL;
const char *unit_name = NULL;
@@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
.print_event = json_print_event,
.print_metric = json_print_metric,
.skip_duplicate_pmus = json_skip_duplicate_pmus,
+ .collapse_events = false
};
ps = &json_ps;
} else {
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 4d60bac2d2b9..cb1b14ade25b 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
/* Order by PMU name. */
if (as->pmu == bs->pmu)
return 0;
- return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
+ ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
+ if (ret)
+ return ret;
+
+ /* Order by remaining displayed fields for purposes of deduplication later */
+ ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
+ if (ret)
+ return ret;
+ ret = !!as->deprecated - !!bs->deprecated;
+ if (ret)
+ return ret;
+ ret = strcmp(as->desc ?: "", bs->desc ?: "");
+ if (ret)
+ return ret;
+ return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
}
-static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
+enum dup_type {
+ UNIQUE,
+ DUPLICATE,
+ SAME_TEXT
+};
+
+static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
{
/* Different names -> never duplicates */
if (strcmp(a->name ?: "//", b->name ?: "//"))
- return false;
+ return UNIQUE;
+
+ /* Duplicate PMU name and event name -> hide completely */
+ if (strcmp(a->pmu_name, b->pmu_name) == 0)
+ return DUPLICATE;
+
+ /* Any other different display text -> not duplicate */
+ if (strcmp(a->topic ?: "", b->topic ?: "") ||
+ strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
+ a->deprecated != b->deprecated ||
+ strcmp(a->desc ?: "", b->desc ?: "") ||
+ strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
+ return UNIQUE;
+ }
- /* Don't remove duplicates for different PMUs */
- return strcmp(a->pmu_name, b->pmu_name) == 0;
+ /* Same display text but different PMU -> collapse */
+ return SAME_TEXT;
}
struct events_callback_state {
@@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
return 0;
}
+static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
+{
+ size_t len = strlen(pmu_names);
+ size_t added;
+
+ if (len)
+ added = snprintf(pmu_names + len, size - len, ",%s", b);
+ else
+ added = snprintf(pmu_names, size, "%s,%s", a, b);
+
+ /* Truncate with ... */
+ if (added > 0 && added + len >= size)
+ sprintf(pmu_names + size - 4, "...");
+}
+
void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
{
struct perf_pmu *pmu;
@@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
struct events_callback_state state;
bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
struct perf_pmu *(*scan_fn)(struct perf_pmu *);
+ char pmu_names[128] = {0};
if (skip_duplicate_pmus)
scan_fn = perf_pmus__scan_skip_duplicates;
@@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
for (int j = 0; j < len; j++) {
/* Skip duplicates */
- if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
- goto free;
+ if (j < len - 1) {
+ enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
+
+ if (dt == DUPLICATE) {
+ goto free;
+ } else if (print_cb->collapse_events && dt == SAME_TEXT) {
+ concat_pmu_names(pmu_names, sizeof(pmu_names),
+ aliases[j].pmu_name, aliases[j+1].pmu_name);
+ goto free;
+ }
+ }
print_cb->print_event(print_state,
aliases[j].topic,
- aliases[j].pmu_name,
+ pmu_names[0] ? pmu_names : aliases[j].pmu_name,
aliases[j].name,
aliases[j].alias,
aliases[j].scale_unit,
@@ -553,6 +611,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
aliases[j].desc,
aliases[j].long_desc,
aliases[j].encoding_desc);
+ pmu_names[0] = '\0';
free:
zfree(&aliases[j].name);
zfree(&aliases[j].alias);
diff --git a/tools/perf/util/print-events.h b/tools/perf/util/print-events.h
index 445efa1636c1..e91f9f830a2a 100644
--- a/tools/perf/util/print-events.h
+++ b/tools/perf/util/print-events.h
@@ -27,6 +27,7 @@ struct print_callbacks {
const char *threshold,
const char *unit);
bool (*skip_duplicate_pmus)(void *print_state);
+ bool collapse_events;
};
/** Print all events, the default when no options are specified. */
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events
2025-03-04 13:49 [PATCH 0/3] perf list: Collapse similar events across PMUs James Clark
2025-03-04 13:49 ` [PATCH 1/3] perf list: Order events by event name before PMU name James Clark
2025-03-04 13:49 ` [PATCH 2/3] perf list: Collapse similar events across PMUs James Clark
@ 2025-03-04 13:49 ` James Clark
2025-03-05 21:51 ` Ian Rogers
2025-03-05 9:26 ` [PATCH 0/3] perf list: Collapse similar events across PMUs Leo Yan
2025-03-05 20:38 ` Ian Rogers
4 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2025-03-04 13:49 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, Leo Yan
Cc: linux-perf-users, linux-kernel, James Clark
Commit 7afbf90ea2e2 ("perf pmu: Don't de-duplicate core PMUs") fixed a
display mismatch related to deduplication within a single PMU, but it
didn't fix the case where deduplicated PMUs aren't listed at all.
Fix it by using the same function which takes is_core into account,
except in the use_core_pmus block where it's always going to be true.
Before this change, -v would be required to get the same behavior for
core PMUs. Now it's no longer required:
Before:
$ perf list | grep br_indirect_spec -A 1
br_indirect_spec
[Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
After:
$ perf list | grep br_indirect_spec -A 2
[Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,
armv8_cortex_a57]
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/util/pmu.c | 5 +++--
tools/perf/util/pmu.h | 2 ++
tools/perf/util/pmus.c | 8 +++++---
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 57450c73fb63..caff0d309012 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -834,9 +834,10 @@ static int is_sysfs_pmu_core(const char *name)
*
* @skip_duplicate_pmus: False in verbose mode so all uncore PMUs are visible
*/
-static size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
- bool skip_duplicate_pmus)
+size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
+ bool skip_duplicate_pmus)
{
+ name = name ?: "";
return skip_duplicate_pmus && !pmu->is_core
? pmu_name_len_no_suffix(name)
: strlen(name);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index b93014cc3670..ce6a394a695d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -297,5 +297,7 @@ struct perf_pmu *perf_pmus__find_core_pmu(void);
const char *perf_pmu__name_from_config(struct perf_pmu *pmu, u64 config);
bool perf_pmu__is_fake(const struct perf_pmu *pmu);
+size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
+ bool skip_duplicate_pmus);
#endif /* __PMU_H */
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index cb1b14ade25b..1acc27af4d02 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -358,12 +358,14 @@ static struct perf_pmu *perf_pmus__scan_skip_duplicates(struct perf_pmu *pmu)
if (!pmu) {
pmu_read_sysfs(PERF_TOOL_PMU_TYPE_ALL_MASK);
pmu = list_prepare_entry(pmu, &core_pmus, list);
- } else
- last_pmu_name_len = pmu_name_len_no_suffix(pmu->name ?: "");
+ } else {
+ last_pmu_name_len = pmu_deduped_name_len(pmu, pmu->name,
+ /*skip_duplicate_pmus=*/true);
+ }
if (use_core_pmus) {
list_for_each_entry_continue(pmu, &core_pmus, list) {
- int pmu_name_len = pmu_name_len_no_suffix(pmu->name ?: "");
+ int pmu_name_len = strlen(pmu->name ?: "");
if (last_pmu_name_len == pmu_name_len &&
!strncmp(last_pmu_name, pmu->name ?: "", pmu_name_len))
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] perf list: Collapse similar events across PMUs
2025-03-04 13:49 [PATCH 0/3] perf list: Collapse similar events across PMUs James Clark
` (2 preceding siblings ...)
2025-03-04 13:49 ` [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events James Clark
@ 2025-03-05 9:26 ` Leo Yan
2025-03-05 20:38 ` Ian Rogers
4 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2025-03-05 9:26 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, linux-perf-users,
linux-kernel
On Tue, Mar 04, 2025 at 01:49:12PM +0000, James Clark wrote:
[...]
> As some background, the original commit for deduplication, commit
> 3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
> mentions reducing the number of duplicate PMUs, and is presumably
> motivated by usability. But there are also other commits mentioning
> reducing openat()s, for example lazily loading formats 504026412162
> ("perf pmu: Make the loading of formats lazy"). Deduplicating based on
> the contents of the events/ folder is somewhat in contention with this
> reduction, but could be done along side some more lazy loading (like of
> the terms) and hashing the result of readdir() without opening any of
> the contents. JSON tables can have event name hashes calculated at build
> time if we want to consider them for deduplication too.
>
> Then with the events hash, PMU's can be sorted based on this and the
> 'Unit:' string can be constructed with a set of values that collapses
> adjacent suffixes to display as ranges. I believe that could remove the
> need for any further changes to duplication based on suffix, but still
> avoids over deduplication.
I did a test with a simulated platform with cortex-a510 and cortex-a710
cores, the result looks correct to me:
# perf list -d 2>&1 | grep br_mis_pred_retired -A 2
br_mis_pred_retired
[Instruction architecturally executed,mispredicted branch. Unit:
armv9_cortex_a510,armv9_cortex_a710]
Tested-by: Leo Yan <leo.yan@arm.com>
However, I will have minor comments on two patches and reply
separately. Please take a look.
Thanks,
Leo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-04 13:49 ` [PATCH 2/3] perf list: Collapse similar events across PMUs James Clark
@ 2025-03-05 9:35 ` Leo Yan
2025-03-05 9:49 ` Leo Yan
2025-03-05 21:40 ` Ian Rogers
1 sibling, 1 reply; 19+ messages in thread
From: Leo Yan @ 2025-03-05 9:35 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, linux-perf-users,
linux-kernel
On Tue, Mar 04, 2025 at 01:49:14PM +0000, James Clark wrote:
> Instead of showing multiple line items with the same event name and
> description, show a single line and concatenate all PMUs that this
> event can belong to.
>
> Don't do it for json output. Machine readable output doesn't need to be
> minimized, and changing the "Unit" field to a list type would break
> backwards compatibility.
>
> Before:
> $ perf list -v
> ...
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
>
> After:
>
> $ perf list -v
> ...
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
From my testing, based on the latest tmp.perf-tools-next branch, I
need to remove the option '-v'. Otherwise, the '-v' option will
enable long event descriptions, in this case, it does _not_ print Unit
strings at all (see default_print_event() in builtin-list.c).
default_print_event()
{
...
if (long_desc && print_state->long_desc) {
fprintf(fp, "%*s", 8, "[");
wordwrap(fp, long_desc, 8, pager_get_columns(), 0);
fprintf(fp, "]\n");
} else if (desc && print_state->desc) {
...
}
The command `perf list` would be sufficent.
Thanks,
Leo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-05 9:35 ` Leo Yan
@ 2025-03-05 9:49 ` Leo Yan
0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2025-03-05 9:49 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Robin Murphy, linux-perf-users,
linux-kernel
On Wed, Mar 05, 2025 at 09:35:45AM +0000, Leo Yan wrote:
> On Tue, Mar 04, 2025 at 01:49:14PM +0000, James Clark wrote:
> > Instead of showing multiple line items with the same event name and
> > description, show a single line and concatenate all PMUs that this
> > event can belong to.
> >
> > Don't do it for json output. Machine readable output doesn't need to be
> > minimized, and changing the "Unit" field to a list type would break
> > backwards compatibility.
> >
> > Before:
> > $ perf list -v
> > ...
> > br_indirect_spec
> > [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
> > br_indirect_spec
> > [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
> >
> > After:
> >
> > $ perf list -v
> > ...
> > br_indirect_spec
> > [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
>
> From my testing, based on the latest tmp.perf-tools-next branch, I
> need to remove the option '-v'. Otherwise, the '-v' option will
> enable long event descriptions, in this case, it does _not_ print Unit
> strings at all (see default_print_event() in builtin-list.c).
Ah, actually I worked on the Linux master branch. After switched to
tmp.perf-tools-next branch, it works pretty well.
perf list -v 2>&1 | grep br_mis_pred_retired -A 2
...
br_mis_pred_retired
[Instruction architecturally executed,mispredicted branch.
Unit:
armv9_cortex_a510,armv9_cortex_a710]
Please ignore my previous comment. The series looks good to me.
Thanks,
Leo
> default_print_event()
> {
> ...
> if (long_desc && print_state->long_desc) {
> fprintf(fp, "%*s", 8, "[");
> wordwrap(fp, long_desc, 8, pager_get_columns(), 0);
> fprintf(fp, "]\n");
> } else if (desc && print_state->desc) {
> ...
> }
>
> The command `perf list` would be sufficent.
>
> Thanks,
> Leo
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] perf list: Collapse similar events across PMUs
2025-03-04 13:49 [PATCH 0/3] perf list: Collapse similar events across PMUs James Clark
` (3 preceding siblings ...)
2025-03-05 9:26 ` [PATCH 0/3] perf list: Collapse similar events across PMUs Leo Yan
@ 2025-03-05 20:38 ` Ian Rogers
2025-03-07 14:47 ` James Clark
4 siblings, 1 reply; 19+ messages in thread
From: Ian Rogers @ 2025-03-05 20:38 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>
> Some changes related to the discussion here [1] about deduplication of
> PMUs. This change stands on its own right now, but it may not go far
> enough. This can either be dropped for now or applied and improved
> later. Extra ideas are as follows.
>
> Treating alphanumeric suffixes as duplicate has been slightly
> problematic due to marketing strings having looks-like-but-not-actually
> alphanumeric suffixes. For example 'cpum_cf' and now the one digit
> longer than before 'cortex-a720'. The easy fix is to increase the
> minimum length considered for deduplication as suggested [1], but as
> also mentioned, the current mrvl_ddr_pmu PMU names don't zero pad the
> address, meaning that > 2 alphanumeric suffixes are already technically
> not enough to deduplicate the same PMUs. They could have only a 2 digit
> alphanumeric address suffix. Increasing the minimum digits feels a bit
> like kicking the can down the road and it places awkward limitations on
> marketing names which we have no control over. Also I'm not sure helps
> the following much:
The hex suffix thing was very unfortunate, can we reverse that
decision and move the physical address.. it captures into a caps file?
Fwiw, we have a similar hex mess with raw events. A raw event can be
r0xead or read, but read also makes a pretty nice event name. This is
solved by first checking if read is an event and if not assuming it is
a raw encoding. Would anyone really care if raw events always had to
start with r0x? We seem to get tangled up in corner cases like this
too often, for example legacy event priorities and topdown event
sorting. Keeping things simple would be nice from a correctness pov
but also so that it's easy for other tools to emulate.
> The problem is that arm_cmn_[n] PMUs have a numeric suffix, but they can
> have different events. Even if we were adding this PMU today, keeping
> the suffix rule in mind, it would be difficult to come up with a suffix
> that differentiates the different ones. Flavour words might work, but
> that complicates the kernel which would have to group them and come up
> with flavours rather than just doing an i++. Deduplicating too
> aggressively on only PMU name suffix means only arm_cmn_1's events get
> listed, missing other events, and it's hard to see which events relate
> to which PMU.
>
> Therefore in addition to the changes in this patchset I'd like to look
> into:
>
> * Collapsing duplicate PMU names into ranges, for example
> arm_pmu_v3_[0-4], rather than simply concatenating names as done in
> this patchset
The current suffix rule is weird as Intel's GPU i915 PMU appears to be
a PMU called 'i' with a 915 numeric suffix.
I think capturing the rules in the ABI doc and some kind of legacy
compatibility is do-able.
I like regular expressions over globs/fnmatch, for example, we could
use flex to turn the CPUID matches in mapfile.csv into something that
is parsed and matched with less interpretation/compilation at runtime.
I suspect we can bike-shed for a long time on what these new rules
will be, which has been why I've generally just tried to match
existing and inelegant behaviors.
> * Deduplicate uncore based on the contents of events/ rather than just
> the suffix
>
> As some background, the original commit for deduplication, commit
> 3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
> mentions reducing the number of duplicate PMUs, and is presumably
> motivated by usability. But there are also other commits mentioning
> reducing openat()s, for example lazily loading formats 504026412162
> ("perf pmu: Make the loading of formats lazy"). Deduplicating based on
> the contents of the events/ folder is somewhat in contention with this
> reduction, but could be done along side some more lazy loading (like of
> the terms) and hashing the result of readdir() without opening any of
> the contents. JSON tables can have event name hashes calculated at build
> time if we want to consider them for deduplication too.
I worry about the run time cost of this when there are 100s of uncore
PMUs. I wonder if the case of an empty events directory would also be
common.
Wrt the x86 conventions I think Kan is best placed to say what the
preferred convention should be.
The json will routinely say this event is on a PMU name without a
suffix, and the number of PMUs will only be known at runtime. Hashing
without the suffix would be fine but I think the controversy is over
what should be considered a suffix.
Thanks for doing the series, I'll dig into/test each one in turn.
Ian
> Then with the events hash, PMU's can be sorted based on this and the
> 'Unit:' string can be constructed with a set of values that collapses
> adjacent suffixes to display as ranges. I believe that could remove the
> need for any further changes to duplication based on suffix, but still
> avoids over deduplication.
>
> [1]: https://lore.kernel.org/linux-perf-users/CAP-5=fW_Sq4iFxoWPWuixz9fMLBPyPUO0RG0KPbYa-5T0DZbTA@mail.gmail.com/
>
> ---
> James Clark (3):
> perf list: Order events by event name before PMU name
> perf list: Collapse similar events across PMUs
> perf list: Don't deduplicate core PMUs when listing events
>
> tools/perf/builtin-list.c | 2 +
> tools/perf/util/pmu.c | 5 ++-
> tools/perf/util/pmu.h | 2 +
> tools/perf/util/pmus.c | 95 ++++++++++++++++++++++++++++++++++--------
> tools/perf/util/print-events.h | 1 +
> 5 files changed, 86 insertions(+), 19 deletions(-)
> ---
> base-commit: 7788ad59d1d9617792037a83513be5b1dd14150f
> change-id: 20250304-james-perf-hybrid-list-11b888cf435a
>
> Best regards,
> --
> James Clark <james.clark@linaro.org>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] perf list: Order events by event name before PMU name
2025-03-04 13:49 ` [PATCH 1/3] perf list: Order events by event name before PMU name James Clark
@ 2025-03-05 20:59 ` Ian Rogers
0 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2025-03-05 20:59 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>
> In order to be able to show a single line for the same events on
> different PMUs, they need to be grouped by event name. This is because
> deduplication relies on similar items being adjacent in the list.
>
> Even without the following changes this would arguably be better
> grouping because it's easier to find events in a topic alphabetically
> by name, rather than in separate PMU blocks.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/util/pmus.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> index dd7c2ffdab38..4d60bac2d2b9 100644
> --- a/tools/perf/util/pmus.c
> +++ b/tools/perf/util/pmus.c
> @@ -445,15 +445,15 @@ static int cmp_sevent(const void *a, const void *b)
> if (a_iscpu != b_iscpu)
> return a_iscpu ? -1 : 1;
>
> - /* Order by PMU name. */
> - if (as->pmu != bs->pmu) {
> - ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> - if (ret)
> - return ret;
> - }
> -
> /* Order by event name. */
> - return strcmp(as->name, bs->name);
> + ret = strcmp(as->name, bs->name);
> + if (ret)
> + return ret;
> +
> + /* Order by PMU name. */
> + if (as->pmu == bs->pmu)
> + return 0;
> + return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> }
>
> static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-04 13:49 ` [PATCH 2/3] perf list: Collapse similar events across PMUs James Clark
2025-03-05 9:35 ` Leo Yan
@ 2025-03-05 21:40 ` Ian Rogers
2025-03-07 14:08 ` James Clark
1 sibling, 1 reply; 19+ messages in thread
From: Ian Rogers @ 2025-03-05 21:40 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>
> Instead of showing multiple line items with the same event name and
> description, show a single line and concatenate all PMUs that this
> event can belong to.
>
> Don't do it for json output. Machine readable output doesn't need to be
> minimized, and changing the "Unit" field to a list type would break
> backwards compatibility.
>
> Before:
> $ perf list -v
> ...
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
>
> After:
>
> $ perf list -v
> ...
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/builtin-list.c | 2 ++
> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
> tools/perf/util/print-events.h | 1 +
> 3 files changed, 70 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
> index fed482adb039..aacd7beae2a0 100644
> --- a/tools/perf/builtin-list.c
> +++ b/tools/perf/builtin-list.c
> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
> .print_event = default_print_event,
> .print_metric = default_print_metric,
> .skip_duplicate_pmus = default_skip_duplicate_pmus,
> + .collapse_events = true
So collapse_events is put in the callbacks but isn't a callback. I
think skipping duplicates and collapsing are the same thing, I'd
prefer it if there weren't two terms for the same thing. If you prefer
collapsing as a name then default_skip_duplicate_pmus can be
default_collapse_pmus.
> };
> const char *cputype = NULL;
> const char *unit_name = NULL;
> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
> .print_event = json_print_event,
> .print_metric = json_print_metric,
> .skip_duplicate_pmus = json_skip_duplicate_pmus,
> + .collapse_events = false
> };
> ps = &json_ps;
> } else {
> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> index 4d60bac2d2b9..cb1b14ade25b 100644
> --- a/tools/perf/util/pmus.c
> +++ b/tools/perf/util/pmus.c
> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
> /* Order by PMU name. */
> if (as->pmu == bs->pmu)
> return 0;
> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> + if (ret)
> + return ret;
> +
> + /* Order by remaining displayed fields for purposes of deduplication later */
> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
> + if (ret)
> + return ret;
> + ret = !!as->deprecated - !!bs->deprecated;
> + if (ret)
> + return ret;
> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
> + if (ret)
> + return ret;
> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
> }
>
> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
> +enum dup_type {
> + UNIQUE,
> + DUPLICATE,
> + SAME_TEXT
> +};
> +
> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
> {
> /* Different names -> never duplicates */
> if (strcmp(a->name ?: "//", b->name ?: "//"))
> - return false;
> + return UNIQUE;
> +
> + /* Duplicate PMU name and event name -> hide completely */
> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
> + return DUPLICATE;
> +
> + /* Any other different display text -> not duplicate */
> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
> + a->deprecated != b->deprecated ||
> + strcmp(a->desc ?: "", b->desc ?: "") ||
> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
> + return UNIQUE;
> + }
>
> - /* Don't remove duplicates for different PMUs */
> - return strcmp(a->pmu_name, b->pmu_name) == 0;
> + /* Same display text but different PMU -> collapse */
> + return SAME_TEXT;
> }
>
> struct events_callback_state {
> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
> return 0;
> }
>
> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
> +{
> + size_t len = strlen(pmu_names);
> + size_t added;
> +
> + if (len)
> + added = snprintf(pmu_names + len, size - len, ",%s", b);
> + else
> + added = snprintf(pmu_names, size, "%s,%s", a, b);
> +
> + /* Truncate with ... */
> + if (added > 0 && added + len >= size)
> + sprintf(pmu_names + size - 4, "...");
> +}
> +
> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
> {
> struct perf_pmu *pmu;
> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> struct events_callback_state state;
> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
> + char pmu_names[128] = {0};
>
> if (skip_duplicate_pmus)
> scan_fn = perf_pmus__scan_skip_duplicates;
> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
> for (int j = 0; j < len; j++) {
> /* Skip duplicates */
> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
> - goto free;
> + if (j < len - 1) {
> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
> +
> + if (dt == DUPLICATE) {
> + goto free;
> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
> + concat_pmu_names(pmu_names, sizeof(pmu_names),
> + aliases[j].pmu_name, aliases[j+1].pmu_name);
> + goto free;
> + }
> + }
I think a label called 'free' is a little unfortunate given the
function called free.
When I did things with sevent I was bringing over previous `perf list`
code mainly so that the perf list output before and after the changes
was identical. I wonder if this logic would be better placed in the
callback like default_print_event which already maintains state
(last_topic) to optionally print different things. This may better
encapsulate the behavior rather than the logic being in the PMU code.
>
> print_cb->print_event(print_state,
> aliases[j].topic,
> - aliases[j].pmu_name,
> + pmu_names[0] ? pmu_names : aliases[j].pmu_name,
> aliases[j].name,
> aliases[j].alias,
> aliases[j].scale_unit,
> @@ -553,6 +611,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> aliases[j].desc,
> aliases[j].long_desc,
> aliases[j].encoding_desc);
The encoding_desc will have a PMU with the suffix removed as per
skipping duplicates:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1954
So I think something like:
```
br_mis_pred_retired
[Instruction architecturally executed,mispredicted branch. Unit:
armv9_cortex_a510,armv9_cortex_a710]
```
Would have an encoding of `armv9_cortex_a510/.../` without the a710
encoding being present. That said, I'm not sure anyone cares :-)
Thanks,
Ian
> + pmu_names[0] = '\0';
> free:
> zfree(&aliases[j].name);
> zfree(&aliases[j].alias);
> diff --git a/tools/perf/util/print-events.h b/tools/perf/util/print-events.h
> index 445efa1636c1..e91f9f830a2a 100644
> --- a/tools/perf/util/print-events.h
> +++ b/tools/perf/util/print-events.h
> @@ -27,6 +27,7 @@ struct print_callbacks {
> const char *threshold,
> const char *unit);
> bool (*skip_duplicate_pmus)(void *print_state);
> + bool collapse_events;
> };
>
> /** Print all events, the default when no options are specified. */
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events
2025-03-04 13:49 ` [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events James Clark
@ 2025-03-05 21:51 ` Ian Rogers
2025-03-07 14:08 ` James Clark
0 siblings, 1 reply; 19+ messages in thread
From: Ian Rogers @ 2025-03-05 21:51 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>
> Commit 7afbf90ea2e2 ("perf pmu: Don't de-duplicate core PMUs") fixed a
> display mismatch related to deduplication within a single PMU, but it
> didn't fix the case where deduplicated PMUs aren't listed at all.
>
> Fix it by using the same function which takes is_core into account,
> except in the use_core_pmus block where it's always going to be true.
> Before this change, -v would be required to get the same behavior for
> core PMUs. Now it's no longer required:
>
> Before:
> $ perf list | grep br_indirect_spec -A 1
> br_indirect_spec
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
>
> After:
> $ perf list | grep br_indirect_spec -A 2
> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,
> armv8_cortex_a57]
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/util/pmu.c | 5 +++--
> tools/perf/util/pmu.h | 2 ++
> tools/perf/util/pmus.c | 8 +++++---
> 3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 57450c73fb63..caff0d309012 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -834,9 +834,10 @@ static int is_sysfs_pmu_core(const char *name)
> *
> * @skip_duplicate_pmus: False in verbose mode so all uncore PMUs are visible
> */
> -static size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
> - bool skip_duplicate_pmus)
> +size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
> + bool skip_duplicate_pmus)
nit: I think the name should be perf_pmu__deduped_name_len for
consistency with the other non-static functions.
> {
> + name = name ?: "";
nit: Should this just use pmu->name ?
> return skip_duplicate_pmus && !pmu->is_core
> ? pmu_name_len_no_suffix(name)
> : strlen(name);
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index b93014cc3670..ce6a394a695d 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -297,5 +297,7 @@ struct perf_pmu *perf_pmus__find_core_pmu(void);
>
> const char *perf_pmu__name_from_config(struct perf_pmu *pmu, u64 config);
> bool perf_pmu__is_fake(const struct perf_pmu *pmu);
> +size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
> + bool skip_duplicate_pmus);
>
> #endif /* __PMU_H */
> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> index cb1b14ade25b..1acc27af4d02 100644
> --- a/tools/perf/util/pmus.c
> +++ b/tools/perf/util/pmus.c
> @@ -358,12 +358,14 @@ static struct perf_pmu *perf_pmus__scan_skip_duplicates(struct perf_pmu *pmu)
> if (!pmu) {
> pmu_read_sysfs(PERF_TOOL_PMU_TYPE_ALL_MASK);
> pmu = list_prepare_entry(pmu, &core_pmus, list);
> - } else
> - last_pmu_name_len = pmu_name_len_no_suffix(pmu->name ?: "");
> + } else {
> + last_pmu_name_len = pmu_deduped_name_len(pmu, pmu->name,
> + /*skip_duplicate_pmus=*/true);
> + }
>
> if (use_core_pmus) {
> list_for_each_entry_continue(pmu, &core_pmus, list) {
> - int pmu_name_len = pmu_name_len_no_suffix(pmu->name ?: "");
> + int pmu_name_len = strlen(pmu->name ?: "");
>
> if (last_pmu_name_len == pmu_name_len &&
> !strncmp(last_pmu_name, pmu->name ?: "", pmu_name_len))
Can this code be removed given there shouldn't be core PMUs with
identical names? ie:
```
if (use_core_pmus) {
list_for_each_entry_continue(pmu, &core_pmus, list)
return pmu;
pmu = NULL;
pmu = list_prepare_entry(pmu, &other_pmus, list);
}
```
Thanks,
Ian
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events
2025-03-05 21:51 ` Ian Rogers
@ 2025-03-07 14:08 ` James Clark
0 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2025-03-07 14:08 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On 05/03/2025 9:51 pm, Ian Rogers wrote:
> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>>
>> Commit 7afbf90ea2e2 ("perf pmu: Don't de-duplicate core PMUs") fixed a
>> display mismatch related to deduplication within a single PMU, but it
>> didn't fix the case where deduplicated PMUs aren't listed at all.
>>
>> Fix it by using the same function which takes is_core into account,
>> except in the use_core_pmus block where it's always going to be true.
>> Before this change, -v would be required to get the same behavior for
>> core PMUs. Now it's no longer required:
>>
>> Before:
>> $ perf list | grep br_indirect_spec -A 1
>> br_indirect_spec
>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
>>
>> After:
>> $ perf list | grep br_indirect_spec -A 2
>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,
>> armv8_cortex_a57]
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> tools/perf/util/pmu.c | 5 +++--
>> tools/perf/util/pmu.h | 2 ++
>> tools/perf/util/pmus.c | 8 +++++---
>> 3 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 57450c73fb63..caff0d309012 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -834,9 +834,10 @@ static int is_sysfs_pmu_core(const char *name)
>> *
>> * @skip_duplicate_pmus: False in verbose mode so all uncore PMUs are visible
>> */
>> -static size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
>> - bool skip_duplicate_pmus)
>> +size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
>> + bool skip_duplicate_pmus)
>
> nit: I think the name should be perf_pmu__deduped_name_len for
> consistency with the other non-static functions.
>
Will change.
>> {
>> + name = name ?: "";
>
> nit: Should this just use pmu->name ?
>
I can keep this part at the callsite in
perf_pmus__scan_skip_duplicates() to avoid any confusion about this
function accessing pmu->name or not. The only reason this function takes
a separate name parameter is to allow it to work with struct
pmu_event_info elsewhere as well.
>> return skip_duplicate_pmus && !pmu->is_core
>> ? pmu_name_len_no_suffix(name)
>> : strlen(name);
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index b93014cc3670..ce6a394a695d 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -297,5 +297,7 @@ struct perf_pmu *perf_pmus__find_core_pmu(void);
>>
>> const char *perf_pmu__name_from_config(struct perf_pmu *pmu, u64 config);
>> bool perf_pmu__is_fake(const struct perf_pmu *pmu);
>> +size_t pmu_deduped_name_len(const struct perf_pmu *pmu, const char *name,
>> + bool skip_duplicate_pmus);
>>
>> #endif /* __PMU_H */
>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>> index cb1b14ade25b..1acc27af4d02 100644
>> --- a/tools/perf/util/pmus.c
>> +++ b/tools/perf/util/pmus.c
>> @@ -358,12 +358,14 @@ static struct perf_pmu *perf_pmus__scan_skip_duplicates(struct perf_pmu *pmu)
>> if (!pmu) {
>> pmu_read_sysfs(PERF_TOOL_PMU_TYPE_ALL_MASK);
>> pmu = list_prepare_entry(pmu, &core_pmus, list);
>> - } else
>> - last_pmu_name_len = pmu_name_len_no_suffix(pmu->name ?: "");
>> + } else {
>> + last_pmu_name_len = pmu_deduped_name_len(pmu, pmu->name,
>> + /*skip_duplicate_pmus=*/true);
>> + }
>>
>> if (use_core_pmus) {
>> list_for_each_entry_continue(pmu, &core_pmus, list) {
>> - int pmu_name_len = pmu_name_len_no_suffix(pmu->name ?: "");
>> + int pmu_name_len = strlen(pmu->name ?: "");
>>
>> if (last_pmu_name_len == pmu_name_len &&
>> !strncmp(last_pmu_name, pmu->name ?: "", pmu_name_len))
>
> Can this code be removed given there shouldn't be core PMUs with
> identical names? ie:
> ```
> if (use_core_pmus) {
> list_for_each_entry_continue(pmu, &core_pmus, list)
> return pmu;
>
> pmu = NULL;
> pmu = list_prepare_entry(pmu, &other_pmus, list);
> }
> ```
>
> Thanks,
> Ian
>
Even better, yes it probably can.
Thanks
James
>>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-05 21:40 ` Ian Rogers
@ 2025-03-07 14:08 ` James Clark
2025-03-07 17:35 ` Ian Rogers
0 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2025-03-07 14:08 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On 05/03/2025 9:40 pm, Ian Rogers wrote:
> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>>
>> Instead of showing multiple line items with the same event name and
>> description, show a single line and concatenate all PMUs that this
>> event can belong to.
>>
>> Don't do it for json output. Machine readable output doesn't need to be
>> minimized, and changing the "Unit" field to a list type would break
>> backwards compatibility.
>>
>> Before:
>> $ perf list -v
>> ...
>> br_indirect_spec
>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
>> br_indirect_spec
>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
>>
>> After:
>>
>> $ perf list -v
>> ...
>> br_indirect_spec
>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> tools/perf/builtin-list.c | 2 ++
>> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
>> tools/perf/util/print-events.h | 1 +
>> 3 files changed, 70 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
>> index fed482adb039..aacd7beae2a0 100644
>> --- a/tools/perf/builtin-list.c
>> +++ b/tools/perf/builtin-list.c
>> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
>> .print_event = default_print_event,
>> .print_metric = default_print_metric,
>> .skip_duplicate_pmus = default_skip_duplicate_pmus,
>> + .collapse_events = true
>
> So collapse_events is put in the callbacks but isn't a callback. I
> think skipping duplicates and collapsing are the same thing, I'd
> prefer it if there weren't two terms for the same thing. If you prefer
> collapsing as a name then default_skip_duplicate_pmus can be
> default_collapse_pmus.
>
I can use the existing callback and rename it. That does have the effect
of disabling this behavior in verbose mode which may be useful or may be
unexpected to some people. But it seems pretty 50/50 so fewer callbacks
are probably better.
>> };
>> const char *cputype = NULL;
>> const char *unit_name = NULL;
>> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
>> .print_event = json_print_event,
>> .print_metric = json_print_metric,
>> .skip_duplicate_pmus = json_skip_duplicate_pmus,
>> + .collapse_events = false
>> };
>> ps = &json_ps;
>> } else {
>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>> index 4d60bac2d2b9..cb1b14ade25b 100644
>> --- a/tools/perf/util/pmus.c
>> +++ b/tools/perf/util/pmus.c
>> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
>> /* Order by PMU name. */
>> if (as->pmu == bs->pmu)
>> return 0;
>> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>> + if (ret)
>> + return ret;
>> +
>> + /* Order by remaining displayed fields for purposes of deduplication later */
>> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
>> + if (ret)
>> + return ret;
>> + ret = !!as->deprecated - !!bs->deprecated;
>> + if (ret)
>> + return ret;
>> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
>> + if (ret)
>> + return ret;
>> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
>> }
>>
>> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
>> +enum dup_type {
>> + UNIQUE,
>> + DUPLICATE,
>> + SAME_TEXT
>> +};
>> +
>> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
>> {
>> /* Different names -> never duplicates */
>> if (strcmp(a->name ?: "//", b->name ?: "//"))
>> - return false;
>> + return UNIQUE;
>> +
>> + /* Duplicate PMU name and event name -> hide completely */
>> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
>> + return DUPLICATE;
>> +
>> + /* Any other different display text -> not duplicate */
>> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
>> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
>> + a->deprecated != b->deprecated ||
>> + strcmp(a->desc ?: "", b->desc ?: "") ||
>> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
>> + return UNIQUE;
>> + }
>>
>> - /* Don't remove duplicates for different PMUs */
>> - return strcmp(a->pmu_name, b->pmu_name) == 0;
>> + /* Same display text but different PMU -> collapse */
>> + return SAME_TEXT;
>> }
>>
>> struct events_callback_state {
>> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
>> return 0;
>> }
>>
>> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
>> +{
>> + size_t len = strlen(pmu_names);
>> + size_t added;
>> +
>> + if (len)
>> + added = snprintf(pmu_names + len, size - len, ",%s", b);
>> + else
>> + added = snprintf(pmu_names, size, "%s,%s", a, b);
>> +
>> + /* Truncate with ... */
>> + if (added > 0 && added + len >= size)
>> + sprintf(pmu_names + size - 4, "...");
>> +}
>> +
>> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
>> {
>> struct perf_pmu *pmu;
>> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>> struct events_callback_state state;
>> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
>> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
>> + char pmu_names[128] = {0};
>>
>> if (skip_duplicate_pmus)
>> scan_fn = perf_pmus__scan_skip_duplicates;
>> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>> for (int j = 0; j < len; j++) {
>> /* Skip duplicates */
>> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
>> - goto free;
>> + if (j < len - 1) {
>> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
>> +
>> + if (dt == DUPLICATE) {
>> + goto free;
>> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
>> + concat_pmu_names(pmu_names, sizeof(pmu_names),
>> + aliases[j].pmu_name, aliases[j+1].pmu_name);
>> + goto free;
>> + }
>> + }
>
> I think a label called 'free' is a little unfortunate given the
> function called free.
> When I did things with sevent I was bringing over previous `perf list`
> code mainly so that the perf list output before and after the changes
> was identical. I wonder if this logic would be better placed in the
> callback like default_print_event which already maintains state
> (last_topic) to optionally print different things. This may better
> encapsulate the behavior rather than the logic being in the PMU code.
>
Yeah I can have a look at putting it in one of the callbacks. But in the
end builtin-list.c is the only user of perf_pmus__print_pmu_events()
anyway so makes you wonder if the whole thing can be moved to
builtin-list and open coded without the callbackyness.
>>
>> print_cb->print_event(print_state,
>> aliases[j].topic,
>> - aliases[j].pmu_name,
>> + pmu_names[0] ? pmu_names : aliases[j].pmu_name,
>> aliases[j].name,
>> aliases[j].alias,
>> aliases[j].scale_unit,
>> @@ -553,6 +611,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>> aliases[j].desc,
>> aliases[j].long_desc,
>> aliases[j].encoding_desc);
>
> The encoding_desc will have a PMU with the suffix removed as per
> skipping duplicates:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1954
> So I think something like:
> ```
> br_mis_pred_retired
> [Instruction architecturally executed,mispredicted branch. Unit:
> armv9_cortex_a510,armv9_cortex_a710]
> ```
> Would have an encoding of `armv9_cortex_a510/.../` without the a710
> encoding being present. That said, I'm not sure anyone cares :-)
>
> Thanks,
> Ian
>
Ah, in that case I'll disable it for --detailed as well as --json. I
could compare encoding_desc in pmu_alias_duplicate_type() but they'll
always be different because of the PMU name, so there's no point. And
displaying multiple encoding_descs would be fiddly too.
>> + pmu_names[0] = '\0';
>> free:
>> zfree(&aliases[j].name);
>> zfree(&aliases[j].alias);
>> diff --git a/tools/perf/util/print-events.h b/tools/perf/util/print-events.h
>> index 445efa1636c1..e91f9f830a2a 100644
>> --- a/tools/perf/util/print-events.h
>> +++ b/tools/perf/util/print-events.h
>> @@ -27,6 +27,7 @@ struct print_callbacks {
>> const char *threshold,
>> const char *unit);
>> bool (*skip_duplicate_pmus)(void *print_state);
>> + bool collapse_events;
>> };
>>
>> /** Print all events, the default when no options are specified. */
>>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] perf list: Collapse similar events across PMUs
2025-03-05 20:38 ` Ian Rogers
@ 2025-03-07 14:47 ` James Clark
0 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2025-03-07 14:47 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On 05/03/2025 8:38 pm, Ian Rogers wrote:
> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>>
>> Some changes related to the discussion here [1] about deduplication of
>> PMUs. This change stands on its own right now, but it may not go far
>> enough. This can either be dropped for now or applied and improved
>> later. Extra ideas are as follows.
>>
>> Treating alphanumeric suffixes as duplicate has been slightly
>> problematic due to marketing strings having looks-like-but-not-actually
>> alphanumeric suffixes. For example 'cpum_cf' and now the one digit
>> longer than before 'cortex-a720'. The easy fix is to increase the
>> minimum length considered for deduplication as suggested [1], but as
>> also mentioned, the current mrvl_ddr_pmu PMU names don't zero pad the
>> address, meaning that > 2 alphanumeric suffixes are already technically
>> not enough to deduplicate the same PMUs. They could have only a 2 digit
>> alphanumeric address suffix. Increasing the minimum digits feels a bit
>> like kicking the can down the road and it places awkward limitations on
>> marketing names which we have no control over. Also I'm not sure helps
>> the following much:
>
> The hex suffix thing was very unfortunate, can we reverse that
> decision and move the physical address.. it captures into a caps file?
>
Do you mean changing the PMU's name in the driver? I can't imagine that
would go down too well given how long it's been there. Seems to me
there's too much risk of it breaking some other tool or script. It would
be a bit unfair to do that to fix an assumption that only really exists
in Perf, and especially if there could be other ways to achieve the same
behaviour in Perf.
> Fwiw, we have a similar hex mess with raw events. A raw event can be
> r0xead or read, but read also makes a pretty nice event name. This is
> solved by first checking if read is an event and if not assuming it is
> a raw encoding. Would anyone really care if raw events always had to
> start with r0x? We seem to get tangled up in corner cases like this
> too often, for example legacy event priorities and topdown event
> sorting. Keeping things simple would be nice from a correctness pov
> but also so that it's easy for other tools to emulate.
>
I think tools like this are always going to be full of various
heuristics and magic to make the experience as user friendly as
possible. If using the bare kernel interface was already fully featured
enough then there wouldn't be a need for Perf to exist.
In an ideal world we'd have some capability file like you mentioned
above. Or a folder with symlink to other instances of the same PMU then
a tool can follow the links to other PMUs to deduplicate them. In theory
we could add that right now. At least it wouldn't depend on any name
restrictions, but it does push complexity that should probably exist in
a tool into the kernel. In practice I think it would be a bit of a
nightmare because it has to be re-implemented for everyone's PMU driver.
That's got to be more code than tools doing it on the list of events
produced.
>> The problem is that arm_cmn_[n] PMUs have a numeric suffix, but they can
>> have different events. Even if we were adding this PMU today, keeping
>> the suffix rule in mind, it would be difficult to come up with a suffix
>> that differentiates the different ones. Flavour words might work, but
>> that complicates the kernel which would have to group them and come up
>> with flavours rather than just doing an i++. Deduplicating too
>> aggressively on only PMU name suffix means only arm_cmn_1's events get
>> listed, missing other events, and it's hard to see which events relate
>> to which PMU.
>>
>> Therefore in addition to the changes in this patchset I'd like to look
>> into:
>>
>> * Collapsing duplicate PMU names into ranges, for example
>> arm_pmu_v3_[0-4], rather than simply concatenating names as done in
>> this patchset
>
> The current suffix rule is weird as Intel's GPU i915 PMU appears to be
> a PMU called 'i' with a 915 numeric suffix.
> I think capturing the rules in the ABI doc and some kind of legacy
> compatibility is do-able.
> I like regular expressions over globs/fnmatch, for example, we could
> use flex to turn the CPUID matches in mapfile.csv into something that
> is parsed and matched with less interpretation/compilation at runtime.
> I suspect we can bike-shed for a long time on what these new rules
> will be, which has been why I've generally just tried to match
> existing and inelegant behaviors.
>
I think at this point we could even consider a whitelist of known PMU
names and how each one behaves. It's probably less than a handful long.
>> * Deduplicate uncore based on the contents of events/ rather than just
>> the suffix
>>
>> As some background, the original commit for deduplication, commit
>> 3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
>> mentions reducing the number of duplicate PMUs, and is presumably
>> motivated by usability. But there are also other commits mentioning
>> reducing openat()s, for example lazily loading formats 504026412162
>> ("perf pmu: Make the loading of formats lazy"). Deduplicating based on
>> the contents of the events/ folder is somewhat in contention with this
>> reduction, but could be done along side some more lazy loading (like of
>> the terms) and hashing the result of readdir() without opening any of
>> the contents. JSON tables can have event name hashes calculated at build
>> time if we want to consider them for deduplication too.
>
> I worry about the run time cost of this when there are 100s of uncore
I'm interested in quantifying this time if we do any of these changes.
Is this particular example of 100s of PMUs with mrvl_ddr_pmu? How many
events does it have? And which particular uses cases are most important,
is it just listing or opening events? From what I can see these
proposals shouldn't actually effect opening events, and you'd think
listing could take a small time hit as it's done less frequently than
opening. And there's already a lot of output.
> PMUs. I wonder if the case of an empty events directory would also be
> common.
Is this an issue in this case? If we're talking about listing events,
then we show the single deduplicated instance as it is now. With an
empty events folder nothing is lost.
> Wrt the x86 conventions I think Kan is best placed to say what the
> preferred convention should be.
> The json will routinely say this event is on a PMU name without a
> suffix, and the number of PMUs will only be known at runtime. Hashing
> without the suffix would be fine but I think the controversy is over
> what should be considered a suffix.
Well, I was considering what would happen if we always apply this
deduplication without taking suffixes into account. The suffix rule just
becomes a bit of a premature optimization to avoid reading the events
folder in some cases, but if we make it acceptably fast all PMUs
regardless of name could be considered for deduplication.
>
> Thanks for doing the series, I'll dig into/test each one in turn.
> Ian
>
>> Then with the events hash, PMU's can be sorted based on this and the
>> 'Unit:' string can be constructed with a set of values that collapses
>> adjacent suffixes to display as ranges. I believe that could remove the
>> need for any further changes to duplication based on suffix, but still
>> avoids over deduplication.
>>
>> [1]: https://lore.kernel.org/linux-perf-users/CAP-5=fW_Sq4iFxoWPWuixz9fMLBPyPUO0RG0KPbYa-5T0DZbTA@mail.gmail.com/
>>
>> ---
>> James Clark (3):
>> perf list: Order events by event name before PMU name
>> perf list: Collapse similar events across PMUs
>> perf list: Don't deduplicate core PMUs when listing events
>>
>> tools/perf/builtin-list.c | 2 +
>> tools/perf/util/pmu.c | 5 ++-
>> tools/perf/util/pmu.h | 2 +
>> tools/perf/util/pmus.c | 95 ++++++++++++++++++++++++++++++++++--------
>> tools/perf/util/print-events.h | 1 +
>> 5 files changed, 86 insertions(+), 19 deletions(-)
>> ---
>> base-commit: 7788ad59d1d9617792037a83513be5b1dd14150f
>> change-id: 20250304-james-perf-hybrid-list-11b888cf435a
>>
>> Best regards,
>> --
>> James Clark <james.clark@linaro.org>
>>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-07 14:08 ` James Clark
@ 2025-03-07 17:35 ` Ian Rogers
2025-03-10 10:04 ` James Clark
2025-03-11 14:13 ` James Clark
0 siblings, 2 replies; 19+ messages in thread
From: Ian Rogers @ 2025-03-07 17:35 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On Fri, Mar 7, 2025 at 6:08 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 05/03/2025 9:40 pm, Ian Rogers wrote:
> > On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> Instead of showing multiple line items with the same event name and
> >> description, show a single line and concatenate all PMUs that this
> >> event can belong to.
> >>
> >> Don't do it for json output. Machine readable output doesn't need to be
> >> minimized, and changing the "Unit" field to a list type would break
> >> backwards compatibility.
> >>
> >> Before:
> >> $ perf list -v
> >> ...
> >> br_indirect_spec
> >> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
> >> br_indirect_spec
> >> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
> >>
> >> After:
> >>
> >> $ perf list -v
> >> ...
> >> br_indirect_spec
> >> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
> >>
> >> Signed-off-by: James Clark <james.clark@linaro.org>
> >> ---
> >> tools/perf/builtin-list.c | 2 ++
> >> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
> >> tools/perf/util/print-events.h | 1 +
> >> 3 files changed, 70 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
> >> index fed482adb039..aacd7beae2a0 100644
> >> --- a/tools/perf/builtin-list.c
> >> +++ b/tools/perf/builtin-list.c
> >> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
> >> .print_event = default_print_event,
> >> .print_metric = default_print_metric,
> >> .skip_duplicate_pmus = default_skip_duplicate_pmus,
> >> + .collapse_events = true
> >
> > So collapse_events is put in the callbacks but isn't a callback. I
> > think skipping duplicates and collapsing are the same thing, I'd
> > prefer it if there weren't two terms for the same thing. If you prefer
> > collapsing as a name then default_skip_duplicate_pmus can be
> > default_collapse_pmus.
> >
>
> I can use the existing callback and rename it. That does have the effect
> of disabling this behavior in verbose mode which may be useful or may be
> unexpected to some people. But it seems pretty 50/50 so fewer callbacks
> are probably better.
>
> >> };
> >> const char *cputype = NULL;
> >> const char *unit_name = NULL;
> >> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
> >> .print_event = json_print_event,
> >> .print_metric = json_print_metric,
> >> .skip_duplicate_pmus = json_skip_duplicate_pmus,
> >> + .collapse_events = false
> >> };
> >> ps = &json_ps;
> >> } else {
> >> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> >> index 4d60bac2d2b9..cb1b14ade25b 100644
> >> --- a/tools/perf/util/pmus.c
> >> +++ b/tools/perf/util/pmus.c
> >> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
> >> /* Order by PMU name. */
> >> if (as->pmu == bs->pmu)
> >> return 0;
> >> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> >> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> >> + if (ret)
> >> + return ret;
> >> +
> >> + /* Order by remaining displayed fields for purposes of deduplication later */
> >> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
> >> + if (ret)
> >> + return ret;
> >> + ret = !!as->deprecated - !!bs->deprecated;
> >> + if (ret)
> >> + return ret;
> >> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
> >> + if (ret)
> >> + return ret;
> >> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
> >> }
> >>
> >> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
> >> +enum dup_type {
> >> + UNIQUE,
> >> + DUPLICATE,
> >> + SAME_TEXT
> >> +};
> >> +
> >> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
> >> {
> >> /* Different names -> never duplicates */
> >> if (strcmp(a->name ?: "//", b->name ?: "//"))
> >> - return false;
> >> + return UNIQUE;
> >> +
> >> + /* Duplicate PMU name and event name -> hide completely */
> >> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
> >> + return DUPLICATE;
> >> +
> >> + /* Any other different display text -> not duplicate */
> >> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
> >> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
> >> + a->deprecated != b->deprecated ||
> >> + strcmp(a->desc ?: "", b->desc ?: "") ||
> >> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
> >> + return UNIQUE;
> >> + }
> >>
> >> - /* Don't remove duplicates for different PMUs */
> >> - return strcmp(a->pmu_name, b->pmu_name) == 0;
> >> + /* Same display text but different PMU -> collapse */
> >> + return SAME_TEXT;
> >> }
> >>
> >> struct events_callback_state {
> >> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
> >> return 0;
> >> }
> >>
> >> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
> >> +{
> >> + size_t len = strlen(pmu_names);
> >> + size_t added;
> >> +
> >> + if (len)
> >> + added = snprintf(pmu_names + len, size - len, ",%s", b);
> >> + else
> >> + added = snprintf(pmu_names, size, "%s,%s", a, b);
> >> +
> >> + /* Truncate with ... */
> >> + if (added > 0 && added + len >= size)
> >> + sprintf(pmu_names + size - 4, "...");
> >> +}
> >> +
> >> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
> >> {
> >> struct perf_pmu *pmu;
> >> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> >> struct events_callback_state state;
> >> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
> >> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
> >> + char pmu_names[128] = {0};
> >>
> >> if (skip_duplicate_pmus)
> >> scan_fn = perf_pmus__scan_skip_duplicates;
> >> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> >> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
> >> for (int j = 0; j < len; j++) {
> >> /* Skip duplicates */
> >> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
> >> - goto free;
> >> + if (j < len - 1) {
> >> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
> >> +
> >> + if (dt == DUPLICATE) {
> >> + goto free;
> >> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
> >> + concat_pmu_names(pmu_names, sizeof(pmu_names),
> >> + aliases[j].pmu_name, aliases[j+1].pmu_name);
> >> + goto free;
> >> + }
> >> + }
> >
> > I think a label called 'free' is a little unfortunate given the
> > function called free.
> > When I did things with sevent I was bringing over previous `perf list`
> > code mainly so that the perf list output before and after the changes
> > was identical. I wonder if this logic would be better placed in the
> > callback like default_print_event which already maintains state
> > (last_topic) to optionally print different things. This may better
> > encapsulate the behavior rather than the logic being in the PMU code.
> >
>
> Yeah I can have a look at putting it in one of the callbacks. But in the
> end builtin-list.c is the only user of perf_pmus__print_pmu_events()
> anyway so makes you wonder if the whole thing can be moved to
> builtin-list and open coded without the callbackyness.
I wanted to hide some of the innards of pmus, so I think that's the
reason it is as it is. The whole `sevent` was pre-existing and
maintained so that the output was the same.
> >>
> >> print_cb->print_event(print_state,
> >> aliases[j].topic,
> >> - aliases[j].pmu_name,
> >> + pmu_names[0] ? pmu_names : aliases[j].pmu_name,
> >> aliases[j].name,
> >> aliases[j].alias,
> >> aliases[j].scale_unit,
> >> @@ -553,6 +611,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> >> aliases[j].desc,
> >> aliases[j].long_desc,
> >> aliases[j].encoding_desc);
> >
> > The encoding_desc will have a PMU with the suffix removed as per
> > skipping duplicates:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1954
> > So I think something like:
> > ```
> > br_mis_pred_retired
> > [Instruction architecturally executed,mispredicted branch. Unit:
> > armv9_cortex_a510,armv9_cortex_a710]
> > ```
> > Would have an encoding of `armv9_cortex_a510/.../` without the a710
> > encoding being present. That said, I'm not sure anyone cares :-)
> >
> > Thanks,
> > Ian
> >
>
> Ah, in that case I'll disable it for --detailed as well as --json. I
> could compare encoding_desc in pmu_alias_duplicate_type() but they'll
> always be different because of the PMU name, so there's no point. And
> displaying multiple encoding_descs would be fiddly too.
So I'd like not to have the encoding_desc removed. It is useful for
debugging.. I meant by people not caring that the format of that
string is under specified, so the PMU name not being 100% accurate
likely doesn't affect people.
Thanks,
Ian
> >> + pmu_names[0] = '\0';
> >> free:
> >> zfree(&aliases[j].name);
> >> zfree(&aliases[j].alias);
> >> diff --git a/tools/perf/util/print-events.h b/tools/perf/util/print-events.h
> >> index 445efa1636c1..e91f9f830a2a 100644
> >> --- a/tools/perf/util/print-events.h
> >> +++ b/tools/perf/util/print-events.h
> >> @@ -27,6 +27,7 @@ struct print_callbacks {
> >> const char *threshold,
> >> const char *unit);
> >> bool (*skip_duplicate_pmus)(void *print_state);
> >> + bool collapse_events;
> >> };
> >>
> >> /** Print all events, the default when no options are specified. */
> >>
> >> --
> >> 2.34.1
> >>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-07 17:35 ` Ian Rogers
@ 2025-03-10 10:04 ` James Clark
2025-03-11 14:13 ` James Clark
1 sibling, 0 replies; 19+ messages in thread
From: James Clark @ 2025-03-10 10:04 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On 07/03/2025 5:35 pm, Ian Rogers wrote:
> On Fri, Mar 7, 2025 at 6:08 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 05/03/2025 9:40 pm, Ian Rogers wrote:
>>> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> Instead of showing multiple line items with the same event name and
>>>> description, show a single line and concatenate all PMUs that this
>>>> event can belong to.
>>>>
>>>> Don't do it for json output. Machine readable output doesn't need to be
>>>> minimized, and changing the "Unit" field to a list type would break
>>>> backwards compatibility.
>>>>
>>>> Before:
>>>> $ perf list -v
>>>> ...
>>>> br_indirect_spec
>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
>>>> br_indirect_spec
>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
>>>>
>>>> After:
>>>>
>>>> $ perf list -v
>>>> ...
>>>> br_indirect_spec
>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>> tools/perf/builtin-list.c | 2 ++
>>>> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
>>>> tools/perf/util/print-events.h | 1 +
>>>> 3 files changed, 70 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
>>>> index fed482adb039..aacd7beae2a0 100644
>>>> --- a/tools/perf/builtin-list.c
>>>> +++ b/tools/perf/builtin-list.c
>>>> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
>>>> .print_event = default_print_event,
>>>> .print_metric = default_print_metric,
>>>> .skip_duplicate_pmus = default_skip_duplicate_pmus,
>>>> + .collapse_events = true
>>>
>>> So collapse_events is put in the callbacks but isn't a callback. I
>>> think skipping duplicates and collapsing are the same thing, I'd
>>> prefer it if there weren't two terms for the same thing. If you prefer
>>> collapsing as a name then default_skip_duplicate_pmus can be
>>> default_collapse_pmus.
>>>
>>
>> I can use the existing callback and rename it. That does have the effect
>> of disabling this behavior in verbose mode which may be useful or may be
>> unexpected to some people. But it seems pretty 50/50 so fewer callbacks
>> are probably better.
>>
>>>> };
>>>> const char *cputype = NULL;
>>>> const char *unit_name = NULL;
>>>> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
>>>> .print_event = json_print_event,
>>>> .print_metric = json_print_metric,
>>>> .skip_duplicate_pmus = json_skip_duplicate_pmus,
>>>> + .collapse_events = false
>>>> };
>>>> ps = &json_ps;
>>>> } else {
>>>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>>>> index 4d60bac2d2b9..cb1b14ade25b 100644
>>>> --- a/tools/perf/util/pmus.c
>>>> +++ b/tools/perf/util/pmus.c
>>>> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
>>>> /* Order by PMU name. */
>>>> if (as->pmu == bs->pmu)
>>>> return 0;
>>>> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>>>> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + /* Order by remaining displayed fields for purposes of deduplication later */
>>>> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
>>>> + if (ret)
>>>> + return ret;
>>>> + ret = !!as->deprecated - !!bs->deprecated;
>>>> + if (ret)
>>>> + return ret;
>>>> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
>>>> + if (ret)
>>>> + return ret;
>>>> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
>>>> }
>>>>
>>>> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
>>>> +enum dup_type {
>>>> + UNIQUE,
>>>> + DUPLICATE,
>>>> + SAME_TEXT
>>>> +};
>>>> +
>>>> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
>>>> {
>>>> /* Different names -> never duplicates */
>>>> if (strcmp(a->name ?: "//", b->name ?: "//"))
>>>> - return false;
>>>> + return UNIQUE;
>>>> +
>>>> + /* Duplicate PMU name and event name -> hide completely */
>>>> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
>>>> + return DUPLICATE;
>>>> +
>>>> + /* Any other different display text -> not duplicate */
>>>> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
>>>> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
>>>> + a->deprecated != b->deprecated ||
>>>> + strcmp(a->desc ?: "", b->desc ?: "") ||
>>>> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
>>>> + return UNIQUE;
>>>> + }
>>>>
>>>> - /* Don't remove duplicates for different PMUs */
>>>> - return strcmp(a->pmu_name, b->pmu_name) == 0;
>>>> + /* Same display text but different PMU -> collapse */
>>>> + return SAME_TEXT;
>>>> }
>>>>
>>>> struct events_callback_state {
>>>> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
>>>> return 0;
>>>> }
>>>>
>>>> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
>>>> +{
>>>> + size_t len = strlen(pmu_names);
>>>> + size_t added;
>>>> +
>>>> + if (len)
>>>> + added = snprintf(pmu_names + len, size - len, ",%s", b);
>>>> + else
>>>> + added = snprintf(pmu_names, size, "%s,%s", a, b);
>>>> +
>>>> + /* Truncate with ... */
>>>> + if (added > 0 && added + len >= size)
>>>> + sprintf(pmu_names + size - 4, "...");
>>>> +}
>>>> +
>>>> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
>>>> {
>>>> struct perf_pmu *pmu;
>>>> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>> struct events_callback_state state;
>>>> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
>>>> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
>>>> + char pmu_names[128] = {0};
>>>>
>>>> if (skip_duplicate_pmus)
>>>> scan_fn = perf_pmus__scan_skip_duplicates;
>>>> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>>>> for (int j = 0; j < len; j++) {
>>>> /* Skip duplicates */
>>>> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
>>>> - goto free;
>>>> + if (j < len - 1) {
>>>> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
>>>> +
>>>> + if (dt == DUPLICATE) {
>>>> + goto free;
>>>> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
>>>> + concat_pmu_names(pmu_names, sizeof(pmu_names),
>>>> + aliases[j].pmu_name, aliases[j+1].pmu_name);
>>>> + goto free;
>>>> + }
>>>> + }
>>>
>>> I think a label called 'free' is a little unfortunate given the
>>> function called free.
>>> When I did things with sevent I was bringing over previous `perf list`
>>> code mainly so that the perf list output before and after the changes
>>> was identical. I wonder if this logic would be better placed in the
>>> callback like default_print_event which already maintains state
>>> (last_topic) to optionally print different things. This may better
>>> encapsulate the behavior rather than the logic being in the PMU code.
>>>
>>
>> Yeah I can have a look at putting it in one of the callbacks. But in the
>> end builtin-list.c is the only user of perf_pmus__print_pmu_events()
>> anyway so makes you wonder if the whole thing can be moved to
>> builtin-list and open coded without the callbackyness.
>
> I wanted to hide some of the innards of pmus, so I think that's the
> reason it is as it is. The whole `sevent` was pre-existing and
> maintained so that the output was the same.
>
>>>>
>>>> print_cb->print_event(print_state,
>>>> aliases[j].topic,
>>>> - aliases[j].pmu_name,
>>>> + pmu_names[0] ? pmu_names : aliases[j].pmu_name,
>>>> aliases[j].name,
>>>> aliases[j].alias,
>>>> aliases[j].scale_unit,
>>>> @@ -553,6 +611,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>> aliases[j].desc,
>>>> aliases[j].long_desc,
>>>> aliases[j].encoding_desc);
>>>
>>> The encoding_desc will have a PMU with the suffix removed as per
>>> skipping duplicates:
>>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1954
>>> So I think something like:
>>> ```
>>> br_mis_pred_retired
>>> [Instruction architecturally executed,mispredicted branch. Unit:
>>> armv9_cortex_a510,armv9_cortex_a710]
>>> ```
>>> Would have an encoding of `armv9_cortex_a510/.../` without the a710
>>> encoding being present. That said, I'm not sure anyone cares :-)
>>>
>>> Thanks,
>>> Ian
>>>
>>
>> Ah, in that case I'll disable it for --detailed as well as --json. I
>> could compare encoding_desc in pmu_alias_duplicate_type() but they'll
>> always be different because of the PMU name, so there's no point. And
>> displaying multiple encoding_descs would be fiddly too.
>
> So I'd like not to have the encoding_desc removed. It is useful for
> debugging.. I meant by people not caring that the format of that
> string is under specified, so the PMU name not being 100% accurate
> likely doesn't affect people.
>
> Thanks,
> Ian
>
>
By 'disabled' I meant disable this collapsing of PMU names, so with
--detailed you'll get the existing full list of events with their
encoding_descs, no change there. Sorry for the confusion.
>>>> + pmu_names[0] = '\0';
>>>> free:
>>>> zfree(&aliases[j].name);
>>>> zfree(&aliases[j].alias);
>>>> diff --git a/tools/perf/util/print-events.h b/tools/perf/util/print-events.h
>>>> index 445efa1636c1..e91f9f830a2a 100644
>>>> --- a/tools/perf/util/print-events.h
>>>> +++ b/tools/perf/util/print-events.h
>>>> @@ -27,6 +27,7 @@ struct print_callbacks {
>>>> const char *threshold,
>>>> const char *unit);
>>>> bool (*skip_duplicate_pmus)(void *print_state);
>>>> + bool collapse_events;
>>>> };
>>>>
>>>> /** Print all events, the default when no options are specified. */
>>>>
>>>> --
>>>> 2.34.1
>>>>
>>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-07 17:35 ` Ian Rogers
2025-03-10 10:04 ` James Clark
@ 2025-03-11 14:13 ` James Clark
2025-03-11 15:14 ` Ian Rogers
1 sibling, 1 reply; 19+ messages in thread
From: James Clark @ 2025-03-11 14:13 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On 07/03/2025 5:35 pm, Ian Rogers wrote:
> On Fri, Mar 7, 2025 at 6:08 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 05/03/2025 9:40 pm, Ian Rogers wrote:
>>> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> Instead of showing multiple line items with the same event name and
>>>> description, show a single line and concatenate all PMUs that this
>>>> event can belong to.
>>>>
>>>> Don't do it for json output. Machine readable output doesn't need to be
>>>> minimized, and changing the "Unit" field to a list type would break
>>>> backwards compatibility.
>>>>
>>>> Before:
>>>> $ perf list -v
>>>> ...
>>>> br_indirect_spec
>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
>>>> br_indirect_spec
>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
>>>>
>>>> After:
>>>>
>>>> $ perf list -v
>>>> ...
>>>> br_indirect_spec
>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>> tools/perf/builtin-list.c | 2 ++
>>>> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
>>>> tools/perf/util/print-events.h | 1 +
>>>> 3 files changed, 70 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
>>>> index fed482adb039..aacd7beae2a0 100644
>>>> --- a/tools/perf/builtin-list.c
>>>> +++ b/tools/perf/builtin-list.c
>>>> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
>>>> .print_event = default_print_event,
>>>> .print_metric = default_print_metric,
>>>> .skip_duplicate_pmus = default_skip_duplicate_pmus,
>>>> + .collapse_events = true
>>>
>>> So collapse_events is put in the callbacks but isn't a callback. I
>>> think skipping duplicates and collapsing are the same thing, I'd
>>> prefer it if there weren't two terms for the same thing. If you prefer
>>> collapsing as a name then default_skip_duplicate_pmus can be
>>> default_collapse_pmus.
>>>
>>
>> I can use the existing callback and rename it. That does have the effect
>> of disabling this behavior in verbose mode which may be useful or may be
>> unexpected to some people. But it seems pretty 50/50 so fewer callbacks
>> are probably better.
>>
>>>> };
>>>> const char *cputype = NULL;
>>>> const char *unit_name = NULL;
>>>> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
>>>> .print_event = json_print_event,
>>>> .print_metric = json_print_metric,
>>>> .skip_duplicate_pmus = json_skip_duplicate_pmus,
>>>> + .collapse_events = false
>>>> };
>>>> ps = &json_ps;
>>>> } else {
>>>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>>>> index 4d60bac2d2b9..cb1b14ade25b 100644
>>>> --- a/tools/perf/util/pmus.c
>>>> +++ b/tools/perf/util/pmus.c
>>>> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
>>>> /* Order by PMU name. */
>>>> if (as->pmu == bs->pmu)
>>>> return 0;
>>>> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>>>> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + /* Order by remaining displayed fields for purposes of deduplication later */
>>>> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
>>>> + if (ret)
>>>> + return ret;
>>>> + ret = !!as->deprecated - !!bs->deprecated;
>>>> + if (ret)
>>>> + return ret;
>>>> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
>>>> + if (ret)
>>>> + return ret;
>>>> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
>>>> }
>>>>
>>>> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
>>>> +enum dup_type {
>>>> + UNIQUE,
>>>> + DUPLICATE,
>>>> + SAME_TEXT
>>>> +};
>>>> +
>>>> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
>>>> {
>>>> /* Different names -> never duplicates */
>>>> if (strcmp(a->name ?: "//", b->name ?: "//"))
>>>> - return false;
>>>> + return UNIQUE;
>>>> +
>>>> + /* Duplicate PMU name and event name -> hide completely */
>>>> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
>>>> + return DUPLICATE;
>>>> +
>>>> + /* Any other different display text -> not duplicate */
>>>> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
>>>> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
>>>> + a->deprecated != b->deprecated ||
>>>> + strcmp(a->desc ?: "", b->desc ?: "") ||
>>>> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
>>>> + return UNIQUE;
>>>> + }
>>>>
>>>> - /* Don't remove duplicates for different PMUs */
>>>> - return strcmp(a->pmu_name, b->pmu_name) == 0;
>>>> + /* Same display text but different PMU -> collapse */
>>>> + return SAME_TEXT;
>>>> }
>>>>
>>>> struct events_callback_state {
>>>> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
>>>> return 0;
>>>> }
>>>>
>>>> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
>>>> +{
>>>> + size_t len = strlen(pmu_names);
>>>> + size_t added;
>>>> +
>>>> + if (len)
>>>> + added = snprintf(pmu_names + len, size - len, ",%s", b);
>>>> + else
>>>> + added = snprintf(pmu_names, size, "%s,%s", a, b);
>>>> +
>>>> + /* Truncate with ... */
>>>> + if (added > 0 && added + len >= size)
>>>> + sprintf(pmu_names + size - 4, "...");
>>>> +}
>>>> +
>>>> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
>>>> {
>>>> struct perf_pmu *pmu;
>>>> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>> struct events_callback_state state;
>>>> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
>>>> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
>>>> + char pmu_names[128] = {0};
>>>>
>>>> if (skip_duplicate_pmus)
>>>> scan_fn = perf_pmus__scan_skip_duplicates;
>>>> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>>>> for (int j = 0; j < len; j++) {
>>>> /* Skip duplicates */
>>>> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
>>>> - goto free;
>>>> + if (j < len - 1) {
>>>> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
>>>> +
>>>> + if (dt == DUPLICATE) {
>>>> + goto free;
>>>> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
>>>> + concat_pmu_names(pmu_names, sizeof(pmu_names),
>>>> + aliases[j].pmu_name, aliases[j+1].pmu_name);
>>>> + goto free;
>>>> + }
>>>> + }
>>>
>>> I think a label called 'free' is a little unfortunate given the
>>> function called free.
>>> When I did things with sevent I was bringing over previous `perf list`
>>> code mainly so that the perf list output before and after the changes
>>> was identical. I wonder if this logic would be better placed in the
>>> callback like default_print_event which already maintains state
>>> (last_topic) to optionally print different things. This may better
>>> encapsulate the behavior rather than the logic being in the PMU code.
>>>
>>
>> Yeah I can have a look at putting it in one of the callbacks. But in the
>> end builtin-list.c is the only user of perf_pmus__print_pmu_events()
>> anyway so makes you wonder if the whole thing can be moved to
>> builtin-list and open coded without the callbackyness.
>
> I wanted to hide some of the innards of pmus, so I think that's the
> reason it is as it is. The whole `sevent` was pre-existing and
> maintained so that the output was the same.
>
Looking at this a bit more it is possible to move all of
perf_pmus__print_pmu_events() (including its dependencies and perf list
specifics) out of pmus.c into print-events.c without exposing any new
innards of pmus. Only struct pmu_event_info would be used, but that's
not private and is already used elsewhere.
It's difficult to do this change only in the print_event callback
because it relies on having the _next_ event, not the previous event.
We're already tracking last_topic, and if we start passing all the
next_* items too it gets a bit unmanageable.
If it's all moved then doing display specific processing across the
whole list becomes quite easy.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-11 14:13 ` James Clark
@ 2025-03-11 15:14 ` Ian Rogers
2025-03-11 16:41 ` James Clark
0 siblings, 1 reply; 19+ messages in thread
From: Ian Rogers @ 2025-03-11 15:14 UTC (permalink / raw)
To: James Clark
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On Tue, Mar 11, 2025 at 7:13 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 07/03/2025 5:35 pm, Ian Rogers wrote:
> > On Fri, Mar 7, 2025 at 6:08 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >>
> >>
> >> On 05/03/2025 9:40 pm, Ian Rogers wrote:
> >>> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
> >>>>
> >>>> Instead of showing multiple line items with the same event name and
> >>>> description, show a single line and concatenate all PMUs that this
> >>>> event can belong to.
> >>>>
> >>>> Don't do it for json output. Machine readable output doesn't need to be
> >>>> minimized, and changing the "Unit" field to a list type would break
> >>>> backwards compatibility.
> >>>>
> >>>> Before:
> >>>> $ perf list -v
> >>>> ...
> >>>> br_indirect_spec
> >>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
> >>>> br_indirect_spec
> >>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
> >>>>
> >>>> After:
> >>>>
> >>>> $ perf list -v
> >>>> ...
> >>>> br_indirect_spec
> >>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
> >>>>
> >>>> Signed-off-by: James Clark <james.clark@linaro.org>
> >>>> ---
> >>>> tools/perf/builtin-list.c | 2 ++
> >>>> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
> >>>> tools/perf/util/print-events.h | 1 +
> >>>> 3 files changed, 70 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
> >>>> index fed482adb039..aacd7beae2a0 100644
> >>>> --- a/tools/perf/builtin-list.c
> >>>> +++ b/tools/perf/builtin-list.c
> >>>> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
> >>>> .print_event = default_print_event,
> >>>> .print_metric = default_print_metric,
> >>>> .skip_duplicate_pmus = default_skip_duplicate_pmus,
> >>>> + .collapse_events = true
> >>>
> >>> So collapse_events is put in the callbacks but isn't a callback. I
> >>> think skipping duplicates and collapsing are the same thing, I'd
> >>> prefer it if there weren't two terms for the same thing. If you prefer
> >>> collapsing as a name then default_skip_duplicate_pmus can be
> >>> default_collapse_pmus.
> >>>
> >>
> >> I can use the existing callback and rename it. That does have the effect
> >> of disabling this behavior in verbose mode which may be useful or may be
> >> unexpected to some people. But it seems pretty 50/50 so fewer callbacks
> >> are probably better.
> >>
> >>>> };
> >>>> const char *cputype = NULL;
> >>>> const char *unit_name = NULL;
> >>>> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
> >>>> .print_event = json_print_event,
> >>>> .print_metric = json_print_metric,
> >>>> .skip_duplicate_pmus = json_skip_duplicate_pmus,
> >>>> + .collapse_events = false
> >>>> };
> >>>> ps = &json_ps;
> >>>> } else {
> >>>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> >>>> index 4d60bac2d2b9..cb1b14ade25b 100644
> >>>> --- a/tools/perf/util/pmus.c
> >>>> +++ b/tools/perf/util/pmus.c
> >>>> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
> >>>> /* Order by PMU name. */
> >>>> if (as->pmu == bs->pmu)
> >>>> return 0;
> >>>> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> >>>> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
> >>>> + if (ret)
> >>>> + return ret;
> >>>> +
> >>>> + /* Order by remaining displayed fields for purposes of deduplication later */
> >>>> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
> >>>> + if (ret)
> >>>> + return ret;
> >>>> + ret = !!as->deprecated - !!bs->deprecated;
> >>>> + if (ret)
> >>>> + return ret;
> >>>> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
> >>>> + if (ret)
> >>>> + return ret;
> >>>> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
> >>>> }
> >>>>
> >>>> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
> >>>> +enum dup_type {
> >>>> + UNIQUE,
> >>>> + DUPLICATE,
> >>>> + SAME_TEXT
> >>>> +};
> >>>> +
> >>>> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
> >>>> {
> >>>> /* Different names -> never duplicates */
> >>>> if (strcmp(a->name ?: "//", b->name ?: "//"))
> >>>> - return false;
> >>>> + return UNIQUE;
> >>>> +
> >>>> + /* Duplicate PMU name and event name -> hide completely */
> >>>> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
> >>>> + return DUPLICATE;
> >>>> +
> >>>> + /* Any other different display text -> not duplicate */
> >>>> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
> >>>> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
> >>>> + a->deprecated != b->deprecated ||
> >>>> + strcmp(a->desc ?: "", b->desc ?: "") ||
> >>>> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
> >>>> + return UNIQUE;
> >>>> + }
> >>>>
> >>>> - /* Don't remove duplicates for different PMUs */
> >>>> - return strcmp(a->pmu_name, b->pmu_name) == 0;
> >>>> + /* Same display text but different PMU -> collapse */
> >>>> + return SAME_TEXT;
> >>>> }
> >>>>
> >>>> struct events_callback_state {
> >>>> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
> >>>> return 0;
> >>>> }
> >>>>
> >>>> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
> >>>> +{
> >>>> + size_t len = strlen(pmu_names);
> >>>> + size_t added;
> >>>> +
> >>>> + if (len)
> >>>> + added = snprintf(pmu_names + len, size - len, ",%s", b);
> >>>> + else
> >>>> + added = snprintf(pmu_names, size, "%s,%s", a, b);
> >>>> +
> >>>> + /* Truncate with ... */
> >>>> + if (added > 0 && added + len >= size)
> >>>> + sprintf(pmu_names + size - 4, "...");
> >>>> +}
> >>>> +
> >>>> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
> >>>> {
> >>>> struct perf_pmu *pmu;
> >>>> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> >>>> struct events_callback_state state;
> >>>> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
> >>>> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
> >>>> + char pmu_names[128] = {0};
> >>>>
> >>>> if (skip_duplicate_pmus)
> >>>> scan_fn = perf_pmus__scan_skip_duplicates;
> >>>> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> >>>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
> >>>> for (int j = 0; j < len; j++) {
> >>>> /* Skip duplicates */
> >>>> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
> >>>> - goto free;
> >>>> + if (j < len - 1) {
> >>>> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
> >>>> +
> >>>> + if (dt == DUPLICATE) {
> >>>> + goto free;
> >>>> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
> >>>> + concat_pmu_names(pmu_names, sizeof(pmu_names),
> >>>> + aliases[j].pmu_name, aliases[j+1].pmu_name);
> >>>> + goto free;
> >>>> + }
> >>>> + }
> >>>
> >>> I think a label called 'free' is a little unfortunate given the
> >>> function called free.
> >>> When I did things with sevent I was bringing over previous `perf list`
> >>> code mainly so that the perf list output before and after the changes
> >>> was identical. I wonder if this logic would be better placed in the
> >>> callback like default_print_event which already maintains state
> >>> (last_topic) to optionally print different things. This may better
> >>> encapsulate the behavior rather than the logic being in the PMU code.
> >>>
> >>
> >> Yeah I can have a look at putting it in one of the callbacks. But in the
> >> end builtin-list.c is the only user of perf_pmus__print_pmu_events()
> >> anyway so makes you wonder if the whole thing can be moved to
> >> builtin-list and open coded without the callbackyness.
> >
> > I wanted to hide some of the innards of pmus, so I think that's the
> > reason it is as it is. The whole `sevent` was pre-existing and
> > maintained so that the output was the same.
> >
>
> Looking at this a bit more it is possible to move all of
> perf_pmus__print_pmu_events() (including its dependencies and perf list
> specifics) out of pmus.c into print-events.c without exposing any new
> innards of pmus. Only struct pmu_event_info would be used, but that's
> not private and is already used elsewhere.
>
> It's difficult to do this change only in the print_event callback
> because it relies on having the _next_ event, not the previous event.
> We're already tracking last_topic, and if we start passing all the
> next_* items too it gets a bit unmanageable.
>
> If it's all moved then doing display specific processing across the
> whole list becomes quite easy.
I'm not sure I follow all of this. If things can be moved to a more
obvious location, printing code in print-events.c, and we maintain
encapsulation then that sounds great to me. I'm not clear on the next
event issue. My hope with the print routines in builtin-list.c was
that anyone could come along and add another for whatever rendering
took their fancy. I didn't want it to be like the logic in
stat-display.c, where there are multiple levels of call backs, global
state, odd things like passing NULL meaning display column headers,
and the whole thing being a confusing rats nest where a change nearly
always ripples through and breaks something somewhere. Particularly
compare the json output in stat-display.c and builtin-list.c, my hope
was that builtin-list.c would be the model for reimplementing the
stat-display.c one. Others may have different opinions.
Thanks,
Ian
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] perf list: Collapse similar events across PMUs
2025-03-11 15:14 ` Ian Rogers
@ 2025-03-11 16:41 ` James Clark
0 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2025-03-11 16:41 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Robin Murphy, Leo Yan, linux-perf-users,
linux-kernel
On 11/03/2025 3:14 pm, Ian Rogers wrote:
> On Tue, Mar 11, 2025 at 7:13 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 07/03/2025 5:35 pm, Ian Rogers wrote:
>>> On Fri, Mar 7, 2025 at 6:08 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 05/03/2025 9:40 pm, Ian Rogers wrote:
>>>>> On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@linaro.org> wrote:
>>>>>>
>>>>>> Instead of showing multiple line items with the same event name and
>>>>>> description, show a single line and concatenate all PMUs that this
>>>>>> event can belong to.
>>>>>>
>>>>>> Don't do it for json output. Machine readable output doesn't need to be
>>>>>> minimized, and changing the "Unit" field to a list type would break
>>>>>> backwards compatibility.
>>>>>>
>>>>>> Before:
>>>>>> $ perf list -v
>>>>>> ...
>>>>>> br_indirect_spec
>>>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53]
>>>>>> br_indirect_spec
>>>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a57]
>>>>>>
>>>>>> After:
>>>>>>
>>>>>> $ perf list -v
>>>>>> ...
>>>>>> br_indirect_spec
>>>>>> [Branch speculatively executed,indirect branch. Unit: armv8_cortex_a53,armv8_cortex_a57]
>>>>>>
>>>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>>>> ---
>>>>>> tools/perf/builtin-list.c | 2 ++
>>>>>> tools/perf/util/pmus.c | 75 +++++++++++++++++++++++++++++++++++++-----
>>>>>> tools/perf/util/print-events.h | 1 +
>>>>>> 3 files changed, 70 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
>>>>>> index fed482adb039..aacd7beae2a0 100644
>>>>>> --- a/tools/perf/builtin-list.c
>>>>>> +++ b/tools/perf/builtin-list.c
>>>>>> @@ -516,6 +516,7 @@ int cmd_list(int argc, const char **argv)
>>>>>> .print_event = default_print_event,
>>>>>> .print_metric = default_print_metric,
>>>>>> .skip_duplicate_pmus = default_skip_duplicate_pmus,
>>>>>> + .collapse_events = true
>>>>>
>>>>> So collapse_events is put in the callbacks but isn't a callback. I
>>>>> think skipping duplicates and collapsing are the same thing, I'd
>>>>> prefer it if there weren't two terms for the same thing. If you prefer
>>>>> collapsing as a name then default_skip_duplicate_pmus can be
>>>>> default_collapse_pmus.
>>>>>
>>>>
>>>> I can use the existing callback and rename it. That does have the effect
>>>> of disabling this behavior in verbose mode which may be useful or may be
>>>> unexpected to some people. But it seems pretty 50/50 so fewer callbacks
>>>> are probably better.
>>>>
>>>>>> };
>>>>>> const char *cputype = NULL;
>>>>>> const char *unit_name = NULL;
>>>>>> @@ -574,6 +575,7 @@ int cmd_list(int argc, const char **argv)
>>>>>> .print_event = json_print_event,
>>>>>> .print_metric = json_print_metric,
>>>>>> .skip_duplicate_pmus = json_skip_duplicate_pmus,
>>>>>> + .collapse_events = false
>>>>>> };
>>>>>> ps = &json_ps;
>>>>>> } else {
>>>>>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>>>>>> index 4d60bac2d2b9..cb1b14ade25b 100644
>>>>>> --- a/tools/perf/util/pmus.c
>>>>>> +++ b/tools/perf/util/pmus.c
>>>>>> @@ -453,17 +453,50 @@ static int cmp_sevent(const void *a, const void *b)
>>>>>> /* Order by PMU name. */
>>>>>> if (as->pmu == bs->pmu)
>>>>>> return 0;
>>>>>> - return strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>>>>>> + ret = strcmp(as->pmu_name ?: "", bs->pmu_name ?: "");
>>>>>> + if (ret)
>>>>>> + return ret;
>>>>>> +
>>>>>> + /* Order by remaining displayed fields for purposes of deduplication later */
>>>>>> + ret = strcmp(as->scale_unit ?: "", bs->scale_unit ?: "");
>>>>>> + if (ret)
>>>>>> + return ret;
>>>>>> + ret = !!as->deprecated - !!bs->deprecated;
>>>>>> + if (ret)
>>>>>> + return ret;
>>>>>> + ret = strcmp(as->desc ?: "", bs->desc ?: "");
>>>>>> + if (ret)
>>>>>> + return ret;
>>>>>> + return strcmp(as->long_desc ?: "", bs->long_desc ?: "");
>>>>>> }
>>>>>>
>>>>>> -static bool pmu_alias_is_duplicate(struct sevent *a, struct sevent *b)
>>>>>> +enum dup_type {
>>>>>> + UNIQUE,
>>>>>> + DUPLICATE,
>>>>>> + SAME_TEXT
>>>>>> +};
>>>>>> +
>>>>>> +static enum dup_type pmu_alias_duplicate_type(struct sevent *a, struct sevent *b)
>>>>>> {
>>>>>> /* Different names -> never duplicates */
>>>>>> if (strcmp(a->name ?: "//", b->name ?: "//"))
>>>>>> - return false;
>>>>>> + return UNIQUE;
>>>>>> +
>>>>>> + /* Duplicate PMU name and event name -> hide completely */
>>>>>> + if (strcmp(a->pmu_name, b->pmu_name) == 0)
>>>>>> + return DUPLICATE;
>>>>>> +
>>>>>> + /* Any other different display text -> not duplicate */
>>>>>> + if (strcmp(a->topic ?: "", b->topic ?: "") ||
>>>>>> + strcmp(a->scale_unit ?: "", b->scale_unit ?: "") ||
>>>>>> + a->deprecated != b->deprecated ||
>>>>>> + strcmp(a->desc ?: "", b->desc ?: "") ||
>>>>>> + strcmp(a->long_desc ?: "", b->long_desc ?: "")) {
>>>>>> + return UNIQUE;
>>>>>> + }
>>>>>>
>>>>>> - /* Don't remove duplicates for different PMUs */
>>>>>> - return strcmp(a->pmu_name, b->pmu_name) == 0;
>>>>>> + /* Same display text but different PMU -> collapse */
>>>>>> + return SAME_TEXT;
>>>>>> }
>>>>>>
>>>>>> struct events_callback_state {
>>>>>> @@ -501,6 +534,21 @@ static int perf_pmus__print_pmu_events__callback(void *vstate,
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +static void concat_pmu_names(char *pmu_names, size_t size, const char *a, const char *b)
>>>>>> +{
>>>>>> + size_t len = strlen(pmu_names);
>>>>>> + size_t added;
>>>>>> +
>>>>>> + if (len)
>>>>>> + added = snprintf(pmu_names + len, size - len, ",%s", b);
>>>>>> + else
>>>>>> + added = snprintf(pmu_names, size, "%s,%s", a, b);
>>>>>> +
>>>>>> + /* Truncate with ... */
>>>>>> + if (added > 0 && added + len >= size)
>>>>>> + sprintf(pmu_names + size - 4, "...");
>>>>>> +}
>>>>>> +
>>>>>> void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *print_state)
>>>>>> {
>>>>>> struct perf_pmu *pmu;
>>>>>> @@ -510,6 +558,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>>>> struct events_callback_state state;
>>>>>> bool skip_duplicate_pmus = print_cb->skip_duplicate_pmus(print_state);
>>>>>> struct perf_pmu *(*scan_fn)(struct perf_pmu *);
>>>>>> + char pmu_names[128] = {0};
>>>>>>
>>>>>> if (skip_duplicate_pmus)
>>>>>> scan_fn = perf_pmus__scan_skip_duplicates;
>>>>>> @@ -539,12 +588,21 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>>>>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>>>>>> for (int j = 0; j < len; j++) {
>>>>>> /* Skip duplicates */
>>>>>> - if (j < len - 1 && pmu_alias_is_duplicate(&aliases[j], &aliases[j + 1]))
>>>>>> - goto free;
>>>>>> + if (j < len - 1) {
>>>>>> + enum dup_type dt = pmu_alias_duplicate_type(&aliases[j], &aliases[j + 1]);
>>>>>> +
>>>>>> + if (dt == DUPLICATE) {
>>>>>> + goto free;
>>>>>> + } else if (print_cb->collapse_events && dt == SAME_TEXT) {
>>>>>> + concat_pmu_names(pmu_names, sizeof(pmu_names),
>>>>>> + aliases[j].pmu_name, aliases[j+1].pmu_name);
>>>>>> + goto free;
>>>>>> + }
>>>>>> + }
>>>>>
>>>>> I think a label called 'free' is a little unfortunate given the
>>>>> function called free.
>>>>> When I did things with sevent I was bringing over previous `perf list`
>>>>> code mainly so that the perf list output before and after the changes
>>>>> was identical. I wonder if this logic would be better placed in the
>>>>> callback like default_print_event which already maintains state
>>>>> (last_topic) to optionally print different things. This may better
>>>>> encapsulate the behavior rather than the logic being in the PMU code.
>>>>>
>>>>
>>>> Yeah I can have a look at putting it in one of the callbacks. But in the
>>>> end builtin-list.c is the only user of perf_pmus__print_pmu_events()
>>>> anyway so makes you wonder if the whole thing can be moved to
>>>> builtin-list and open coded without the callbackyness.
>>>
>>> I wanted to hide some of the innards of pmus, so I think that's the
>>> reason it is as it is. The whole `sevent` was pre-existing and
>>> maintained so that the output was the same.
>>>
>>
>> Looking at this a bit more it is possible to move all of
>> perf_pmus__print_pmu_events() (including its dependencies and perf list
>> specifics) out of pmus.c into print-events.c without exposing any new
>> innards of pmus. Only struct pmu_event_info would be used, but that's
>> not private and is already used elsewhere.
>>
>> It's difficult to do this change only in the print_event callback
>> because it relies on having the _next_ event, not the previous event.
>> We're already tracking last_topic, and if we start passing all the
>> next_* items too it gets a bit unmanageable.
>>
>> If it's all moved then doing display specific processing across the
>> whole list becomes quite easy.
>
> I'm not sure I follow all of this. If things can be moved to a more
> obvious location, printing code in print-events.c, and we maintain
> encapsulation then that sounds great to me.
I'll give it a go, I think I can come up with something.
> I'm not clear on the next event issue.
For example, pmu_alias_is_duplicate() needs the current and next event,
so it's not easy to move that behavior to builtin-list.c. And my change
to collapse similar events across PMUs also requires the next rather
than previous event.
> My hope with the print routines in builtin-list.c was
> that anyone could come along and add another for whatever rendering
> took their fancy. I didn't want it to be like the logic in
> stat-display.c, where there are multiple levels of call backs, global
> state, odd things like passing NULL meaning display column headers,
> and the whole thing being a confusing rats nest where a change nearly
> always ripples through and breaks something somewhere. Particularly
> compare the json output in stat-display.c and builtin-list.c, my hope
> was that builtin-list.c would be the model for reimplementing the
> stat-display.c one. Others may have different opinions.
>
> Thanks,
> Ian
Makes sense. If I make pmus.c return a list then anyone can loop over
the list and do whatever display they want. It removes the need for the
callback but does require that the consumer know the type of the list
item. It could be struct sevent or even drop that and directly use
struct pmu_event_info.
Having the whole list gives you a lot more flexibility than just the
small window that the print event callback gives you.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-03-11 16:41 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-04 13:49 [PATCH 0/3] perf list: Collapse similar events across PMUs James Clark
2025-03-04 13:49 ` [PATCH 1/3] perf list: Order events by event name before PMU name James Clark
2025-03-05 20:59 ` Ian Rogers
2025-03-04 13:49 ` [PATCH 2/3] perf list: Collapse similar events across PMUs James Clark
2025-03-05 9:35 ` Leo Yan
2025-03-05 9:49 ` Leo Yan
2025-03-05 21:40 ` Ian Rogers
2025-03-07 14:08 ` James Clark
2025-03-07 17:35 ` Ian Rogers
2025-03-10 10:04 ` James Clark
2025-03-11 14:13 ` James Clark
2025-03-11 15:14 ` Ian Rogers
2025-03-11 16:41 ` James Clark
2025-03-04 13:49 ` [PATCH 3/3] perf list: Don't deduplicate core PMUs when listing events James Clark
2025-03-05 21:51 ` Ian Rogers
2025-03-07 14:08 ` James Clark
2025-03-05 9:26 ` [PATCH 0/3] perf list: Collapse similar events across PMUs Leo Yan
2025-03-05 20:38 ` Ian Rogers
2025-03-07 14:47 ` James Clark
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).