linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] Metric related performance improvements
@ 2025-04-10  4:45 Ian Rogers
  2025-04-10  4:45 ` [PATCH v1 1/3] perf pmu: Change aliases from list to hashmap Ian Rogers
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ian Rogers @ 2025-04-10  4:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Xu Yang,
	John Garry, Dominique Martinet, Thomas Richter, Weilin Wang,
	linux-perf-users, linux-kernel

The "PMU JSON event tests" have been running slowly, these changes
target improving them with an improvement of the test running 8 to 10
times faster.

The first patch changes from searching through all aliases by name in
a list to using a hashmap. Doing a fast hashmap__find means testing
for having an event needn't load from disk if an event is already
present.

The second patch switch the fncache to use a hashmap rather than its
own hashmap with a limited number of buckets. When there are many
filename queries, such as with a test, there are many collisions with
the previous fncache approach leading to linear searching of the
entries.

The final patch adds a find function for metrics. Normally metrics can
match by name and group, however, only name matching happens when one
metric refers to another. As we test every "id" in a metric to see if
it is a metric, the find function can dominate performance as it
linearly searches all metrics. Add a find function for the metrics
table so that a metric can be found by name with a binary search.

Before these changes:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m18.499s
user    0m18.150s
sys     0m3.273s
```

After these changes:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m2.338s
user    0m1.797s
sys     0m2.186s
```

Ian Rogers (3):
  perf pmu: Change aliases from list to hashmap
  perf fncache: Switch to using hashmap
  perf metricgroup: Binary search when resolving referred to metrics

 tools/perf/builtin-stat.c                |   6 +-
 tools/perf/pmu-events/empty-pmu-events.c |  66 ++++++++-
 tools/perf/pmu-events/jevents.py         |  66 ++++++++-
 tools/perf/pmu-events/pmu-events.h       |  23 +++-
 tools/perf/tests/pmu-events.c            | 129 +++++++++--------
 tools/perf/util/fncache.c                |  69 +++++-----
 tools/perf/util/fncache.h                |   1 -
 tools/perf/util/hwmon_pmu.c              |  43 +++---
 tools/perf/util/metricgroup.c            | 102 ++++++--------
 tools/perf/util/metricgroup.h            |   2 +-
 tools/perf/util/pmu.c                    | 167 +++++++++++++++--------
 tools/perf/util/pmu.h                    |   4 +-
 tools/perf/util/srccode.c                |   4 +-
 tools/perf/util/tool_pmu.c               |  17 +--
 14 files changed, 430 insertions(+), 269 deletions(-)

-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 1/3] perf pmu: Change aliases from list to hashmap
  2025-04-10  4:45 [PATCH v1 0/3] Metric related performance improvements Ian Rogers
@ 2025-04-10  4:45 ` Ian Rogers
  2025-04-10  4:45 ` [PATCH v1 2/3] perf fncache: Switch to using hashmap Ian Rogers
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Ian Rogers @ 2025-04-10  4:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Xu Yang,
	John Garry, Dominique Martinet, Thomas Richter, Weilin Wang,
	linux-perf-users, linux-kernel

Finding an alias for things like perf_pmu__have_event would need to
search the aliases list, whilst this happens relatively infrequently
it can be a significant overhead in testing. Switch to using a
hashmap. Move common initialization code to perf_pmu__init. Refactor
the test strct perf_pmu_test_pmu to not have perf pmu within it to
better support the perf_pmu__init function.

Before:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m18.499s
user    0m18.150s
sys     0m3.273s
```

After:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m17.887s
user    0m17.525s
sys     0m3.310s
```

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/pmu-events.c | 129 +++++++++++++-------------
 tools/perf/util/hwmon_pmu.c   |  43 ++++-----
 tools/perf/util/pmu.c         | 167 ++++++++++++++++++++++------------
 tools/perf/util/pmu.h         |   4 +-
 tools/perf/util/tool_pmu.c    |  17 +---
 5 files changed, 199 insertions(+), 161 deletions(-)

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index db004d26fcb0..815b40097428 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -38,7 +38,9 @@ struct perf_pmu_test_event {
 };
 
 struct perf_pmu_test_pmu {
-	struct perf_pmu pmu;
+	const char *pmu_name;
+	bool pmu_is_uncore;
+	const char *pmu_id;
 	struct perf_pmu_test_event const *aliases[10];
 };
 
@@ -553,11 +555,10 @@ static int __test_core_pmu_event_aliases(const char *pmu_name, int *count)
 	if (!pmu)
 		return -1;
 
-	INIT_LIST_HEAD(&pmu->format);
-	INIT_LIST_HEAD(&pmu->aliases);
-	INIT_LIST_HEAD(&pmu->caps);
-	INIT_LIST_HEAD(&pmu->list);
-	pmu->name = strdup(pmu_name);
+	if (perf_pmu__init(pmu, PERF_PMU_TYPE_FAKE, pmu_name) != 0) {
+		perf_pmu__delete(pmu);
+		return -1;
+	}
 	pmu->is_core = true;
 
 	pmu->events_table = table;
@@ -594,14 +595,30 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 {
 	int alias_count = 0, to_match_count = 0, matched_count = 0;
 	struct perf_pmu_test_event const **table;
-	struct perf_pmu *pmu = &test_pmu->pmu;
-	const char *pmu_name = pmu->name;
+	struct perf_pmu *pmu;
 	const struct pmu_events_table *events_table;
 	int res = 0;
 
 	events_table = find_core_events_table("testarch", "testcpu");
 	if (!events_table)
 		return -1;
+
+	pmu = zalloc(sizeof(*pmu));
+	if (!pmu)
+		return -1;
+
+	if (perf_pmu__init(pmu, PERF_PMU_TYPE_FAKE, test_pmu->pmu_name) != 0) {
+		perf_pmu__delete(pmu);
+		return -1;
+	}
+	pmu->is_uncore = test_pmu->pmu_is_uncore;
+	if (test_pmu->pmu_id) {
+		pmu->id = strdup(test_pmu->pmu_id);
+		if (!pmu->id) {
+			perf_pmu__delete(pmu);
+			return -1;
+		}
+	}
 	pmu->events_table = events_table;
 	pmu_add_cpu_aliases_table(pmu, events_table);
 	pmu->cpu_aliases_added = true;
@@ -617,7 +634,8 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 
 	if (alias_count != to_match_count) {
 		pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
-			 pmu_name, to_match_count, alias_count);
+			 pmu->name, to_match_count, alias_count);
+		perf_pmu__delete(pmu);
 		return -1;
 	}
 
@@ -630,9 +648,10 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 			.count = &matched_count,
 		};
 
-		if (strcmp(pmu_name, test_event.matching_pmu)) {
+		if (strcmp(pmu->name, test_event.matching_pmu)) {
 			pr_debug("testing aliases uncore PMU %s: mismatched matching_pmu, %s vs %s\n",
-					pmu_name, test_event.matching_pmu, pmu_name);
+					pmu->name, test_event.matching_pmu, pmu->name);
+			perf_pmu__delete(pmu);
 			return -1;
 		}
 
@@ -641,34 +660,32 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 		if (err) {
 			res = err;
 			pr_debug("testing aliases uncore PMU %s: could not match alias %s\n",
-				 pmu_name, event->name);
+				 pmu->name, event->name);
+			perf_pmu__delete(pmu);
 			return -1;
 		}
 	}
 
 	if (alias_count != matched_count) {
 		pr_debug("testing aliases uncore PMU %s: mismatch found aliases (%d) vs matched (%d)\n",
-			 pmu_name, matched_count, alias_count);
+			 pmu->name, matched_count, alias_count);
 		res = -1;
 	}
+	perf_pmu__delete(pmu);
 	return res;
 }
 
 static struct perf_pmu_test_pmu test_pmus[] = {
 	{
-		.pmu = {
-			.name = "hisi_sccl1_ddrc2",
-			.is_uncore = 1,
-		},
+		.pmu_name = "hisi_sccl1_ddrc2",
+		.pmu_is_uncore = 1,
 		.aliases = {
 			&uncore_hisi_ddrc_flux_wcmd,
 		},
 	},
 	{
-		.pmu = {
-			.name = "uncore_cbox_0",
-			.is_uncore = 1,
-		},
+		.pmu_name = "uncore_cbox_0",
+		.pmu_is_uncore = 1,
 		.aliases = {
 			&unc_cbo_xsnp_response_miss_eviction,
 			&uncore_hyphen,
@@ -676,88 +693,70 @@ static struct perf_pmu_test_pmu test_pmus[] = {
 		},
 	},
 	{
-		.pmu = {
-			.name = "hisi_sccl3_l3c7",
-			.is_uncore = 1,
-		},
+		.pmu_name = "hisi_sccl3_l3c7",
+		.pmu_is_uncore = 1,
 		.aliases = {
 			&uncore_hisi_l3c_rd_hit_cpipe,
 		},
 	},
 	{
-		.pmu = {
-			.name = "uncore_imc_free_running_0",
-			.is_uncore = 1,
-		},
+		.pmu_name = "uncore_imc_free_running_0",
+		.pmu_is_uncore = 1,
 		.aliases = {
 			&uncore_imc_free_running_cache_miss,
 		},
 	},
 	{
-		.pmu = {
-			.name = "uncore_imc_0",
-			.is_uncore = 1,
-		},
+		.pmu_name = "uncore_imc_0",
+		.pmu_is_uncore = 1,
 		.aliases = {
 			&uncore_imc_cache_hits,
 		},
 	},
 	{
-		.pmu = {
-			.name = "uncore_sys_ddr_pmu0",
-			.is_uncore = 1,
-			.id = "v8",
-		},
+		.pmu_name = "uncore_sys_ddr_pmu0",
+		.pmu_is_uncore = 1,
+		.pmu_id = "v8",
 		.aliases = {
 			&sys_ddr_pmu_write_cycles,
 		},
 	},
 	{
-		.pmu = {
-			.name = "uncore_sys_ccn_pmu4",
-			.is_uncore = 1,
-			.id = "0x01",
-		},
+		.pmu_name = "uncore_sys_ccn_pmu4",
+		.pmu_is_uncore = 1,
+		.pmu_id = "0x01",
 		.aliases = {
 			&sys_ccn_pmu_read_cycles,
 		},
 	},
 	{
-		.pmu = {
-			.name = (char *)"uncore_sys_cmn_pmu0",
-			.is_uncore = 1,
-			.id = (char *)"43401",
-		},
+		.pmu_name = "uncore_sys_cmn_pmu0",
+		.pmu_is_uncore = 1,
+		.pmu_id = "43401",
 		.aliases = {
 			&sys_cmn_pmu_hnf_cache_miss,
 		},
 	},
 	{
-		.pmu = {
-			.name = (char *)"uncore_sys_cmn_pmu0",
-			.is_uncore = 1,
-			.id = (char *)"43602",
-		},
+		.pmu_name = "uncore_sys_cmn_pmu0",
+		.pmu_is_uncore = 1,
+		.pmu_id = "43602",
 		.aliases = {
 			&sys_cmn_pmu_hnf_cache_miss,
 		},
 	},
 	{
-		.pmu = {
-			.name = (char *)"uncore_sys_cmn_pmu0",
-			.is_uncore = 1,
-			.id = (char *)"43c03",
-		},
+		.pmu_name = "uncore_sys_cmn_pmu0",
+		.pmu_is_uncore = 1,
+		.pmu_id = "43c03",
 		.aliases = {
 			&sys_cmn_pmu_hnf_cache_miss,
 		},
 	},
 	{
-		.pmu = {
-			.name = (char *)"uncore_sys_cmn_pmu0",
-			.is_uncore = 1,
-			.id = (char *)"43a01",
-		},
+		.pmu_name = "uncore_sys_cmn_pmu0",
+		.pmu_is_uncore = 1,
+		.pmu_id = "43a01",
 		.aliases = {
 			&sys_cmn_pmu_hnf_cache_miss,
 		},
@@ -796,10 +795,6 @@ static int test__aliases(struct test_suite *test __maybe_unused,
 	for (i = 0; i < ARRAY_SIZE(test_pmus); i++) {
 		int res;
 
-		INIT_LIST_HEAD(&test_pmus[i].pmu.format);
-		INIT_LIST_HEAD(&test_pmus[i].pmu.aliases);
-		INIT_LIST_HEAD(&test_pmus[i].pmu.caps);
-
 		res = __test_uncore_pmu_event_aliases(&test_pmus[i]);
 		if (res)
 			return res;
diff --git a/tools/perf/util/hwmon_pmu.c b/tools/perf/util/hwmon_pmu.c
index 3cce77fc8004..c25e7296f1c1 100644
--- a/tools/perf/util/hwmon_pmu.c
+++ b/tools/perf/util/hwmon_pmu.c
@@ -346,42 +346,43 @@ struct perf_pmu *hwmon_pmu__new(struct list_head *pmus, int hwmon_dir, const cha
 {
 	char buf[32];
 	struct hwmon_pmu *hwm;
+	__u32 type = PERF_PMU_TYPE_HWMON_START + strtoul(sysfs_name + 5, NULL, 10);
+
+	if (type > PERF_PMU_TYPE_HWMON_END) {
+		pr_err("Unable to encode hwmon type from %s in valid PMU type\n", sysfs_name);
+		return NULL;
+	}
+
+	snprintf(buf, sizeof(buf), "hwmon_%s", name);
+	fix_name(buf + 6);
 
 	hwm = zalloc(sizeof(*hwm));
 	if (!hwm)
 		return NULL;
 
-	hwm->hwmon_dir_fd = hwmon_dir;
-	hwm->pmu.type = PERF_PMU_TYPE_HWMON_START + strtoul(sysfs_name + 5, NULL, 10);
-	if (hwm->pmu.type > PERF_PMU_TYPE_HWMON_END) {
-		pr_err("Unable to encode hwmon type from %s in valid PMU type\n", sysfs_name);
-		goto err_out;
+	if (perf_pmu__init(&hwm->pmu, type, buf) != 0) {
+		perf_pmu__delete(&hwm->pmu);
+		return NULL;
 	}
-	snprintf(buf, sizeof(buf), "hwmon_%s", name);
-	fix_name(buf + 6);
-	hwm->pmu.name = strdup(buf);
-	if (!hwm->pmu.name)
-		goto err_out;
+
+	hwm->hwmon_dir_fd = hwmon_dir;
 	hwm->pmu.alias_name = strdup(sysfs_name);
-	if (!hwm->pmu.alias_name)
-		goto err_out;
+	if (!hwm->pmu.alias_name) {
+		perf_pmu__delete(&hwm->pmu);
+		return NULL;
+	}
 	hwm->pmu.cpus = perf_cpu_map__new("0");
-	if (!hwm->pmu.cpus)
-		goto err_out;
+	if (!hwm->pmu.cpus) {
+		perf_pmu__delete(&hwm->pmu);
+		return NULL;
+	}
 	INIT_LIST_HEAD(&hwm->pmu.format);
-	INIT_LIST_HEAD(&hwm->pmu.aliases);
 	INIT_LIST_HEAD(&hwm->pmu.caps);
 	hashmap__init(&hwm->events, hwmon_pmu__event_hashmap_hash,
 		      hwmon_pmu__event_hashmap_equal, /*ctx=*/NULL);
 
 	list_add_tail(&hwm->pmu.list, pmus);
 	return &hwm->pmu;
-err_out:
-	free((char *)hwm->pmu.name);
-	free(hwm->pmu.alias_name);
-	free(hwm);
-	close(hwmon_dir);
-	return NULL;
 }
 
 void hwmon_pmu__exit(struct perf_pmu *pmu)
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b7ebac5ab1d1..4d5d44622fcc 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -27,6 +27,7 @@
 #include <util/pmu-flex.h>
 #include "parse-events.h"
 #include "print-events.h"
+#include "hashmap.h"
 #include "header.h"
 #include "string2.h"
 #include "strbuf.h"
@@ -66,8 +67,6 @@ struct perf_pmu_alias {
 	char *topic;
 	/** @terms: Owned list of the original parsed parameters. */
 	struct parse_events_terms terms;
-	/** @list: List element of struct perf_pmu aliases. */
-	struct list_head list;
 	/**
 	 * @pmu_name: The name copied from the json struct pmu_event. This can
 	 * differ from the PMU name as it won't have suffixes.
@@ -407,25 +406,33 @@ static void perf_pmu__parse_snapshot(struct perf_pmu *pmu, struct perf_pmu_alias
 }
 
 /* Delete an alias entry. */
-static void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
+static void perf_pmu_free_alias(struct perf_pmu_alias *alias)
 {
-	zfree(&newalias->name);
-	zfree(&newalias->desc);
-	zfree(&newalias->long_desc);
-	zfree(&newalias->topic);
-	zfree(&newalias->pmu_name);
-	parse_events_terms__exit(&newalias->terms);
-	free(newalias);
+	if (!alias)
+		return;
+
+	zfree(&alias->name);
+	zfree(&alias->desc);
+	zfree(&alias->long_desc);
+	zfree(&alias->topic);
+	zfree(&alias->pmu_name);
+	parse_events_terms__exit(&alias->terms);
+	free(alias);
 }
 
 static void perf_pmu__del_aliases(struct perf_pmu *pmu)
 {
-	struct perf_pmu_alias *alias, *tmp;
+	struct hashmap_entry *entry;
+	size_t bkt;
 
-	list_for_each_entry_safe(alias, tmp, &pmu->aliases, list) {
-		list_del(&alias->list);
-		perf_pmu_free_alias(alias);
-	}
+	if (!pmu->aliases)
+		return;
+
+	hashmap__for_each_entry(pmu->aliases, entry, bkt)
+		perf_pmu_free_alias(entry->pvalue);
+
+	hashmap__free(pmu->aliases);
+	pmu->aliases = NULL;
 }
 
 static struct perf_pmu_alias *perf_pmu__find_alias(struct perf_pmu *pmu,
@@ -433,35 +440,37 @@ static struct perf_pmu_alias *perf_pmu__find_alias(struct perf_pmu *pmu,
 						   bool load)
 {
 	struct perf_pmu_alias *alias;
+	bool has_sysfs_event;
+	char event_file_name[FILENAME_MAX + 8];
 
-	if (load && !pmu->sysfs_aliases_loaded) {
-		bool has_sysfs_event;
-		char event_file_name[FILENAME_MAX + 8];
+	if (hashmap__find(pmu->aliases, name, &alias))
+		return alias;
 
-		/*
-		 * Test if alias/event 'name' exists in the PMU's sysfs/events
-		 * directory. If not skip parsing the sysfs aliases. Sysfs event
-		 * name must be all lower or all upper case.
-		 */
-		scnprintf(event_file_name, sizeof(event_file_name), "events/%s", name);
-		for (size_t i = 7, n = 7 + strlen(name); i < n; i++)
-			event_file_name[i] = tolower(event_file_name[i]);
+	if (!load || pmu->sysfs_aliases_loaded)
+		return NULL;
 
-		has_sysfs_event = perf_pmu__file_exists(pmu, event_file_name);
-		if (!has_sysfs_event) {
-			for (size_t i = 7, n = 7 + strlen(name); i < n; i++)
-				event_file_name[i] = toupper(event_file_name[i]);
+	/*
+	 * Test if alias/event 'name' exists in the PMU's sysfs/events
+	 * directory. If not skip parsing the sysfs aliases. Sysfs event
+	 * name must be all lower or all upper case.
+	 */
+	scnprintf(event_file_name, sizeof(event_file_name), "events/%s", name);
+	for (size_t i = 7, n = 7 + strlen(name); i < n; i++)
+		event_file_name[i] = tolower(event_file_name[i]);
 
-			has_sysfs_event = perf_pmu__file_exists(pmu, event_file_name);
-		}
-		if (has_sysfs_event)
-			pmu_aliases_parse(pmu);
+	has_sysfs_event = perf_pmu__file_exists(pmu, event_file_name);
+	if (!has_sysfs_event) {
+		for (size_t i = 7, n = 7 + strlen(name); i < n; i++)
+			event_file_name[i] = toupper(event_file_name[i]);
 
+		has_sysfs_event = perf_pmu__file_exists(pmu, event_file_name);
 	}
-	list_for_each_entry(alias, &pmu->aliases, list) {
-		if (!strcasecmp(alias->name, name))
+	if (has_sysfs_event) {
+		pmu_aliases_parse(pmu);
+		if (hashmap__find(pmu->aliases, name, &alias))
 			return alias;
 	}
+
 	return NULL;
 }
 
@@ -532,7 +541,7 @@ static int perf_pmu__new_alias(struct perf_pmu *pmu, const char *name,
 				const char *desc, const char *val, FILE *val_fd,
 			        const struct pmu_event *pe, enum event_source src)
 {
-	struct perf_pmu_alias *alias;
+	struct perf_pmu_alias *alias, *old_alias;
 	int ret;
 	const char *long_desc = NULL, *topic = NULL, *unit = NULL, *pmu_name = NULL;
 	bool deprecated = false, perpkg = false;
@@ -607,7 +616,8 @@ static int perf_pmu__new_alias(struct perf_pmu *pmu, const char *name,
 		break;
 
 	}
-	list_add_tail(&alias->list, &pmu->aliases);
+	hashmap__set(pmu->aliases, alias->name, alias, /*old_key=*/ NULL, &old_alias);
+	perf_pmu_free_alias(old_alias);
 	return 0;
 }
 
@@ -1095,43 +1105,77 @@ perf_pmu__arch_init(struct perf_pmu *pmu)
 		pmu->mem_events = perf_mem_events;
 }
 
+/* Variant of str_hash that does tolower on each character. */
+static size_t aliases__hash(long key, void *ctx __maybe_unused)
+{
+	const char *s = (const char *)key;
+	size_t h = 0;
+
+	while (*s) {
+		h = h * 31 + tolower(*s);
+		s++;
+	}
+	return h;
+}
+
+static bool aliases__equal(long key1, long key2, void *ctx __maybe_unused)
+{
+	return strcasecmp((const char *)key1, (const char *)key2) == 0;
+}
+
+int perf_pmu__init(struct perf_pmu *pmu, __u32 type, const char *name)
+{
+	pmu->type = type;
+	INIT_LIST_HEAD(&pmu->format);
+	INIT_LIST_HEAD(&pmu->caps);
+
+	pmu->name = strdup(name);
+	if (!pmu->name)
+		return -ENOMEM;
+
+	pmu->aliases = hashmap__new(aliases__hash, aliases__equal, /*ctx=*/ NULL);
+	if (!pmu->aliases)
+		return -ENOMEM;
+
+	return 0;
+}
+
 struct perf_pmu *perf_pmu__lookup(struct list_head *pmus, int dirfd, const char *name,
 				  bool eager_load)
 {
 	struct perf_pmu *pmu;
-	__u32 type;
 
 	pmu = zalloc(sizeof(*pmu));
 	if (!pmu)
 		return NULL;
 
-	pmu->name = strdup(name);
-	if (!pmu->name)
-		goto err;
+	if (perf_pmu__init(pmu, PERF_PMU_TYPE_FAKE, name) != 0) {
+		perf_pmu__delete(pmu);
+		return NULL;
+	}
 
 	/*
 	 * Read type early to fail fast if a lookup name isn't a PMU. Ensure
 	 * that type value is successfully assigned (return 1).
 	 */
-	if (perf_pmu__scan_file_at(pmu, dirfd, "type", "%u", &type) != 1)
-		goto err;
-
-	INIT_LIST_HEAD(&pmu->format);
-	INIT_LIST_HEAD(&pmu->aliases);
-	INIT_LIST_HEAD(&pmu->caps);
+	if (perf_pmu__scan_file_at(pmu, dirfd, "type", "%u", &pmu->type) != 1) {
+		perf_pmu__delete(pmu);
+		return NULL;
+	}
 
 	/*
 	 * The pmu data we store & need consists of the pmu
 	 * type value and format definitions. Load both right
 	 * now.
 	 */
-	if (pmu_format(pmu, dirfd, name, eager_load))
-		goto err;
+	if (pmu_format(pmu, dirfd, name, eager_load)) {
+		perf_pmu__delete(pmu);
+		return NULL;
+	}
 
 	pmu->is_core = is_pmu_core(name);
 	pmu->cpus = pmu_cpumask(dirfd, name, pmu->is_core);
 
-	pmu->type = type;
 	pmu->is_uncore = pmu_is_uncore(dirfd, name);
 	if (pmu->is_uncore)
 		pmu->id = pmu_id(name);
@@ -1153,10 +1197,6 @@ struct perf_pmu *perf_pmu__lookup(struct list_head *pmus, int dirfd, const char
 		pmu_aliases_parse_eager(pmu, dirfd);
 
 	return pmu;
-err:
-	zfree(&pmu->name);
-	free(pmu);
-	return NULL;
 }
 
 /* Creates the PMU when sysfs scanning fails. */
@@ -1178,7 +1218,7 @@ struct perf_pmu *perf_pmu__create_placeholder_core_pmu(struct list_head *core_pm
 	pmu->cpus = cpu_map__online();
 
 	INIT_LIST_HEAD(&pmu->format);
-	INIT_LIST_HEAD(&pmu->aliases);
+	pmu->aliases = hashmap__new(aliases__hash, aliases__equal, /*ctx=*/ NULL);
 	INIT_LIST_HEAD(&pmu->caps);
 	list_add_tail(&pmu->list, core_pmus);
 	return pmu;
@@ -1930,13 +1970,14 @@ int perf_pmu__for_each_event(struct perf_pmu *pmu, bool skip_duplicate_pmus,
 			     void *state, pmu_event_callback cb)
 {
 	char buf[1024];
-	struct perf_pmu_alias *event;
 	struct pmu_event_info info = {
 		.pmu = pmu,
 		.event_type_desc = "Kernel PMU event",
 	};
 	int ret = 0;
 	struct strbuf sb;
+	struct hashmap_entry *entry;
+	size_t bkt;
 
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__for_each_event(pmu, state, cb);
@@ -1944,7 +1985,8 @@ int perf_pmu__for_each_event(struct perf_pmu *pmu, bool skip_duplicate_pmus,
 	strbuf_init(&sb, /*hint=*/ 0);
 	pmu_aliases_parse(pmu);
 	pmu_add_cpu_aliases(pmu);
-	list_for_each_entry(event, &pmu->aliases, list) {
+	hashmap__for_each_entry(pmu->aliases, entry, bkt) {
+		struct perf_pmu_alias *event = entry->pvalue;
 		size_t buf_used, pmu_name_len;
 
 		if (perf_pmu__is_tool(pmu) && tool_pmu__skip_event(event->name))
@@ -2401,6 +2443,9 @@ int perf_pmu__pathname_fd(int dirfd, const char *pmu_name, const char *filename,
 
 void perf_pmu__delete(struct perf_pmu *pmu)
 {
+	if (!pmu)
+		return;
+
 	if (perf_pmu__is_hwmon(pmu))
 		hwmon_pmu__exit(pmu);
 
@@ -2418,14 +2463,16 @@ void perf_pmu__delete(struct perf_pmu *pmu)
 
 const char *perf_pmu__name_from_config(struct perf_pmu *pmu, u64 config)
 {
-	struct perf_pmu_alias *event;
+	struct hashmap_entry *entry;
+	size_t bkt;
 
 	if (!pmu)
 		return NULL;
 
 	pmu_aliases_parse(pmu);
 	pmu_add_cpu_aliases(pmu);
-	list_for_each_entry(event, &pmu->aliases, list) {
+	hashmap__for_each_entry(pmu->aliases, entry, bkt) {
+		struct perf_pmu_alias *event = entry->pvalue;
 		struct perf_event_attr attr = {.config = 0,};
 
 		int ret = perf_pmu__config(pmu, &attr, &event->terms, /*apply_hardcoded=*/true,
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index b93014cc3670..0444e255dae3 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -14,6 +14,7 @@
 #include "mem-events.h"
 
 struct evsel_config_term;
+struct hashmap;
 struct perf_cpu_map;
 struct print_callbacks;
 
@@ -125,7 +126,7 @@ struct perf_pmu {
 	 * event read from <sysfs>/bus/event_source/devices/<name>/events/ or
 	 * from json events in pmu-events.c.
 	 */
-	struct list_head aliases;
+	struct hashmap *aliases;
 	/**
 	 * @events_table: The events table for json events in pmu-events.c.
 	 */
@@ -289,6 +290,7 @@ int perf_pmu__pathname_scnprintf(char *buf, size_t size,
 int perf_pmu__event_source_devices_fd(void);
 int perf_pmu__pathname_fd(int dirfd, const char *pmu_name, const char *filename, int flags);
 
+int perf_pmu__init(struct perf_pmu *pmu, __u32 type, const char *name);
 struct perf_pmu *perf_pmu__lookup(struct list_head *pmus, int dirfd, const char *lookup_name,
 				  bool eager_load);
 struct perf_pmu *perf_pmu__create_placeholder_core_pmu(struct list_head *core_pmus);
diff --git a/tools/perf/util/tool_pmu.c b/tools/perf/util/tool_pmu.c
index 97b327d1ce4a..992dc0134c4c 100644
--- a/tools/perf/util/tool_pmu.c
+++ b/tools/perf/util/tool_pmu.c
@@ -496,19 +496,12 @@ struct perf_pmu *tool_pmu__new(void)
 	struct perf_pmu *tool = zalloc(sizeof(struct perf_pmu));
 
 	if (!tool)
-		goto out;
-	tool->name = strdup("tool");
-	if (!tool->name) {
-		zfree(&tool);
-		goto out;
-	}
+		return NULL;
 
-	tool->type = PERF_PMU_TYPE_TOOL;
-	INIT_LIST_HEAD(&tool->aliases);
-	INIT_LIST_HEAD(&tool->caps);
-	INIT_LIST_HEAD(&tool->format);
+	if (perf_pmu__init(tool, PERF_PMU_TYPE_TOOL, "tool") != 0) {
+		perf_pmu__delete(tool);
+		return NULL;
+	}
 	tool->events_table = find_core_events_table("common", "common");
-
-out:
 	return tool;
 }
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 2/3] perf fncache: Switch to using hashmap
  2025-04-10  4:45 [PATCH v1 0/3] Metric related performance improvements Ian Rogers
  2025-04-10  4:45 ` [PATCH v1 1/3] perf pmu: Change aliases from list to hashmap Ian Rogers
@ 2025-04-10  4:45 ` Ian Rogers
  2025-04-10  4:45 ` [PATCH v1 3/3] perf metricgroup: Binary search when resolving referred to metrics Ian Rogers
  2025-04-10  6:49 ` [PATCH v1 0/3] Metric related performance improvements Namhyung Kim
  3 siblings, 0 replies; 11+ messages in thread
From: Ian Rogers @ 2025-04-10  4:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Xu Yang,
	John Garry, Dominique Martinet, Thomas Richter, Weilin Wang,
	linux-perf-users, linux-kernel

The existing fncache can get large in testing situations. As the
bucket array is a fixed size this leads to it degrading to O(n)
performance. Use a regular hashmap that can dynamically reallocate its
array.

Before:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m17.887s
user    0m17.525s
sys     0m3.310s
```

After:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m15.551s
user    0m15.092s
sys     0m3.009s
```

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/fncache.c | 69 +++++++++++++++++++++------------------
 tools/perf/util/fncache.h |  1 -
 tools/perf/util/srccode.c |  4 +--
 3 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/tools/perf/util/fncache.c b/tools/perf/util/fncache.c
index 6225cbc52310..bf9559c55c63 100644
--- a/tools/perf/util/fncache.c
+++ b/tools/perf/util/fncache.c
@@ -1,53 +1,58 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Manage a cache of file names' existence */
+#include <pthread.h>
 #include <stdlib.h>
-#include <unistd.h>
 #include <string.h>
-#include <linux/list.h>
+#include <unistd.h>
+#include <linux/compiler.h>
 #include "fncache.h"
+#include "hashmap.h"
 
-struct fncache {
-	struct hlist_node nd;
-	bool res;
-	char name[];
-};
+static struct hashmap *fncache;
 
-#define FNHSIZE 61
+static size_t fncache__hash(long key, void *ctx __maybe_unused)
+{
+	return str_hash((const char *)key);
+}
 
-static struct hlist_head fncache_hash[FNHSIZE];
+static bool fncache__equal(long key1, long key2, void *ctx __maybe_unused)
+{
+	return strcmp((const char *)key1, (const char *)key2) == 0;
+}
 
-unsigned shash(const unsigned char *s)
+static void fncache__init(void)
 {
-	unsigned h = 0;
-	while (*s)
-		h = 65599 * h + *s++;
-	return h ^ (h >> 16);
+	fncache = hashmap__new(fncache__hash, fncache__equal, /*ctx=*/NULL);
+}
+
+static struct hashmap *fncache__get(void)
+{
+	static pthread_once_t fncache_once = PTHREAD_ONCE_INIT;
+
+	pthread_once(&fncache_once, fncache__init);
+
+	return fncache;
 }
 
 static bool lookup_fncache(const char *name, bool *res)
 {
-	int h = shash((const unsigned char *)name) % FNHSIZE;
-	struct fncache *n;
-
-	hlist_for_each_entry(n, &fncache_hash[h], nd) {
-		if (!strcmp(n->name, name)) {
-			*res = n->res;
-			return true;
-		}
-	}
-	return false;
+	long val;
+
+	if (!hashmap__find(fncache__get(), name, &val))
+		return false;
+
+	*res = (val != 0);
+	return true;
 }
 
 static void update_fncache(const char *name, bool res)
 {
-	struct fncache *n = malloc(sizeof(struct fncache) + strlen(name) + 1);
-	int h = shash((const unsigned char *)name) % FNHSIZE;
-
-	if (!n)
-		return;
-	strcpy(n->name, name);
-	n->res = res;
-	hlist_add_head(&n->nd, &fncache_hash[h]);
+	char *old_key = NULL, *key = strdup(name);
+
+	if (key) {
+		hashmap__set(fncache__get(), key, res, &old_key, /*old_value*/NULL);
+		free(old_key);
+	}
 }
 
 /* No LRU, only use when bounded in some other way. */
diff --git a/tools/perf/util/fncache.h b/tools/perf/util/fncache.h
index fe020beaefb1..b6a0f209493e 100644
--- a/tools/perf/util/fncache.h
+++ b/tools/perf/util/fncache.h
@@ -1,7 +1,6 @@
 #ifndef _FCACHE_H
 #define _FCACHE_H 1
 
-unsigned shash(const unsigned char *s);
 bool file_available(const char *name);
 
 #endif
diff --git a/tools/perf/util/srccode.c b/tools/perf/util/srccode.c
index 476e99896d5e..0f4907843ac1 100644
--- a/tools/perf/util/srccode.c
+++ b/tools/perf/util/srccode.c
@@ -16,7 +16,7 @@
 #include "srccode.h"
 #include "debug.h"
 #include <internal/lib.h> // page_size
-#include "fncache.h"
+#include "hashmap.h"
 
 #define MAXSRCCACHE (32*1024*1024)
 #define MAXSRCFILES     64
@@ -92,7 +92,7 @@ static struct srcfile *find_srcfile(char *fn)
 	struct srcfile *h;
 	int fd;
 	unsigned long sz;
-	unsigned hval = shash((unsigned char *)fn) % SRC_HTAB_SZ;
+	size_t hval = str_hash(fn) % SRC_HTAB_SZ;
 
 	hlist_for_each_entry (h, &srcfile_htab[hval], hash_nd) {
 		if (!strcmp(fn, h->fn)) {
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 3/3] perf metricgroup: Binary search when resolving referred to metrics
  2025-04-10  4:45 [PATCH v1 0/3] Metric related performance improvements Ian Rogers
  2025-04-10  4:45 ` [PATCH v1 1/3] perf pmu: Change aliases from list to hashmap Ian Rogers
  2025-04-10  4:45 ` [PATCH v1 2/3] perf fncache: Switch to using hashmap Ian Rogers
@ 2025-04-10  4:45 ` Ian Rogers
  2025-04-10  6:49 ` [PATCH v1 0/3] Metric related performance improvements Namhyung Kim
  3 siblings, 0 replies; 11+ messages in thread
From: Ian Rogers @ 2025-04-10  4:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Xu Yang,
	John Garry, Dominique Martinet, Thomas Richter, Weilin Wang,
	linux-perf-users, linux-kernel

Unlike with events, metrics can be matched by name or a list of metric
groups. However, when a metric refers to another metric it isn't
referring to a group but the singular metric in question. Prior to
this change every "id" in a metric expression is checked to see if it
is a metric by scanning all the metrics in the metrics table. As the
table is sorted my metric name we can speed the search in the
resolution case by binary searching for the metric.

Rename some of the metricgroup functions to make it clearer whether
they match a metric by name or by both name and group.

Before:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m15.551s
user    0m15.092s
sys     0m3.009s
```

After:
```
$ time perf test -v 10
 10: PMU JSON event tests                                            :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 10.5: Parsing of metric thresholds with fake PMUs                   : Ok

real    0m2.338s
user    0m1.797s
sys     0m2.186s
```

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-stat.c                |   6 +-
 tools/perf/pmu-events/empty-pmu-events.c |  66 ++++++++++++++-
 tools/perf/pmu-events/jevents.py         |  66 ++++++++++++++-
 tools/perf/pmu-events/pmu-events.h       |  23 +++--
 tools/perf/util/metricgroup.c            | 102 +++++++++--------------
 tools/perf/util/metricgroup.h            |   2 +-
 6 files changed, 192 insertions(+), 73 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 68ea7589c143..67e6efaabe27 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1856,7 +1856,7 @@ static int add_default_events(void)
 		 * will use this approach. To determine transaction support
 		 * on an architecture test for such a metric name.
 		 */
-		if (!metricgroup__has_metric(pmu, "transaction")) {
+		if (!metricgroup__has_metric_or_groups(pmu, "transaction")) {
 			pr_err("Missing transaction metrics\n");
 			ret = -1;
 			goto out;
@@ -1890,7 +1890,7 @@ static int add_default_events(void)
 			smi_reset = true;
 		}
 
-		if (!metricgroup__has_metric(pmu, "smi")) {
+		if (!metricgroup__has_metric_or_groups(pmu, "smi")) {
 			pr_err("Missing smi metrics\n");
 			ret = -1;
 			goto out;
@@ -1980,7 +1980,7 @@ static int add_default_events(void)
 		 * Add TopdownL1 metrics if they exist. To minimize
 		 * multiplexing, don't request threshold computation.
 		 */
-		if (metricgroup__has_metric(pmu, "Default")) {
+		if (metricgroup__has_metric_or_groups(pmu, "Default")) {
 			struct evlist *metric_evlist = evlist__new();
 
 			if (!metric_evlist) {
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index 0cb7ba7912e8..5cdf7e68e331 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -443,7 +443,7 @@ int pmu_events_table__find_event(const struct pmu_events_table *table,
                 const char *pmu_name = &big_c_string[table_pmu->pmu_name.offset];
                 int ret;
 
-                if (!perf_pmu__name_wildcard_match(pmu, pmu_name))
+                if (pmu && !perf_pmu__name_wildcard_match(pmu, pmu_name))
                         continue;
 
                 ret = pmu_events_table__find_event_pmu(table, table_pmu, name, fn, data);
@@ -489,6 +489,49 @@ static int pmu_metrics_table__for_each_metric_pmu(const struct pmu_metrics_table
         return 0;
 }
 
+static int pmu_metrics_table__find_metric_pmu(const struct pmu_metrics_table *table,
+                                            const struct pmu_table_entry *pmu,
+                                            const char *metric,
+                                            pmu_metric_iter_fn fn,
+                                            void *data)
+{
+        struct pmu_metric pm = {
+                .pmu = &big_c_string[pmu->pmu_name.offset],
+        };
+        int low = 0, high = pmu->num_entries - 1;
+
+        while (low <= high) {
+                int cmp, mid = (low + high) / 2;
+
+                decompress_metric(pmu->entries[mid].offset, &pm);
+
+                if (!pm.metric_name && !metric)
+                        goto do_call;
+
+                if (!pm.metric_name && metric) {
+                        low = mid + 1;
+                        continue;
+                }
+                if (pm.metric_name && !metric) {
+                        high = mid - 1;
+                        continue;
+                }
+
+                cmp = strcmp(pm.metric_name, metric);
+                if (cmp < 0) {
+                        low = mid + 1;
+                        continue;
+                }
+                if (cmp > 0) {
+                        high = mid - 1;
+                        continue;
+                }
+  do_call:
+                return fn ? fn(&pm, table, data) : 0;
+        }
+        return PMU_METRICS__NOT_FOUND;
+}
+
 int pmu_metrics_table__for_each_metric(const struct pmu_metrics_table *table,
                                      pmu_metric_iter_fn fn,
                                      void *data)
@@ -503,6 +546,27 @@ int pmu_metrics_table__for_each_metric(const struct pmu_metrics_table *table,
         return 0;
 }
 
+int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
+                                 struct perf_pmu *pmu,
+                                 const char *metric,
+                                 pmu_metric_iter_fn fn,
+                                 void *data)
+{
+        for (size_t i = 0; i < table->num_pmus; i++) {
+                const struct pmu_table_entry *table_pmu = &table->pmus[i];
+                const char *pmu_name = &big_c_string[table_pmu->pmu_name.offset];
+                int ret;
+
+                if (pmu && !perf_pmu__name_wildcard_match(pmu, pmu_name))
+                        continue;
+
+                ret = pmu_metrics_table__find_metric_pmu(table, table_pmu, metric, fn, data);
+                if (ret != PMU_METRICS__NOT_FOUND)
+                        return ret;
+        }
+        return PMU_METRICS__NOT_FOUND;
+}
+
 static const struct pmu_events_map *map_for_cpu(struct perf_cpu cpu)
 {
         static struct {
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 7499a35bfadd..ec10d767fb49 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -966,7 +966,7 @@ int pmu_events_table__find_event(const struct pmu_events_table *table,
                 const char *pmu_name = &big_c_string[table_pmu->pmu_name.offset];
                 int ret;
 
-                if (!perf_pmu__name_wildcard_match(pmu, pmu_name))
+                if (pmu && !perf_pmu__name_wildcard_match(pmu, pmu_name))
                         continue;
 
                 ret = pmu_events_table__find_event_pmu(table, table_pmu, name, fn, data);
@@ -1012,6 +1012,49 @@ static int pmu_metrics_table__for_each_metric_pmu(const struct pmu_metrics_table
         return 0;
 }
 
+static int pmu_metrics_table__find_metric_pmu(const struct pmu_metrics_table *table,
+                                            const struct pmu_table_entry *pmu,
+                                            const char *metric,
+                                            pmu_metric_iter_fn fn,
+                                            void *data)
+{
+        struct pmu_metric pm = {
+                .pmu = &big_c_string[pmu->pmu_name.offset],
+        };
+        int low = 0, high = pmu->num_entries - 1;
+
+        while (low <= high) {
+                int cmp, mid = (low + high) / 2;
+
+                decompress_metric(pmu->entries[mid].offset, &pm);
+
+                if (!pm.metric_name && !metric)
+                        goto do_call;
+
+                if (!pm.metric_name && metric) {
+                        low = mid + 1;
+                        continue;
+                }
+                if (pm.metric_name && !metric) {
+                        high = mid - 1;
+                        continue;
+                }
+
+                cmp = strcmp(pm.metric_name, metric);
+                if (cmp < 0) {
+                        low = mid + 1;
+                        continue;
+                }
+                if (cmp > 0) {
+                        high = mid - 1;
+                        continue;
+                }
+  do_call:
+                return fn ? fn(&pm, table, data) : 0;
+        }
+        return PMU_METRICS__NOT_FOUND;
+}
+
 int pmu_metrics_table__for_each_metric(const struct pmu_metrics_table *table,
                                      pmu_metric_iter_fn fn,
                                      void *data)
@@ -1026,6 +1069,27 @@ int pmu_metrics_table__for_each_metric(const struct pmu_metrics_table *table,
         return 0;
 }
 
+int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
+                                 struct perf_pmu *pmu,
+                                 const char *metric,
+                                 pmu_metric_iter_fn fn,
+                                 void *data)
+{
+        for (size_t i = 0; i < table->num_pmus; i++) {
+                const struct pmu_table_entry *table_pmu = &table->pmus[i];
+                const char *pmu_name = &big_c_string[table_pmu->pmu_name.offset];
+                int ret;
+
+                if (pmu && !perf_pmu__name_wildcard_match(pmu, pmu_name))
+                        continue;
+
+                ret = pmu_metrics_table__find_metric_pmu(table, table_pmu, metric, fn, data);
+                if (ret != PMU_METRICS__NOT_FOUND)
+                        return ret;
+        }
+        return PMU_METRICS__NOT_FOUND;
+}
+
 static const struct pmu_events_map *map_for_cpu(struct perf_cpu cpu)
 {
         static struct {
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 675562e6f770..8048ca0b2fb3 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -71,6 +71,7 @@ struct pmu_events_table;
 struct pmu_metrics_table;
 
 #define PMU_EVENTS__NOT_FOUND -1000
+#define PMU_METRICS__NOT_FOUND -1000
 
 typedef int (*pmu_event_iter_fn)(const struct pmu_event *pe,
 				 const struct pmu_events_table *table,
@@ -85,11 +86,11 @@ int pmu_events_table__for_each_event(const struct pmu_events_table *table,
 				    pmu_event_iter_fn fn,
 				    void *data);
 /*
- * Search for table and entry matching with pmu__name_match. Each matching event
- * has fn called on it. 0 implies to success/continue the search while non-zero
- * means to terminate. The special value PMU_EVENTS__NOT_FOUND is used to
- * indicate no event was found in one of the tables which doesn't terminate the
- * search of all tables.
+ * Search for a table and entry matching with pmu__name_wildcard_match or any
+ * tables if pmu is NULL. Each matching event has fn called on it. 0 implies to
+ * success/continue the search while non-zero means to terminate. The special
+ * value PMU_EVENTS__NOT_FOUND is used to indicate no event was found in one of
+ * the tables which doesn't terminate the search of all tables.
  */
 int pmu_events_table__find_event(const struct pmu_events_table *table,
                                  struct perf_pmu *pmu,
@@ -101,6 +102,18 @@ size_t pmu_events_table__num_events(const struct pmu_events_table *table,
 
 int pmu_metrics_table__for_each_metric(const struct pmu_metrics_table *table, pmu_metric_iter_fn fn,
 				     void *data);
+/*
+ * Search for a table and entry matching with pmu__name_wildcard_match or any
+ * tables if pmu is NULL. Each matching metric has fn called on it. 0 implies to
+ * success/continue the search while non-zero means to terminate. The special
+ * value PMU_METRICS__NOT_FOUND is used to indicate no metric was found in one
+ * of the tables which doesn't terminate the search of all tables.
+ */
+int pmu_metrics_table__find_metric(const struct pmu_metrics_table *table,
+				   struct perf_pmu *pmu,
+				   const char *metric,
+				   pmu_metric_iter_fn fn,
+				   void *data);
 
 const struct pmu_events_table *perf_pmu__find_events_table(struct perf_pmu *pmu);
 const struct pmu_metrics_table *pmu_metrics_table__find(void);
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 46920ebadfd1..126a631686b0 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -353,7 +353,7 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
 	return 0;
 }
 
-static bool match_metric(const char *metric_or_groups, const char *sought)
+static bool match_metric_or_groups(const char *metric_or_groups, const char *sought)
 {
 	int len;
 	char *m;
@@ -369,18 +369,19 @@ static bool match_metric(const char *metric_or_groups, const char *sought)
 	    (metric_or_groups[len] == 0 || metric_or_groups[len] == ';'))
 		return true;
 	m = strchr(metric_or_groups, ';');
-	return m && match_metric(m + 1, sought);
+	return m && match_metric_or_groups(m + 1, sought);
 }
 
-static bool match_pm_metric(const struct pmu_metric *pm, const char *pmu, const char *metric)
+static bool match_pm_metric_or_groups(const struct pmu_metric *pm, const char *pmu,
+				      const char *metric_or_groups)
 {
 	const char *pm_pmu = pm->pmu ?: "cpu";
 
 	if (strcmp(pmu, "all") && strcmp(pm_pmu, pmu))
 		return false;
 
-	return match_metric(pm->metric_group, metric) ||
-	       match_metric(pm->metric_name, metric);
+	return match_metric_or_groups(pm->metric_group, metric_or_groups) ||
+	       match_metric_or_groups(pm->metric_name, metric_or_groups);
 }
 
 /** struct mep - RB-tree node for building printing information. */
@@ -802,11 +803,6 @@ struct metricgroup_add_iter_data {
 	const struct pmu_metrics_table *table;
 };
 
-static bool metricgroup__find_metric(const char *pmu,
-				     const char *metric,
-				     const struct pmu_metrics_table *table,
-				     struct pmu_metric *pm);
-
 static int add_metric(struct list_head *metric_list,
 		      const struct pmu_metric *pm,
 		      const char *modifier,
@@ -818,6 +814,16 @@ static int add_metric(struct list_head *metric_list,
 		      const struct visited_metric *visited,
 		      const struct pmu_metrics_table *table);
 
+static int metricgroup__find_metric_callback(const struct pmu_metric *pm,
+					     const struct pmu_metrics_table *table  __maybe_unused,
+					     void *vdata)
+{
+	struct pmu_metric *copied_pm = vdata;
+
+	memcpy(copied_pm, pm, sizeof(*pm));
+	return 0;
+}
+
 /**
  * resolve_metric - Locate metrics within the root metric and recursively add
  *                    references to them.
@@ -838,7 +844,7 @@ static int add_metric(struct list_head *metric_list,
  *       architecture perf is running upon.
  */
 static int resolve_metric(struct list_head *metric_list,
-			  const char *pmu,
+			  struct perf_pmu *pmu,
 			  const char *modifier,
 			  bool metric_no_group,
 			  bool metric_no_threshold,
@@ -868,7 +874,9 @@ static int resolve_metric(struct list_head *metric_list,
 	hashmap__for_each_entry(root_metric->pctx->ids, cur, bkt) {
 		struct pmu_metric pm;
 
-		if (metricgroup__find_metric(pmu, cur->pkey, table, &pm)) {
+		if (pmu_metrics_table__find_metric(table, pmu, cur->pkey,
+						   metricgroup__find_metric_callback,
+						   &pm) != PMU_METRICS__NOT_FOUND) {
 			pending = realloc(pending,
 					(pending_cnt + 1) * sizeof(struct to_resolve));
 			if (!pending)
@@ -1019,7 +1027,12 @@ static int __add_metric(struct list_head *metric_list,
 	}
 	if (!ret) {
 		/* Resolve referenced metrics. */
-		const char *pmu = pm->pmu ?: "cpu";
+		struct perf_pmu *pmu;
+
+		if (pm->pmu && pm->pmu[0] != '\0')
+			pmu = perf_pmus__find(pm->pmu);
+		else
+			pmu = perf_pmus__scan_core(/*pmu=*/ NULL);
 
 		ret = resolve_metric(metric_list, pmu, modifier, metric_no_group,
 				     metric_no_threshold, user_requested_cpu_list,
@@ -1036,44 +1049,6 @@ static int __add_metric(struct list_head *metric_list,
 	return ret;
 }
 
-struct metricgroup__find_metric_data {
-	const char *pmu;
-	const char *metric;
-	struct pmu_metric *pm;
-};
-
-static int metricgroup__find_metric_callback(const struct pmu_metric *pm,
-					     const struct pmu_metrics_table *table  __maybe_unused,
-					     void *vdata)
-{
-	struct metricgroup__find_metric_data *data = vdata;
-	const char *pm_pmu = pm->pmu ?: "cpu";
-
-	if (strcmp(data->pmu, "all") && strcmp(pm_pmu, data->pmu))
-		return 0;
-
-	if (!match_metric(pm->metric_name, data->metric))
-		return 0;
-
-	memcpy(data->pm, pm, sizeof(*pm));
-	return 1;
-}
-
-static bool metricgroup__find_metric(const char *pmu,
-				     const char *metric,
-				     const struct pmu_metrics_table *table,
-				     struct pmu_metric *pm)
-{
-	struct metricgroup__find_metric_data data = {
-		.pmu = pmu,
-		.metric = metric,
-		.pm = pm,
-	};
-
-	return pmu_metrics_table__for_each_metric(table, metricgroup__find_metric_callback, &data)
-		? true : false;
-}
-
 static int add_metric(struct list_head *metric_list,
 		      const struct pmu_metric *pm,
 		      const char *modifier,
@@ -1119,7 +1094,7 @@ static int metricgroup__add_metric_sys_event_iter(const struct pmu_metric *pm,
 	struct metricgroup_add_iter_data *d = data;
 	int ret;
 
-	if (!match_pm_metric(pm, d->pmu, d->metric_name))
+	if (!match_pm_metric_or_groups(pm, d->pmu, d->metric_name))
 		return 0;
 
 	ret = add_metric(d->metric_list, pm, d->modifier, d->metric_no_group,
@@ -1200,9 +1175,9 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,
 	struct metricgroup__add_metric_data *data = vdata;
 	int ret = 0;
 
-	if (pm->metric_expr && match_pm_metric(pm, data->pmu, data->metric_name)) {
+	if (pm->metric_expr && match_pm_metric_or_groups(pm, data->pmu, data->metric_name)) {
 		bool metric_no_group = data->metric_no_group ||
-			match_metric(pm->metricgroup_no_group, data->metric_name);
+			match_metric_or_groups(pm->metricgroup_no_group, data->metric_name);
 
 		data->has_match = true;
 		ret = add_metric(data->list, pm, data->modifier, metric_no_group,
@@ -1723,29 +1698,32 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
 
 struct metricgroup__has_metric_data {
 	const char *pmu;
-	const char *metric;
+	const char *metric_or_groups;
 };
-static int metricgroup__has_metric_callback(const struct pmu_metric *pm,
-					    const struct pmu_metrics_table *table __maybe_unused,
-					    void *vdata)
+static int metricgroup__has_metric_or_groups_callback(const struct pmu_metric *pm,
+						      const struct pmu_metrics_table *table
+							__maybe_unused,
+						      void *vdata)
 {
 	struct metricgroup__has_metric_data *data = vdata;
 
-	return match_pm_metric(pm, data->pmu, data->metric) ? 1 : 0;
+	return match_pm_metric_or_groups(pm, data->pmu, data->metric_or_groups) ? 1 : 0;
 }
 
-bool metricgroup__has_metric(const char *pmu, const char *metric)
+bool metricgroup__has_metric_or_groups(const char *pmu, const char *metric_or_groups)
 {
 	const struct pmu_metrics_table *table = pmu_metrics_table__find();
 	struct metricgroup__has_metric_data data = {
 		.pmu = pmu,
-		.metric = metric,
+		.metric_or_groups = metric_or_groups,
 	};
 
 	if (!table)
 		return false;
 
-	return pmu_metrics_table__for_each_metric(table, metricgroup__has_metric_callback, &data)
+	return pmu_metrics_table__for_each_metric(table,
+						  metricgroup__has_metric_or_groups_callback,
+						  &data)
 		? true : false;
 }
 
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 779f6ede1b51..a04ac1afa6cc 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -85,7 +85,7 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
 				   struct rblist *metric_events);
 
 void metricgroup__print(const struct print_callbacks *print_cb, void *print_state);
-bool metricgroup__has_metric(const char *pmu, const char *metric);
+bool metricgroup__has_metric_or_groups(const char *pmu, const char *metric_or_groups);
 unsigned int metricgroups__topdown_max_level(void);
 int arch_get_runtimeparam(const struct pmu_metric *pm);
 void metricgroup__rblist_exit(struct rblist *metric_events);
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-04-10  4:45 [PATCH v1 0/3] Metric related performance improvements Ian Rogers
                   ` (2 preceding siblings ...)
  2025-04-10  4:45 ` [PATCH v1 3/3] perf metricgroup: Binary search when resolving referred to metrics Ian Rogers
@ 2025-04-10  6:49 ` Namhyung Kim
  2025-04-23 20:48   ` Ian Rogers
  2025-05-13 19:34   ` Arnaldo Carvalho de Melo
  3 siblings, 2 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-04-10  6:49 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

Hi Ian,

On Wed, Apr 09, 2025 at 09:45:29PM -0700, Ian Rogers wrote:
> The "PMU JSON event tests" have been running slowly, these changes
> target improving them with an improvement of the test running 8 to 10
> times faster.
> 
> The first patch changes from searching through all aliases by name in
> a list to using a hashmap. Doing a fast hashmap__find means testing
> for having an event needn't load from disk if an event is already
> present.
> 
> The second patch switch the fncache to use a hashmap rather than its
> own hashmap with a limited number of buckets. When there are many
> filename queries, such as with a test, there are many collisions with
> the previous fncache approach leading to linear searching of the
> entries.
> 
> The final patch adds a find function for metrics. Normally metrics can
> match by name and group, however, only name matching happens when one
> metric refers to another. As we test every "id" in a metric to see if
> it is a metric, the find function can dominate performance as it
> linearly searches all metrics. Add a find function for the metrics
> table so that a metric can be found by name with a binary search.
> 
> Before these changes:
> ```
> $ time perf test -v 10
>  10: PMU JSON event tests                                            :
>  10.1: PMU event table sanity                                        : Ok
>  10.2: PMU event map aliases                                         : Ok
>  10.3: Parsing of PMU event table metrics                            : Ok
>  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
>  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> 
> real    0m18.499s
> user    0m18.150s
> sys     0m3.273s
> ```
> 
> After these changes:
> ```
> $ time perf test -v 10
>  10: PMU JSON event tests                                            :
>  10.1: PMU event table sanity                                        : Ok
>  10.2: PMU event map aliases                                         : Ok
>  10.3: Parsing of PMU event table metrics                            : Ok
>  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
>  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> 
> real    0m2.338s
> user    0m1.797s
> sys     0m2.186s
> ```

Great, I also see the speedup on my machine from 32s to 3s.

Tested-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

> 
> Ian Rogers (3):
>   perf pmu: Change aliases from list to hashmap
>   perf fncache: Switch to using hashmap
>   perf metricgroup: Binary search when resolving referred to metrics
> 
>  tools/perf/builtin-stat.c                |   6 +-
>  tools/perf/pmu-events/empty-pmu-events.c |  66 ++++++++-
>  tools/perf/pmu-events/jevents.py         |  66 ++++++++-
>  tools/perf/pmu-events/pmu-events.h       |  23 +++-
>  tools/perf/tests/pmu-events.c            | 129 +++++++++--------
>  tools/perf/util/fncache.c                |  69 +++++-----
>  tools/perf/util/fncache.h                |   1 -
>  tools/perf/util/hwmon_pmu.c              |  43 +++---
>  tools/perf/util/metricgroup.c            | 102 ++++++--------
>  tools/perf/util/metricgroup.h            |   2 +-
>  tools/perf/util/pmu.c                    | 167 +++++++++++++++--------
>  tools/perf/util/pmu.h                    |   4 +-
>  tools/perf/util/srccode.c                |   4 +-
>  tools/perf/util/tool_pmu.c               |  17 +--
>  14 files changed, 430 insertions(+), 269 deletions(-)
> 
> -- 
> 2.49.0.504.g3bcea36a83-goog
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-04-10  6:49 ` [PATCH v1 0/3] Metric related performance improvements Namhyung Kim
@ 2025-04-23 20:48   ` Ian Rogers
  2025-05-12 16:40     ` Arnaldo Carvalho de Melo
  2025-05-13 19:34   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2025-04-23 20:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

On Wed, Apr 9, 2025 at 11:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Wed, Apr 09, 2025 at 09:45:29PM -0700, Ian Rogers wrote:
> > The "PMU JSON event tests" have been running slowly, these changes
> > target improving them with an improvement of the test running 8 to 10
> > times faster.
> >
> > The first patch changes from searching through all aliases by name in
> > a list to using a hashmap. Doing a fast hashmap__find means testing
> > for having an event needn't load from disk if an event is already
> > present.
> >
> > The second patch switch the fncache to use a hashmap rather than its
> > own hashmap with a limited number of buckets. When there are many
> > filename queries, such as with a test, there are many collisions with
> > the previous fncache approach leading to linear searching of the
> > entries.
> >
> > The final patch adds a find function for metrics. Normally metrics can
> > match by name and group, however, only name matching happens when one
> > metric refers to another. As we test every "id" in a metric to see if
> > it is a metric, the find function can dominate performance as it
> > linearly searches all metrics. Add a find function for the metrics
> > table so that a metric can be found by name with a binary search.
> >
> > Before these changes:
> > ```
> > $ time perf test -v 10
> >  10: PMU JSON event tests                                            :
> >  10.1: PMU event table sanity                                        : Ok
> >  10.2: PMU event map aliases                                         : Ok
> >  10.3: Parsing of PMU event table metrics                            : Ok
> >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> >
> > real    0m18.499s
> > user    0m18.150s
> > sys     0m3.273s
> > ```
> >
> > After these changes:
> > ```
> > $ time perf test -v 10
> >  10: PMU JSON event tests                                            :
> >  10.1: PMU event table sanity                                        : Ok
> >  10.2: PMU event map aliases                                         : Ok
> >  10.3: Parsing of PMU event table metrics                            : Ok
> >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> >
> > real    0m2.338s
> > user    0m1.797s
> > sys     0m2.186s
> > ```
>
> Great, I also see the speedup on my machine from 32s to 3s.
>
> Tested-by: Namhyung Kim <namhyung@kernel.org>

Ping.

Thanks,
Ian

> Thanks,
> Namhyung
>
> >
> > Ian Rogers (3):
> >   perf pmu: Change aliases from list to hashmap
> >   perf fncache: Switch to using hashmap
> >   perf metricgroup: Binary search when resolving referred to metrics
> >
> >  tools/perf/builtin-stat.c                |   6 +-
> >  tools/perf/pmu-events/empty-pmu-events.c |  66 ++++++++-
> >  tools/perf/pmu-events/jevents.py         |  66 ++++++++-
> >  tools/perf/pmu-events/pmu-events.h       |  23 +++-
> >  tools/perf/tests/pmu-events.c            | 129 +++++++++--------
> >  tools/perf/util/fncache.c                |  69 +++++-----
> >  tools/perf/util/fncache.h                |   1 -
> >  tools/perf/util/hwmon_pmu.c              |  43 +++---
> >  tools/perf/util/metricgroup.c            | 102 ++++++--------
> >  tools/perf/util/metricgroup.h            |   2 +-
> >  tools/perf/util/pmu.c                    | 167 +++++++++++++++--------
> >  tools/perf/util/pmu.h                    |   4 +-
> >  tools/perf/util/srccode.c                |   4 +-
> >  tools/perf/util/tool_pmu.c               |  17 +--
> >  14 files changed, 430 insertions(+), 269 deletions(-)
> >
> > --
> > 2.49.0.504.g3bcea36a83-goog
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-04-23 20:48   ` Ian Rogers
@ 2025-05-12 16:40     ` Arnaldo Carvalho de Melo
  2025-05-12 16:57       ` Ian Rogers
  0 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-12 16:40 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

On Wed, Apr 23, 2025 at 01:48:22PM -0700, Ian Rogers wrote:
> On Wed, Apr 9, 2025 at 11:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Wed, Apr 09, 2025 at 09:45:29PM -0700, Ian Rogers wrote:
> > > The "PMU JSON event tests" have been running slowly, these changes
> > > target improving them with an improvement of the test running 8 to 10
> > > times faster.
> > >
> > > The first patch changes from searching through all aliases by name in
> > > a list to using a hashmap. Doing a fast hashmap__find means testing
> > > for having an event needn't load from disk if an event is already
> > > present.
> > >
> > > The second patch switch the fncache to use a hashmap rather than its
> > > own hashmap with a limited number of buckets. When there are many
> > > filename queries, such as with a test, there are many collisions with
> > > the previous fncache approach leading to linear searching of the
> > > entries.
> > >
> > > The final patch adds a find function for metrics. Normally metrics can
> > > match by name and group, however, only name matching happens when one
> > > metric refers to another. As we test every "id" in a metric to see if
> > > it is a metric, the find function can dominate performance as it
> > > linearly searches all metrics. Add a find function for the metrics
> > > table so that a metric can be found by name with a binary search.
> > >
> > > Before these changes:
> > > ```
> > > $ time perf test -v 10
> > >  10: PMU JSON event tests                                            :
> > >  10.1: PMU event table sanity                                        : Ok
> > >  10.2: PMU event map aliases                                         : Ok
> > >  10.3: Parsing of PMU event table metrics                            : Ok
> > >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> > >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> > >
> > > real    0m18.499s
> > > user    0m18.150s
> > > sys     0m3.273s
> > > ```
> > >
> > > After these changes:
> > > ```
> > > $ time perf test -v 10
> > >  10: PMU JSON event tests                                            :
> > >  10.1: PMU event table sanity                                        : Ok
> > >  10.2: PMU event map aliases                                         : Ok
> > >  10.3: Parsing of PMU event table metrics                            : Ok
> > >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> > >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> > >
> > > real    0m2.338s
> > > user    0m1.797s
> > > sys     0m2.186s
> > > ```
> >
> > Great, I also see the speedup on my machine from 32s to 3s.
> >
> > Tested-by: Namhyung Kim <namhyung@kernel.org>
> 
> Ping.

I'll try to fix up it later, if you don't beat me to it, will continue
with the other patches you listed to get the ones that applies merged:

Total patches: 3
---
Cover: ./20250409_irogers_metric_related_performance_improvements.cover
 Link: https://lore.kernel.org/r/20250410044532.52017-1-irogers@google.com
 Base: not specified
       git am ./20250409_irogers_metric_related_performance_improvements.mbx
⬢ [acme@toolbx perf-tools-next]$        git am ./20250409_irogers_metric_related_performance_improvements.mbx
Applying: perf pmu: Change aliases from list to hashmap
error: patch failed: tools/perf/util/pmu.c:532
error: tools/perf/util/pmu.c: patch does not apply
Patch failed at 0001 perf pmu: Change aliases from list to hashmap
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
⬢ [acme@toolbx perf-tools-next]$ 
⬢ [acme@toolbx perf-tools-next]$ git am --abort
⬢ [acme@toolbx perf-tools-next]$ patch -p1 < ./20250409_irogers_metric_related_performance_improvements.mbx
patching file tools/perf/tests/pmu-events.c
patching file tools/perf/util/hwmon_pmu.c
patching file tools/perf/util/pmu.c
Hunk #3 succeeded at 417 (offset 11 lines).
Hunk #4 succeeded at 451 (offset 11 lines).
Hunk #5 FAILED at 541.
Hunk #6 succeeded at 657 (offset 41 lines).
Hunk #7 succeeded at 1146 (offset 41 lines).
Hunk #8 succeeded at 1238 (offset 41 lines).
Hunk #9 succeeded at 1259 (offset 41 lines).
Hunk #10 succeeded at 2018 (offset 48 lines).
Hunk #11 succeeded at 2033 (offset 48 lines).
Hunk #12 succeeded at 2502 (offset 59 lines).
Hunk #13 succeeded at 2522 (offset 59 lines).
1 out of 13 hunks FAILED -- saving rejects to file tools/perf/util/pmu.c.rej
patching file tools/perf/util/pmu.h
Hunk #3 succeeded at 295 (offset 5 lines).
patching file tools/perf/util/tool_pmu.c
Hunk #1 succeeded at 502 (offset 6 lines).
patching file tools/perf/util/fncache.c
patching file tools/perf/util/fncache.h
patching file tools/perf/util/srccode.c
patching file tools/perf/builtin-stat.c
Hunk #1 succeeded at 1854 (offset -2 lines).
Hunk #2 succeeded at 1888 (offset -2 lines).
Hunk #3 succeeded at 1978 (offset -2 lines).
patching file tools/perf/pmu-events/empty-pmu-events.c
Hunk #1 succeeded at 449 (offset 6 lines).
Hunk #2 succeeded at 495 (offset 6 lines).
Hunk #3 succeeded at 552 (offset 6 lines).
patching file tools/perf/pmu-events/jevents.py
Hunk #1 succeeded at 972 (offset 6 lines).
Hunk #2 succeeded at 1018 (offset 6 lines).
Hunk #3 succeeded at 1075 (offset 6 lines).
patching file tools/perf/pmu-events/pmu-events.h
Hunk #1 succeeded at 74 (offset 3 lines).
Hunk #2 succeeded at 89 (offset 3 lines).
Hunk #3 succeeded at 105 (offset 3 lines).
patching file tools/perf/util/metricgroup.c
patching file tools/perf/util/metricgroup.h
⬢ [acme@toolbx perf-tools-next]$

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-05-12 16:40     ` Arnaldo Carvalho de Melo
@ 2025-05-12 16:57       ` Ian Rogers
  2025-05-12 17:29         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2025-05-12 16:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

On Mon, May 12, 2025 at 9:40 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Wed, Apr 23, 2025 at 01:48:22PM -0700, Ian Rogers wrote:
> > On Wed, Apr 9, 2025 at 11:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Wed, Apr 09, 2025 at 09:45:29PM -0700, Ian Rogers wrote:
> > > > The "PMU JSON event tests" have been running slowly, these changes
> > > > target improving them with an improvement of the test running 8 to 10
> > > > times faster.
> > > >
> > > > The first patch changes from searching through all aliases by name in
> > > > a list to using a hashmap. Doing a fast hashmap__find means testing
> > > > for having an event needn't load from disk if an event is already
> > > > present.
> > > >
> > > > The second patch switch the fncache to use a hashmap rather than its
> > > > own hashmap with a limited number of buckets. When there are many
> > > > filename queries, such as with a test, there are many collisions with
> > > > the previous fncache approach leading to linear searching of the
> > > > entries.
> > > >
> > > > The final patch adds a find function for metrics. Normally metrics can
> > > > match by name and group, however, only name matching happens when one
> > > > metric refers to another. As we test every "id" in a metric to see if
> > > > it is a metric, the find function can dominate performance as it
> > > > linearly searches all metrics. Add a find function for the metrics
> > > > table so that a metric can be found by name with a binary search.
> > > >
> > > > Before these changes:
> > > > ```
> > > > $ time perf test -v 10
> > > >  10: PMU JSON event tests                                            :
> > > >  10.1: PMU event table sanity                                        : Ok
> > > >  10.2: PMU event map aliases                                         : Ok
> > > >  10.3: Parsing of PMU event table metrics                            : Ok
> > > >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> > > >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> > > >
> > > > real    0m18.499s
> > > > user    0m18.150s
> > > > sys     0m3.273s
> > > > ```
> > > >
> > > > After these changes:
> > > > ```
> > > > $ time perf test -v 10
> > > >  10: PMU JSON event tests                                            :
> > > >  10.1: PMU event table sanity                                        : Ok
> > > >  10.2: PMU event map aliases                                         : Ok
> > > >  10.3: Parsing of PMU event table metrics                            : Ok
> > > >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> > > >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> > > >
> > > > real    0m2.338s
> > > > user    0m1.797s
> > > > sys     0m2.186s
> > > > ```
> > >
> > > Great, I also see the speedup on my machine from 32s to 3s.
> > >
> > > Tested-by: Namhyung Kim <namhyung@kernel.org>
> >
> > Ping.
>
> I'll try to fix up it later, if you don't beat me to it, will continue
> with the other patches you listed to get the ones that applies merged:
>
> Total patches: 3
> ---
> Cover: ./20250409_irogers_metric_related_performance_improvements.cover
>  Link: https://lore.kernel.org/r/20250410044532.52017-1-irogers@google.com
>  Base: not specified
>        git am ./20250409_irogers_metric_related_performance_improvements.mbx
> ⬢ [acme@toolbx perf-tools-next]$        git am ./20250409_irogers_metric_related_performance_improvements.mbx
> Applying: perf pmu: Change aliases from list to hashmap
> error: patch failed: tools/perf/util/pmu.c:532
> error: tools/perf/util/pmu.c: patch does not apply
> Patch failed at 0001 perf pmu: Change aliases from list to hashmap
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> hint: When you have resolved this problem, run "git am --continue".
> hint: If you prefer to skip this patch, run "git am --skip" instead.
> hint: To restore the original branch and stop patching, run "git am --abort".
> hint: Disable this message with "git config set advice.mergeConflict false"
> ⬢ [acme@toolbx perf-tools-next]$
> ⬢ [acme@toolbx perf-tools-next]$ git am --abort
> ⬢ [acme@toolbx perf-tools-next]$ patch -p1 < ./20250409_irogers_metric_related_performance_improvements.mbx
> patching file tools/perf/tests/pmu-events.c
> patching file tools/perf/util/hwmon_pmu.c
> patching file tools/perf/util/pmu.c
> Hunk #3 succeeded at 417 (offset 11 lines).
> Hunk #4 succeeded at 451 (offset 11 lines).
> Hunk #5 FAILED at 541.
> Hunk #6 succeeded at 657 (offset 41 lines).
> Hunk #7 succeeded at 1146 (offset 41 lines).
> Hunk #8 succeeded at 1238 (offset 41 lines).
> Hunk #9 succeeded at 1259 (offset 41 lines).
> Hunk #10 succeeded at 2018 (offset 48 lines).
> Hunk #11 succeeded at 2033 (offset 48 lines).
> Hunk #12 succeeded at 2502 (offset 59 lines).
> Hunk #13 succeeded at 2522 (offset 59 lines).
> 1 out of 13 hunks FAILED -- saving rejects to file tools/perf/util/pmu.c.rej
> patching file tools/perf/util/pmu.h
> Hunk #3 succeeded at 295 (offset 5 lines).
> patching file tools/perf/util/tool_pmu.c
> Hunk #1 succeeded at 502 (offset 6 lines).
> patching file tools/perf/util/fncache.c
> patching file tools/perf/util/fncache.h
> patching file tools/perf/util/srccode.c
> patching file tools/perf/builtin-stat.c
> Hunk #1 succeeded at 1854 (offset -2 lines).
> Hunk #2 succeeded at 1888 (offset -2 lines).
> Hunk #3 succeeded at 1978 (offset -2 lines).
> patching file tools/perf/pmu-events/empty-pmu-events.c
> Hunk #1 succeeded at 449 (offset 6 lines).
> Hunk #2 succeeded at 495 (offset 6 lines).
> Hunk #3 succeeded at 552 (offset 6 lines).
> patching file tools/perf/pmu-events/jevents.py
> Hunk #1 succeeded at 972 (offset 6 lines).
> Hunk #2 succeeded at 1018 (offset 6 lines).
> Hunk #3 succeeded at 1075 (offset 6 lines).
> patching file tools/perf/pmu-events/pmu-events.h
> Hunk #1 succeeded at 74 (offset 3 lines).
> Hunk #2 succeeded at 89 (offset 3 lines).
> Hunk #3 succeeded at 105 (offset 3 lines).
> patching file tools/perf/util/metricgroup.c
> patching file tools/perf/util/metricgroup.h
> ⬢ [acme@toolbx perf-tools-next]$

Thanks Arnaldo! Happy to send a rebase on tmp.perf-tools-next if useful.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-05-12 16:57       ` Ian Rogers
@ 2025-05-12 17:29         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-12 17:29 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

On Mon, May 12, 2025 at 09:57:45AM -0700, Ian Rogers wrote:
> On Mon, May 12, 2025 at 9:40 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Hunk #4 succeeded at 451 (offset 11 lines).
> > Hunk #5 FAILED at 541.
> > Hunk #6 succeeded at 657 (offset 41 lines).
> > ⬢ [acme@toolbx perf-tools-next]$
 
> Thanks Arnaldo! Happy to send a rebase on tmp.perf-tools-next if useful.

Sure, I just pushed what I have:

⬢ [acme@toolbx perf-tools-next]$ git log --oneline -10
255f5b6d060be5a4 (HEAD -> perf-tools-next, x1/perf-tools-next, x1/HEAD, perf-tools-next/tmp.perf-tools-next, five/perf-tools-next, five/HEAD) perf parse-events: Add "cpu" term to set the CPU an event is recorded on
168c7b509109fe26 perf parse-events: Set is_pmu_core for legacy hardware events
f60c3f44689ac2bc perf stat: Use counter cpumask to skip zero values
2e7a2f7f3c6e3a99 libperf cpumap: Add ability to create CPU from a single CPU number
365e02ddb65d443f perf tests metrics: Permission related fixes
f0869f31562bde2e perf evsel: Add per-thread warning for EOPNOTSUPP open failues
17e548405a81665f perf scripts python: exported-sql-viewer.py: Fix pattern matching with Python 3
352b088164b5cde1 perf intel-pt: Do not default to recording all switch events
e00eac6b5b6d956f perf intel-pt: Fix PEBS-via-PT data_src
cd17a9b1a779459d (perf-tools-next/perf-tools-next) perf test demangle-ocaml: Switch to using dso__demangle_sym()
⬢ [acme@toolbx perf-tools-next]$

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-04-10  6:49 ` [PATCH v1 0/3] Metric related performance improvements Namhyung Kim
  2025-04-23 20:48   ` Ian Rogers
@ 2025-05-13 19:34   ` Arnaldo Carvalho de Melo
  2025-05-13 19:35     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-13 19:34 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

On Wed, Apr 09, 2025 at 11:49:16PM -0700, Namhyung Kim wrote:
> Hi Ian,
> 
> On Wed, Apr 09, 2025 at 09:45:29PM -0700, Ian Rogers wrote:
> > The "PMU JSON event tests" have been running slowly, these changes
> > target improving them with an improvement of the test running 8 to 10
> > times faster.
> > 
> > The first patch changes from searching through all aliases by name in
> > a list to using a hashmap. Doing a fast hashmap__find means testing
> > for having an event needn't load from disk if an event is already
> > present.
> > 
> > The second patch switch the fncache to use a hashmap rather than its
> > own hashmap with a limited number of buckets. When there are many
> > filename queries, such as with a test, there are many collisions with
> > the previous fncache approach leading to linear searching of the
> > entries.
> > 
> > The final patch adds a find function for metrics. Normally metrics can
> > match by name and group, however, only name matching happens when one
> > metric refers to another. As we test every "id" in a metric to see if
> > it is a metric, the find function can dominate performance as it
> > linearly searches all metrics. Add a find function for the metrics
> > table so that a metric can be found by name with a binary search.
> > 
> > Before these changes:
> > ```
> > $ time perf test -v 10
> >  10: PMU JSON event tests                                            :
> >  10.1: PMU event table sanity                                        : Ok
> >  10.2: PMU event map aliases                                         : Ok
> >  10.3: Parsing of PMU event table metrics                            : Ok
> >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> > 
> > real    0m18.499s
> > user    0m18.150s
> > sys     0m3.273s
> > ```
> > 
> > After these changes:
> > ```
> > $ time perf test -v 10
> >  10: PMU JSON event tests                                            :
> >  10.1: PMU event table sanity                                        : Ok
> >  10.2: PMU event map aliases                                         : Ok
> >  10.3: Parsing of PMU event table metrics                            : Ok
> >  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> >  10.5: Parsing of metric thresholds with fake PMUs                   : Ok
> > 
> > real    0m2.338s
> > user    0m1.797s
> > sys     0m2.186s
> > ```
> 
> Great, I also see the speedup on my machine from 32s to 3s.
> 
> Tested-by: Namhyung Kim <namhyung@kernel.org>

I'm collecting this for v2 as well, ok? Holler if you disagree.

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/3] Metric related performance improvements
  2025-05-13 19:34   ` Arnaldo Carvalho de Melo
@ 2025-05-13 19:35     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-13 19:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Xu Yang, John Garry, Dominique Martinet,
	Thomas Richter, Weilin Wang, linux-perf-users, linux-kernel

On Tue, May 13, 2025 at 04:34:28PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Apr 09, 2025 at 11:49:16PM -0700, Namhyung Kim wrote:
> > Great, I also see the speedup on my machine from 32s to 3s.

> > Tested-by: Namhyung Kim <namhyung@kernel.org>
 
> I'm collecting this for v2 as well, ok? Holler if you disagree.

BTW, in my workstation:

Before:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real  0m9.286s
  user  0m9.354s
  sys   0m0.062s
  root@number:~#

After:

  root@number:~# time perf test "Parsing of PMU event table metrics"
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

  real  0m0.689s
  user  0m0.766s
  sys   0m0.042s
  root@number:~# time perf test 10
   10: PMU JSON event tests                                            :
   10.1: PMU event table sanity                                        : Ok
   10.2: PMU event map aliases                                         : Ok
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
   10.5: Parsing of metric thresholds with fake PMUs                   : Ok

  real  0m0.696s
  user  0m0.807s
  sys   0m0.064s
  root@number:~#

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-05-13 19:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10  4:45 [PATCH v1 0/3] Metric related performance improvements Ian Rogers
2025-04-10  4:45 ` [PATCH v1 1/3] perf pmu: Change aliases from list to hashmap Ian Rogers
2025-04-10  4:45 ` [PATCH v1 2/3] perf fncache: Switch to using hashmap Ian Rogers
2025-04-10  4:45 ` [PATCH v1 3/3] perf metricgroup: Binary search when resolving referred to metrics Ian Rogers
2025-04-10  6:49 ` [PATCH v1 0/3] Metric related performance improvements Namhyung Kim
2025-04-23 20:48   ` Ian Rogers
2025-05-12 16:40     ` Arnaldo Carvalho de Melo
2025-05-12 16:57       ` Ian Rogers
2025-05-12 17:29         ` Arnaldo Carvalho de Melo
2025-05-13 19:34   ` Arnaldo Carvalho de Melo
2025-05-13 19:35     ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).