public inbox for linux-s390@vger.kernel.org
* [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
@ 2026-02-19 11:38 Thomas Richter
  2026-02-19 11:55 ` Jan Polensky
                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Thomas Richter @ 2026-02-19 11:38 UTC (permalink / raw)
  To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung
  Cc: agordeev, gor, sumanthk, hca, japo, Thomas Richter

Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
removes the symbols psw_idle() and psw_idle_exit() from the Linux
kernel for s390. Remove them from the perf tool's list of idle
functions. They can no longer be detected.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
---
 tools/perf/util/symbol.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 814f960fa8f8..575951d98b1b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -752,8 +752,6 @@ static bool symbol__is_idle(const char *name)
 		"poll_idle",
 		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
 		NULL
 	};
 	int i;
-- 
2.53.0



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
@ 2026-02-19 11:55 ` Jan Polensky
  2026-02-23 21:46 ` Namhyung Kim
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 0 replies; 60+ messages in thread
From: Jan Polensky @ 2026-02-19 11:55 UTC (permalink / raw)
  To: Thomas Richter, linux-kernel, linux-s390, linux-perf-users, acme,
	namhyung
  Cc: agordeev, gor, sumanthk, hca

On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
>
> removes symbols psw_idle() and psw_idle_exit() from the linux
> kernel for s390. Remove them in perf tool's list of idle
> functions. They can not be detected anymore.
>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
  2026-02-19 11:55 ` Jan Polensky
@ 2026-02-23 21:46 ` Namhyung Kim
  2026-02-23 23:14   ` Arnaldo Melo
  2026-03-02 18:43   ` Arnaldo Carvalho de Melo
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 2 replies; 60+ messages in thread
From: Namhyung Kim @ 2026-02-23 21:46 UTC (permalink / raw)
  To: Thomas Richter
  Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
	sumanthk, hca, japo

On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> 
> removes symbols psw_idle() and psw_idle_exit() from the linux
> kernel for s390. Remove them in perf tool's list of idle
> functions. They can not be detected anymore.

But I think old kernels may still run somewhere.  It seems the above
commit was merged to v6.10.  Maybe we should wait some more time before
removing it in the tool.
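
One way to keep the entries only for kernels that still have them is a version gate on the recorded release string. This is a minimal sketch, not existing perf code: `release_older_than` is a hypothetical helper, and it assumes the release string starts with `major.minor` (for example "6.9.12-...").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of perf: returns true when a kernel
 * release string such as "6.9.12-200.fc40.s390x" is older than
 * major.minor. A string that does not start with "N.M" is treated as
 * "not older", so unknown kernels get the new behaviour.
 */
static bool release_older_than(const char *release, int major, int minor)
{
	int rel_major = 0, rel_minor = 0;

	assert(release != NULL);
	if (sscanf(release, "%d.%d", &rel_major, &rel_minor) != 2)
		return false;

	return rel_major < major || (rel_major == major && rel_minor < minor);
}
```

With such a check, psw_idle would only be treated as idle when `release_older_than(release, 6, 10)` holds.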

Thanks,
Namhyung

> 
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Suggested-by: Heiko Carstens <hca@linux.ibm.com>
> ---
>  tools/perf/util/symbol.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 814f960fa8f8..575951d98b1b 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -752,8 +752,6 @@ static bool symbol__is_idle(const char *name)
>  		"poll_idle",
>  		"ppc64_runlatch_off",
>  		"pseries_dedicated_idle_sleep",
> -		"psw_idle",
> -		"psw_idle_exit",
>  		NULL
>  	};
>  	int i;
> -- 
> 2.53.0
> 


* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-23 21:46 ` Namhyung Kim
@ 2026-02-23 23:14   ` Arnaldo Melo
  2026-03-02 18:43   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 60+ messages in thread
From: Arnaldo Melo @ 2026-02-23 23:14 UTC (permalink / raw)
  To: Namhyung Kim, Thomas Richter
  Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
	sumanthk, hca, japo



On February 23, 2026 6:46:21 PM GMT-03:00, Namhyung Kim <namhyung@kernel.org> wrote:
>On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
>> Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
>> 
>> removes symbols psw_idle() and psw_idle_exit() from the linux
>> kernel for s390. Remove them in perf tool's list of idle
>> functions. They can not be detected anymore.
>
>But I think old kernels may still run somewhere.  It seems the above
>commit was merged to v6.10.  Maybe we should wait some more time before
>removing it in the tool.

Right, people keep asking whether one can use a new version of perf on an old kernel and vice versa.

So I think we should not apply this patch.

There have been efforts in the past to have some per-sample info indicating the "context" of a sample (whether it was in idle processing, hard/soft irq processing, etc.), but that hasn't come to fruition so far.

With that we could get rid of this flaky heuristic of looking at a symbol name.

- Arnaldo


>
>Thanks,
>Namhyung
>
>> 
>> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
>> Suggested-by: Heiko Carstens <hca@linux.ibm.com>
>> ---
>>  tools/perf/util/symbol.c | 2 --
>>  1 file changed, 2 deletions(-)
>> 
>> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
>> index 814f960fa8f8..575951d98b1b 100644
>> --- a/tools/perf/util/symbol.c
>> +++ b/tools/perf/util/symbol.c
>> @@ -752,8 +752,6 @@ static bool symbol__is_idle(const char *name)
>>  		"poll_idle",
>>  		"ppc64_runlatch_off",
>>  		"pseries_dedicated_idle_sleep",
>> -		"psw_idle",
>> -		"psw_idle_exit",
>>  		NULL
>>  	};
>>  	int i;
>> -- 
>> 2.53.0
>> 

- Arnaldo


* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-23 21:46 ` Namhyung Kim
  2026-02-23 23:14   ` Arnaldo Melo
@ 2026-03-02 18:43   ` Arnaldo Carvalho de Melo
  2026-03-02 19:44     ` Ian Rogers
  1 sibling, 1 reply; 60+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-02 18:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Thomas Richter, linux-kernel, linux-s390, linux-perf-users,
	agordeev, gor, sumanthk, hca, japo

On Mon, Feb 23, 2026 at 01:46:21PM -0800, Namhyung Kim wrote:
> On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> > Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> > 
> > removes symbols psw_idle() and psw_idle_exit() from the linux
> > kernel for s390. Remove them in perf tool's list of idle
> > functions. They can not be detected anymore.
> 
> But I think old kernels may still run somewhere.  It seems the above
> commit was merged to v6.10.  Maybe we should wait some more time before
> removing it in the tool.

Agreed. A new perf tool, say one built from the tarballs made
available at:

https://www.kernel.org/pub/linux/kernel/tools/perf/v7.0.0/perf-7.0.0-rc1.tar.xz

(I will not make an rc2 available since there are no changes to the
tools/perf codebase in this rc)

should still ignore those functions when used on older kernels.

A suggestion for work in this area instead is to get those samples into
a special bucket, the "idle" one, and show it somewhere on the
screen.

Thanks,

- Arnaldo


* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-03-02 18:43   ` Arnaldo Carvalho de Melo
@ 2026-03-02 19:44     ` Ian Rogers
  2026-03-04 14:34       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-02 19:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Thomas Richter, linux-kernel, linux-s390,
	linux-perf-users, agordeev, gor, sumanthk, hca, japo

On Mon, Mar 2, 2026 at 10:43 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Mon, Feb 23, 2026 at 01:46:21PM -0800, Namhyung Kim wrote:
> > On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> > > Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> > >
> > > removes symbols psw_idle() and psw_idle_exit() from the linux
> > > kernel for s390. Remove them in perf tool's list of idle
> > > functions. They can not be detected anymore.
> >
> > But I think old kernels may still run somewhere.  It seems the above
> > commit was merged to v6.10.  Maybe we should wait some more time before
> > removing it in the tool.
>
> Agreed, using a new perf tool, say built from the tarballs made
> available at:
>
> https://www.kernel.org/pub/linux/kernel/tools/perf/v7.0.0/perf-7.0.0-rc1.tar.xz
>
> (I will not make a rc2 available since there are no changes to the
> tools/perf codebase in this rc).
>
> On older kernels should still ignore those functions.
>
> A suggestion for work in this area instead is to get those samples into
> a special bucket, the "idle" one, and show it at some place in the
> screen.

Would it also be sensible to pass the perf_env:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h?h=perf-tools-next#n74
into symbol__is_idle? The contents of the perf_env are shown by `perf
report --header`:
```
# ========
# captured on    : Mon Mar  2 11:34:47 2026
# header version : 1
# data offset    : 904
# data size      : 4268216
# feat offset    : 4269120
# hostname : google.com
# os release : 6.17.13-1rodete1-amd64
# perf version : 7.0.rc1.g982b63f6380b
# arch : x86_64
# nrcpus online : 28
# nrcpus avail : 28
# cpudesc : Intel(R) Core(TM) i7-14700
# cpuid : GenuineIntel,6,183,1
...
# e_machine : 62
#   e_flags : 0
...
```
The kernel version is in the release and the e_machine/arch captures
the CPU type.
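
As a rough illustration of how the two header fields could drive the decision, here is a sketch. `struct env_sketch` is a hypothetical stand-in holding only the two fields mentioned above, not perf's struct perf_env, and the `SKETCH_EM_*` names are local copies of the ELF e_machine numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ELF e_machine values as assigned by the ELF specification. */
enum {
	SKETCH_EM_386    = 3,
	SKETCH_EM_PPC64  = 21,
	SKETCH_EM_S390   = 22,
	SKETCH_EM_X86_64 = 62,
};

/* Hypothetical subset of the recorded environment. */
struct env_sketch {
	const char *os_release;	/* e.g. "6.17.13-1rodete1-amd64" */
	uint16_t e_machine;	/* e.g. 62 for x86_64 */
};

/* Map the recorded machine type to the family whose idle symbols apply. */
static const char *env_sketch__arch_family(const struct env_sketch *env)
{
	assert(env != NULL);
	switch (env->e_machine) {
	case SKETCH_EM_386:
	case SKETCH_EM_X86_64:
		return "x86";
	case SKETCH_EM_PPC64:
		return "powerpc";
	case SKETCH_EM_S390:
		return "s390";
	default:
		return "unknown";
	}
}
```

For the header shown above, e_machine 62 selects the x86 family.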

Thanks,
Ian

> Thanks,
>
> - Arnaldo
>


* [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
  2026-02-19 11:55 ` Jan Polensky
  2026-02-23 21:46 ` Namhyung Kim
@ 2026-03-02 23:43 ` Ian Rogers
  2026-03-24 17:14   ` Ian Rogers
  2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
  2 siblings, 2 replies; 60+ messages in thread
From: Ian Rogers @ 2026-03-02 23:43 UTC (permalink / raw)
  To: tmricht
  Cc: acme, agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, namhyung, sumanthk, Ian Rogers

Move the idle boolean to a helper function, symbol__is_idle. In that
function, lazily compute whether a symbol is an idle function, taking
into consideration the kernel version and architecture of the
machine. As __symbols__insert no longer needs to know whether a symbol
is for the kernel, remove its kernel argument.
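
The lazy computation can be pictured with a standalone sketch of a tri-state cache. The names here (`struct sym_sketch`, `sym_sketch__is_idle`, `slow_checks`) are hypothetical and the single strcmp() stands in for the real per-architecture checks; it is an illustration of the caching pattern only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum idle_kind { IDLE_UNKNOWN = 0, IDLE_NO = 1, IDLE_YES = 2 };

struct sym_sketch {
	const char *name;
	unsigned int idle:2;	/* zero-initialised, so starts as IDLE_UNKNOWN */
};

static int slow_checks;	/* counts how often the name is really examined */

static bool sym_sketch__is_idle(struct sym_sketch *sym)
{
	assert(sym != NULL && sym->name != NULL);

	/* Fast path: a previous call already decided. */
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_YES;

	/* Slow path: decide once and remember the answer in the symbol. */
	slow_checks++;
	sym->idle = strcmp(sym->name, "default_idle") == 0 ? IDLE_YES : IDLE_NO;
	return sym->idle == IDLE_YES;
}
```

Two bits are needed because "not yet computed" has to be distinguishable from "computed, not idle". The sketch uses an unsigned int bit-field, since ISO C leaves enum-typed bit-fields implementation-defined.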

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 106 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 85 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 710604c4f6f6..bc3c8e3b6ec0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -750,6 +750,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -829,7 +830,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 76912c62b6a0..6bb46384aa0c 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1725,7 +1725,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 8662001e1e25..6155f509ca70 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "debug.h"
 #include "event.h"
@@ -51,7 +53,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,88 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = env ? env->e_machine : EM_HOST;
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and prepends a '.' to the
+	 * name of every function entry symbol. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (e_machine == EM_S390) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+		sscanf(release, "%d.%d", &major, &minor);
 
-	return strlist__has_entry(idle_symbols_list, name);
+		/* Before v6.10, s390 used psw_idle. */
+		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 3fb5d146d9b1..508dd9f336e9 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -24,6 +24,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -56,8 +63,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle. */
+	enum symbol_idle_kind idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -269,5 +275,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.473.g4a7958ca14-goog



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-03-02 19:44     ` Ian Rogers
@ 2026-03-04 14:34       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 60+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-04 14:34 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Namhyung Kim, Thomas Richter, linux-kernel, linux-s390,
	linux-perf-users, agordeev, gor, sumanthk, hca, japo

On Mon, Mar 02, 2026 at 11:44:19AM -0800, Ian Rogers wrote:
> On Mon, Mar 2, 2026 at 10:43 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Mon, Feb 23, 2026 at 01:46:21PM -0800, Namhyung Kim wrote:
> > > On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> > > > Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> > > >
> > > > removes symbols psw_idle() and psw_idle_exit() from the linux
> > > > kernel for s390. Remove them in perf tool's list of idle
> > > > functions. They can not be detected anymore.
> > >
> > > But I think old kernels may still run somewhere.  It seems the above
> > > commit was merged to v6.10.  Maybe we should wait some more time before
> > > removing it in the tool.
> >
> > Agreed, using a new perf tool, say built from the tarballs made
> > available at:
> >
> > https://www.kernel.org/pub/linux/kernel/tools/perf/v7.0.0/perf-7.0.0-rc1.tar.xz
> >
> > (I will not make a rc2 available since there are no changes to the
> > tools/perf codebase in this rc).
> >
> > On older kernels should still ignore those functions.
> >
> > A suggestion for work in this area instead is to get those samples into
> > a special bucket, the "idle" one, and show it at some place in the
> > screen.
> 
> Would it also be sensible to pass the perf_env:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h?h=perf-tools-next#n74
> into symbol__is_idle? The contents of the perf_env are shown by `perf
> report --header`:
> ```
> # ========
> # captured on    : Mon Mar  2 11:34:47 2026
> # header version : 1
> # data offset    : 904
> # data size      : 4268216
> # feat offset    : 4269120
> # hostname : google.com
> # os release : 6.17.13-1rodete1-amd64
> # perf version : 7.0.rc1.g982b63f6380b
> # arch : x86_64
> # nrcpus online : 28
> # nrcpus avail : 28
> # cpudesc : Intel(R) Core(TM) i7-14700
> # cpuid : GenuineIntel,6,183,1
> ...
> # e_machine : 62
> #   e_flags : 0
> ...
> ```
> The kernel version is in the release and the e_machine/arch captures
> the CPU type.

Yeah, I think it is a good improvement. Do you mean that we should
have per-arch idle symbol lists?
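
Per-arch lists could look roughly like this. It is a sketch only: `enum arch_sketch`, `in_list` and `is_idle_for_arch` are hypothetical names, and each list holds just a few of the symbol names from the existing table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Each list must stay sorted in strcmp() order for bsearch(). */
static const char * const common_idle[] = { "arch_cpu_idle", "default_idle", "poll_idle" };
static const char * const x86_idle[] = { "intel_idle", "mwait_idle" };
static const char * const s390_idle[] = { "psw_idle", "psw_idle_exit" };

enum arch_sketch { ARCH_OTHER, ARCH_X86, ARCH_S390 };

#define NR(a) (sizeof(a) / sizeof((a)[0]))

/* bsearch() passes the key first and a pointer to the array element second. */
static int name_cmp(const void *key, const void *elem)
{
	return strcmp(key, *(const char * const *)elem);
}

static bool in_list(const char *name, const char * const *list, size_t nr)
{
	return bsearch(name, list, nr, sizeof(list[0]), name_cmp) != NULL;
}

/* Check the shared list first, then only the list for the given arch. */
static bool is_idle_for_arch(const char *name, enum arch_sketch arch)
{
	assert(name != NULL);
	if (in_list(name, common_idle, NR(common_idle)))
		return true;
	switch (arch) {
	case ARCH_X86:
		return in_list(name, x86_idle, NR(x86_idle));
	case ARCH_S390:
		return in_list(name, s390_idle, NR(s390_idle));
	default:
		return false;
	}
}
```

With separate lists, a name such as psw_idle is only considered when the data was recorded on s390.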

- Arnaldo


* Re: [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-03-24 17:14   ` Ian Rogers
  2026-03-25  6:58     ` Namhyung Kim
  2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
  1 sibling, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-24 17:14 UTC (permalink / raw)
  To: tmricht, namhyung, acme
  Cc: agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, sumanthk

On Mon, Mar 2, 2026 at 3:43 PM Ian Rogers <irogers@google.com> wrote:
>
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
>
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
>
> Signed-off-by: Ian Rogers <irogers@google.com>

Ping.

Thanks,
Ian

> ---
>  tools/perf/builtin-top.c     |   6 +-
>  tools/perf/util/symbol-elf.c |   2 +-
>  tools/perf/util/symbol.c     | 106 ++++++++++++++++++++++-------------
>  tools/perf/util/symbol.h     |  15 +++--
>  4 files changed, 85 insertions(+), 44 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 710604c4f6f6..bc3c8e3b6ec0 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -750,6 +750,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  {
>         struct perf_top *top = container_of(tool, struct perf_top, tool);
>         struct addr_location al;
> +       struct dso *dso = NULL;
>
>         if (!machine && perf_guest) {
>                 static struct intlist *seen;
> @@ -829,7 +830,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>                 }
>         }
>
> -       if (al.sym == NULL || !al.sym->idle) {
> +       if (al.map)
> +               dso = map__dso(al.map);
> +
> +       if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
>                 struct hists *hists = evsel__hists(evsel);
>                 struct hist_entry_iter iter = {
>                         .evsel          = evsel,
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 76912c62b6a0..6bb46384aa0c 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1725,7 +1725,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
>
>                 arch__sym_update(f, &sym);
>
> -               __symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> +               __symbols__insert(dso__symbols(curr_dso), f);
>                 nr++;
>         }
>         dso__put(curr_dso);
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 8662001e1e25..6155f509ca70 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -25,6 +25,8 @@
>  #include "demangle-ocaml.h"
>  #include "demangle-rust-v0.h"
>  #include "dso.h"
> +#include "dwarf-regs.h"
> +#include "env.h"
>  #include "util.h" // lsdir()
>  #include "debug.h"
>  #include "event.h"
> @@ -51,7 +53,6 @@
>
>  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> -static bool symbol__is_idle(const char *name);
>
>  int vmlinux_path__nr_entries;
>  char **vmlinux_path;
> @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
>         }
>  }
>
> -void __symbols__insert(struct rb_root_cached *symbols,
> -                      struct symbol *sym, bool kernel)
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
>         struct rb_node **p = &symbols->rb_root.rb_node;
>         struct rb_node *parent = NULL;
> @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>         struct symbol *s;
>         bool leftmost = true;
>
> -       if (kernel) {
> -               const char *name = sym->name;
> -               /*
> -                * ppc64 uses function descriptors and appends a '.' to the
> -                * start of every instruction address. Remove it.
> -                */
> -               if (name[0] == '.')
> -                       name++;
> -               sym->idle = symbol__is_idle(name);
> -       }
> -
>         while (*p != NULL) {
>                 parent = *p;
>                 s = rb_entry(parent, struct symbol, rb_node);
> @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
>
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
> -       __symbols__insert(symbols, sym, false);
> +       __symbols__insert(symbols, sym);
>  }
>
>  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
>
>  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
>  {
> -       __symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> +       __symbols__insert(dso__symbols(dso), sym);
>
>         /* update the symbol cache if necessary */
>         if (dso__last_find_result_addr(dso) >= sym->start &&
> @@ -716,47 +705,88 @@ int modules__parse(const char *filename, void *arg,
>         return err;
>  }
>
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +       const char *name = a;
> +       const char *const *sym = b;
> +
> +       return strcmp(name, *sym);
> +}
> +
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__is_idle(const char *name)
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
>  {
> -       const char * const idle_symbols[] = {
> +       static const char * const idle_symbols[] = {
>                 "acpi_idle_do_entry",
>                 "acpi_processor_ffh_cstate_enter",
>                 "arch_cpu_idle",
>                 "cpu_idle",
>                 "cpu_startup_entry",
> -               "idle_cpu",
> -               "intel_idle",
> -               "intel_idle_ibrs",
>                 "default_idle",
> -               "native_safe_halt",
>                 "enter_idle",
>                 "exit_idle",
> -               "mwait_idle",
> -               "mwait_idle_with_hints",
> -               "mwait_idle_with_hints.constprop.0",
> +               "idle_cpu",
> +               "native_safe_halt",
>                 "poll_idle",
> -               "ppc64_runlatch_off",
>                 "pseries_dedicated_idle_sleep",
> -               "psw_idle",
> -               "psw_idle_exit",
> -               NULL
>         };
> -       int i;
> -       static struct strlist *idle_symbols_list;
> +       const char *name = sym->name;
> +       uint16_t e_machine = env ? env->e_machine : EM_HOST;
> +
> +       if (sym->idle)
> +               return sym->idle == SYMBOL_IDLE__IDLE;
> +
> +       if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +               sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +               return false;
> +       }
>
> -       if (idle_symbols_list)
> -               return strlist__has_entry(idle_symbols_list, name);
> +       /*
> +        * ppc64 uses function descriptors and appends a '.' to the
> +        * start of every instruction address. Remove it.
> +        */
> +       if (name[0] == '.')
> +               name++;
> +
> +
> +       if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +                   sizeof(idle_symbols[0]), sym_name_cmp)) {
> +               sym->idle = SYMBOL_IDLE__IDLE;
> +               return true;
> +       }
> +
> +       if (e_machine == EM_386 || e_machine == EM_X86_64) {
> +               if (strstarts(name, "mwait_idle") ||
> +                   strstarts(name, "intel_idle")) {
> +                       sym->idle = SYMBOL_IDLE__IDLE;
> +                       return true;
> +               }
> +       }
> +
> +       if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> +               sym->idle = SYMBOL_IDLE__IDLE;
> +               return true;
> +       }
>
> -       idle_symbols_list = strlist__new(NULL, NULL);
> +       if (e_machine == EM_S390) {
> +               int major = 0, minor = 0;
> +               const char *release = env && env->os_release
> +                       ? env->os_release : perf_version_string;
>
> -       for (i = 0; idle_symbols[i]; i++)
> -               strlist__add(idle_symbols_list, idle_symbols[i]);
> +               sscanf(release, "%d.%d", &major, &minor);
>
> -       return strlist__has_entry(idle_symbols_list, name);
> +               /* Before v6.10, s390 used psw_idle. */
> +               if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> +                       sym->idle = SYMBOL_IDLE__IDLE;
> +                       return true;
> +               }
> +       }
> +
> +       sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +       return false;
>  }
>
>  static int map__process_kallsym_symbol(void *arg, const char *name,
> @@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
>          * We will pass the symbols to the filter later, in
>          * map__split_kallsyms, when we have split the maps per module
>          */
> -       __symbols__insert(root, sym, !strchr(name, '['));
> +       __symbols__insert(root, sym);
>
>         return 0;
>  }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index 3fb5d146d9b1..508dd9f336e9 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -24,6 +24,7 @@ struct dso;
>  struct map;
>  struct maps;
>  struct option;
> +struct perf_env;
>  struct build_id;
>
>  /*
> @@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>                              GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>
> +enum symbol_idle_kind {
> +       SYMBOL_IDLE__UNKNOWN = 0,
> +       SYMBOL_IDLE__NOT_IDLE = 1,
> +       SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -56,8 +63,8 @@ struct symbol {
>         u8              type:4;
>         /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>         u8              binding:4;
> -       /** Set true for kernel symbols of idle routines. */
> -       u8              idle:1;
> +       /** Cache for symbol__is_idle. */
> +       enum symbol_idle_kind idle:2;
>         /** Resolvable but tools ignore it (e.g. idle routines). */
>         u8              ignore:1;
>         /** Symbol for an inlined function. */
> @@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
>
>  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>
> -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> -                      bool kernel);
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
>  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> @@ -269,5 +275,6 @@ enum {
>  };
>
>  int symbol__validate_sym_arguments(void);
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
>
>  #endif /* __PERF_SYMBOL */
> --
> 2.53.0.473.g4a7958ca14-goog
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-03-24 17:14   ` Ian Rogers
@ 2026-03-25  6:58     ` Namhyung Kim
  2026-03-25 15:58       ` Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Namhyung Kim @ 2026-03-25  6:58 UTC (permalink / raw)
  To: Ian Rogers
  Cc: tmricht, acme, agordeev, gor, hca, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Hi Ian,

Sorry for the delay.

On Tue, Mar 24, 2026 at 10:14:01AM -0700, Ian Rogers wrote:
> On Mon, Mar 2, 2026 at 3:43 PM Ian Rogers <irogers@google.com> wrote:
[SNIP]
> > -       if (idle_symbols_list)
> > -               return strlist__has_entry(idle_symbols_list, name);
> > +       /*
> > +        * ppc64 uses function descriptors and appends a '.' to the
> > +        * start of every instruction address. Remove it.
> > +        */
> > +       if (name[0] == '.')

Then e_machine == EM_PPC64 can be checked here.

> > +               name++;
> > +
> > +

Two blank lines.

> > +       if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> > +                   sizeof(idle_symbols[0]), sym_name_cmp)) {
> > +               sym->idle = SYMBOL_IDLE__IDLE;
> > +               return true;
> > +       }
> > +
> > +       if (e_machine == EM_386 || e_machine == EM_X86_64) {
> > +               if (strstarts(name, "mwait_idle") ||
> > +                   strstarts(name, "intel_idle")) {
> > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > +                       return true;
> > +               }
> > +       }
> > +
> > +       if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> > +               sym->idle = SYMBOL_IDLE__IDLE;
> > +               return true;
> > +       }
> >
> > -       idle_symbols_list = strlist__new(NULL, NULL);
> > +       if (e_machine == EM_S390) {
> > +               int major = 0, minor = 0;
> > +               const char *release = env && env->os_release
> > +                       ? env->os_release : perf_version_string;
> >
> > -       for (i = 0; idle_symbols[i]; i++)
> > -               strlist__add(idle_symbols_list, idle_symbols[i]);
> > +               sscanf(release, "%d.%d", &major, &minor);
> >
> > -       return strlist__has_entry(idle_symbols_list, name);
> > +               /* Before v6.10, s390 used psw_idle. */
> > +               if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > +                       return true;
> > +               }
> > +       }
> > +
> > +       sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +       return false;
> >  }
> >
> >  static int map__process_kallsym_symbol(void *arg, const char *name,
> > @@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
> >          * We will pass the symbols to the filter later, in
> >          * map__split_kallsyms, when we have split the maps per module
> >          */
> > -       __symbols__insert(root, sym, !strchr(name, '['));
> > +       __symbols__insert(root, sym);
> >
> >         return 0;
> >  }
> > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > index 3fb5d146d9b1..508dd9f336e9 100644
> > --- a/tools/perf/util/symbol.h
> > +++ b/tools/perf/util/symbol.h
> > @@ -24,6 +24,7 @@ struct dso;
> >  struct map;
> >  struct maps;
> >  struct option;
> > +struct perf_env;
> >  struct build_id;
> >
> >  /*
> > @@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
> >                              GElf_Shdr *shp, const char *name, size_t *idx);
> >  #endif
> >
> > +enum symbol_idle_kind {
> > +       SYMBOL_IDLE__UNKNOWN = 0,
> > +       SYMBOL_IDLE__NOT_IDLE = 1,
> > +       SYMBOL_IDLE__IDLE = 2,
> > +};
> > +
> >  /**
> >   * A symtab entry. When allocated this may be preceded by an annotation (see
> >   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> > @@ -56,8 +63,8 @@ struct symbol {
> >         u8              type:4;
> >         /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
> >         u8              binding:4;
> > -       /** Set true for kernel symbols of idle routines. */
> > -       u8              idle:1;
> > +       /** Cache for symbol__is_idle. */
> > +       enum symbol_idle_kind idle:2;

I'm curious whether bit-fields with different types (u8 and enum) are
packed consecutively at the bit level.  There can be a lot of symbols,
so any growth in struct symbol could be a concern.

Thanks,
Namhyung


> >         /** Resolvable but tools ignore it (e.g. idle routines). */
> >         u8              ignore:1;
> >         /** Symbol for an inlined function. */
> > @@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
> >
> >  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> >
> > -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> > -                      bool kernel);
> > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
> >  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> > @@ -269,5 +275,6 @@ enum {
> >  };
> >
> >  int symbol__validate_sym_arguments(void);
> > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
> >
> >  #endif /* __PERF_SYMBOL */
> > --
> > 2.53.0.473.g4a7958ca14-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-03-25  6:58     ` Namhyung Kim
@ 2026-03-25 15:58       ` Ian Rogers
  0 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-03-25 15:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: tmricht, acme, agordeev, gor, hca, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

On Tue, Mar 24, 2026 at 11:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> Sorry for the delay.
>
> On Tue, Mar 24, 2026 at 10:14:01AM -0700, Ian Rogers wrote:
> > On Mon, Mar 2, 2026 at 3:43 PM Ian Rogers <irogers@google.com> wrote:
> [SNIP]
> > > -       if (idle_symbols_list)
> > > -               return strlist__has_entry(idle_symbols_list, name);
> > > +       /*
> > > +        * ppc64 uses function descriptors and appends a '.' to the
> > > +        * start of every instruction address. Remove it.
> > > +        */
> > > +       if (name[0] == '.')
>
> Then e_machine == EM_PPC64 can be checked here.

Agreed, but this is potentially load-bearing for more than just PPC, so
I'd rather leave it as it is.

> > > +               name++;
> > > +
> > > +
>
> Two blank lines.

Will fix in v2.

> > > +       if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> > > +                   sizeof(idle_symbols[0]), sym_name_cmp)) {
> > > +               sym->idle = SYMBOL_IDLE__IDLE;
> > > +               return true;
> > > +       }
> > > +
> > > +       if (e_machine == EM_386 || e_machine == EM_X86_64) {
> > > +               if (strstarts(name, "mwait_idle") ||
> > > +                   strstarts(name, "intel_idle")) {
> > > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > > +                       return true;
> > > +               }
> > > +       }
> > > +
> > > +       if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> > > +               sym->idle = SYMBOL_IDLE__IDLE;
> > > +               return true;
> > > +       }
> > >
> > > -       idle_symbols_list = strlist__new(NULL, NULL);
> > > +       if (e_machine == EM_S390) {
> > > +               int major = 0, minor = 0;
> > > +               const char *release = env && env->os_release
> > > +                       ? env->os_release : perf_version_string;
> > >
> > > -       for (i = 0; idle_symbols[i]; i++)
> > > -               strlist__add(idle_symbols_list, idle_symbols[i]);
> > > +               sscanf(release, "%d.%d", &major, &minor);
> > >
> > > -       return strlist__has_entry(idle_symbols_list, name);
> > > +               /* Before v6.10, s390 used psw_idle. */
> > > +               if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> > > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > > +                       return true;
> > > +               }
> > > +       }
> > > +
> > > +       sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > > +       return false;
> > >  }
> > >
> > >  static int map__process_kallsym_symbol(void *arg, const char *name,
> > > @@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
> > >          * We will pass the symbols to the filter later, in
> > >          * map__split_kallsyms, when we have split the maps per module
> > >          */
> > > -       __symbols__insert(root, sym, !strchr(name, '['));
> > > +       __symbols__insert(root, sym);
> > >
> > >         return 0;
> > >  }
> > > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > > index 3fb5d146d9b1..508dd9f336e9 100644
> > > --- a/tools/perf/util/symbol.h
> > > +++ b/tools/perf/util/symbol.h
> > > @@ -24,6 +24,7 @@ struct dso;
> > >  struct map;
> > >  struct maps;
> > >  struct option;
> > > +struct perf_env;
> > >  struct build_id;
> > >
> > >  /*
> > > @@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
> > >                              GElf_Shdr *shp, const char *name, size_t *idx);
> > >  #endif
> > >
> > > +enum symbol_idle_kind {
> > > +       SYMBOL_IDLE__UNKNOWN = 0,
> > > +       SYMBOL_IDLE__NOT_IDLE = 1,
> > > +       SYMBOL_IDLE__IDLE = 2,
> > > +};
> > > +
> > >  /**
> > >   * A symtab entry. When allocated this may be preceded by an annotation (see
> > >   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> > > @@ -56,8 +63,8 @@ struct symbol {
> > >         u8              type:4;
> > >         /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
> > >         u8              binding:4;
> > > -       /** Set true for kernel symbols of idle routines. */
> > > -       u8              idle:1;
> > > +       /** Cache for symbol__is_idle. */
> > > +       enum symbol_idle_kind idle:2;
>
> I'm curious whether bit-fields with different types (u8 and enum) are
> packed consecutively at the bit level.  There can be a lot of symbols,
> so any growth in struct symbol could be a concern.

pahole says no size difference:

Before:
```
struct symbol {
       struct rb_node             rb_node __attribute__((__aligned__(8))); /*     0    24 */
       u64                        start;                /*    24     8 */
       u64                        end;                  /*    32     8 */
       u16                        namelen;              /*    40     2 */
       u8                         type:4;               /*    42: 0  1 */
       u8                         binding:4;            /*    42: 4  1 */
       u8                         idle:1;               /*    43: 0  1 */
       u8                         ignore:1;             /*    43: 1  1 */
       u8                         inlined:1;            /*    43: 2  1 */
       u8                         annotate2:1;          /*    43: 3  1 */
       u8                         ifunc_alias:1;        /*    43: 4  1 */

       /* XXX 3 bits hole, try to pack */

       u8                         arch_sym;             /*    44     1 */
       char                       name[];               /*    45     0 */

       /* size: 48, cachelines: 1, members: 13 */
       /* sum members: 43 */
       /* sum bitfield members: 13 bits, bit holes: 1, sum bit holes: 3 bits */
       /* padding: 3 */
       /* forced alignments: 1 */
       /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
```

After:
```
struct symbol {
       struct rb_node             rb_node __attribute__((__aligned__(8))); /*     0    24 */
       u64                        start;                /*    24     8 */
       u64                        end;                  /*    32     8 */
       u16                        namelen;              /*    40     2 */
       u8                         type:4;               /*    42: 0  1 */
       u8                         binding:4;            /*    42: 4  1 */

       /* Bitfield combined with previous fields */

       enum symbol_idle_kind      idle:2;               /*    40:24  4 */

       /* Bitfield combined with next fields */

       u8                         ignore:1;             /*    43: 2  1 */
       u8                         inlined:1;            /*    43: 3  1 */
       u8                         annotate2:1;          /*    43: 4  1 */
       u8                         ifunc_alias:1;        /*    43: 5  1 */

       /* XXX 2 bits hole, try to pack */

       u8                         arch_sym;             /*    44     1 */
       char                       name[];               /*    45     0 */

       /* size: 48, cachelines: 1, members: 13 */
       /* sum members: 43 */
       /* sum bitfield members: 14 bits, bit holes: 1, sum bit holes: 2 bits */
       /* padding: 3 */
       /* forced alignments: 1 */
       /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
```
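
The same check can be reproduced outside of perf. The following is only
a sketch: idle_kind, sym_before and sym_after are hypothetical,
cut-down stand-ins for the real definitions, and bit-field layout is
implementation-defined, so the result is only expected to hold for
GCC/Clang on the usual Linux ABIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for enum symbol_idle_kind from the patch. */
enum idle_kind {
	IDLE_UNKNOWN = 0,
	IDLE_NOT_IDLE = 1,
	IDLE_IDLE = 2,
};

/* Flag layout before the change: every bit-field is declared as uint8_t. */
struct sym_before {
	uint64_t start;
	uint16_t namelen;
	uint8_t  type:4;
	uint8_t  binding:4;
	uint8_t  idle:1;
	uint8_t  ignore:1;
	uint8_t  inlined:1;
	uint8_t  arch_sym;
};

/* Flag layout after the change: idle becomes a 2-bit enum bit-field. */
struct sym_after {
	uint64_t start;
	uint16_t namelen;
	uint8_t  type:4;
	uint8_t  binding:4;
	enum idle_kind idle:2;
	uint8_t  ignore:1;
	uint8_t  inlined:1;
	uint8_t  arch_sym;
};

/* Fails to compile if the enum bit-field makes the struct grow. */
static_assert(sizeof(struct sym_before) == sizeof(struct sym_after),
	      "enum bit-field changed the struct size");

/* Store a kind in the 2-bit field and read it back. */
static enum idle_kind idle_roundtrip(enum idle_kind kind)
{
	struct sym_after s = { 0 };

	s.idle = kind;
	return s.idle;
}
```

idle_roundtrip(IDLE_IDLE) should return IDLE_IDLE wherever the compiler
gives the enum an unsigned underlying type, as GCC and Clang do when no
enumerator is negative.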

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > >         /** Resolvable but tools ignore it (e.g. idle routines). */
> > >         u8              ignore:1;
> > >         /** Symbol for an inlined function. */
> > > @@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
> > >
> > >  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> > >
> > > -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> > > -                      bool kernel);
> > > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> > >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> > >  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
> > >  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> > > @@ -269,5 +275,6 @@ enum {
> > >  };
> > >
> > >  int symbol__validate_sym_arguments(void);
> > > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
> > >
> > >  #endif /* __PERF_SYMBOL */
> > > --
> > > 2.53.0.473.g4a7958ca14-goog
> > >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2026-03-24 17:14   ` Ian Rogers
@ 2026-03-25 16:18   ` Ian Rogers
  2026-03-26  7:20     ` Honglei Wang
  1 sibling, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-25 16:18 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, sumanthk, jameshongleiwang

Replace the idle boolean with a helper function, symbol__is_idle(),
that lazily computes whether a symbol is an idle function, taking
into consideration the kernel version and architecture of the
machine, and caches the result in the symbol. As __symbols__insert()
no longer needs to know if a symbol is for the kernel, remove that
argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches the x86 matches to use strstarts(), which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/
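
For readers skimming the diff, here is a stand-alone sketch of the
lazy tri-state cache, not the perf code itself. toy_symbol, toy_arch,
toy_is_idle() and the slow_path_calls counter are hypothetical names,
the table is a shortened subset of idle_symbols, and unlike the patch
the sketch checks the sscanf() return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum idle_kind { IDLE_UNKNOWN = 0, IDLE_NOT_IDLE = 1, IDLE_IDLE = 2 };
enum toy_arch { TOY_X86, TOY_S390, TOY_OTHER };

/* Hypothetical, cut-down symbol: a name plus the cached answer. */
struct toy_symbol {
	const char *name;
	enum idle_kind idle;
};

/* Must stay sorted in strcmp() order, otherwise bsearch() can miss entries. */
static const char *const generic_idle[] = {
	"arch_cpu_idle",
	"cpu_idle",
	"default_idle",
	"poll_idle",
};

static int slow_path_calls; /* demonstration only: counts uncached lookups */

static int name_cmp(const void *key, const void *elem)
{
	return strcmp(key, *(const char *const *)elem);
}

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* True when a "major.minor..." release string is older than v6.10. */
static bool release_before_6_10(const char *release)
{
	int major = 0, minor = 0;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return false; /* assumption: unparsable means "not old" */
	return major < 6 || (major == 6 && minor < 10);
}

static bool toy_is_idle(struct toy_symbol *sym, enum toy_arch arch,
			const char *release)
{
	const char *name = sym->name;
	bool idle;

	/* Fast path: a previous call already classified this symbol. */
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_IDLE;

	slow_path_calls++;

	/* Skip the leading '.' that the patch strips for ppc64 names. */
	if (name[0] == '.')
		name++;

	idle = bsearch(name, generic_idle,
		       sizeof(generic_idle) / sizeof(generic_idle[0]),
		       sizeof(generic_idle[0]), name_cmp) != NULL;
	if (!idle && arch == TOY_X86)
		idle = starts_with(name, "mwait_idle") ||
		       starts_with(name, "intel_idle");
	if (!idle && arch == TOY_S390)
		idle = release_before_6_10(release) &&
		       starts_with(name, "psw_idle");

	sym->idle = idle ? IDLE_IDLE : IDLE_NOT_IDLE;
	return idle;
}
```

Calling toy_is_idle() twice on the same symbol takes the slow path only
once; the second call returns the cached kind. Using zero for "unknown"
means a zero-initialised symbol needs no extra setup.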
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 84 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3cd4e5a03cc5..9fabf5146d89 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ce9195717f44..1a357af93a0a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = env ? env->e_machine : EM_HOST;
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
+
+		sscanf(release, "%d.%d", &major, &minor);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..f26f67bd7982 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle. */
+	enum symbol_idle_kind idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
@ 2026-03-26  7:20     ` Honglei Wang
  2026-03-26 15:11       ` Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Honglei Wang @ 2026-03-26  7:20 UTC (permalink / raw)
  To: Ian Rogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, sumanthk

Hi Ian,

On 3/26/26 12:18 AM, Ian Rogers wrote:
> Replace the idle boolean with a helper function, symbol__is_idle(),
> that lazily computes whether a symbol is an idle function, taking
> into consideration the kernel version and architecture of the
> machine, and caches the result in the symbol. As __symbols__insert()
> no longer needs to know if a symbol is for the kernel, remove that
> argument.
> 
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> 
> The change switches x86 matches to use strstarts which means
> intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/
> ---
>  tools/perf/builtin-top.c     |   6 +-
>  tools/perf/util/symbol-elf.c |   2 +-
>  tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
>  tools/perf/util/symbol.h     |  15 +++--
>  4 files changed, 84 insertions(+), 44 deletions(-)
> 
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 37950efb28ac..bdc1c761cd61 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  {
>  	struct perf_top *top = container_of(tool, struct perf_top, tool);
>  	struct addr_location al;
> +	struct dso *dso = NULL;
>  
>  	if (!machine && perf_guest) {
>  		static struct intlist *seen;
> @@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  		}
>  	}
>  
> -	if (al.sym == NULL || !al.sym->idle) {
> +	if (al.map)
> +		dso = map__dso(al.map);
> +
> +	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
>  		struct hists *hists = evsel__hists(evsel);
>  		struct hist_entry_iter iter = {
>  			.evsel		= evsel,
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 3cd4e5a03cc5..9fabf5146d89 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
>  
>  		arch__sym_update(f, &sym);
>  
> -		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> +		__symbols__insert(dso__symbols(curr_dso), f);
>  		nr++;
>  	}
>  	dso__put(curr_dso);
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index ce9195717f44..1a357af93a0a 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -25,6 +25,8 @@
>  #include "demangle-ocaml.h"
>  #include "demangle-rust-v0.h"
>  #include "dso.h"
> +#include "dwarf-regs.h"
> +#include "env.h"
>  #include "util.h" // lsdir()
>  #include "event.h"
>  #include "machine.h"
> @@ -50,7 +52,6 @@
>  
>  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> -static bool symbol__is_idle(const char *name);
>  
>  int vmlinux_path__nr_entries;
>  char **vmlinux_path;
> @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
>  	}
>  }
>  
> -void __symbols__insert(struct rb_root_cached *symbols,
> -		       struct symbol *sym, bool kernel)
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
>  	struct rb_node **p = &symbols->rb_root.rb_node;
>  	struct rb_node *parent = NULL;
> @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  	struct symbol *s;
>  	bool leftmost = true;
>  
> -	if (kernel) {
> -		const char *name = sym->name;
> -		/*
> -		 * ppc64 uses function descriptors and appends a '.' to the
> -		 * start of every instruction address. Remove it.
> -		 */
> -		if (name[0] == '.')
> -			name++;
> -		sym->idle = symbol__is_idle(name);
> -	}
> -
>  	while (*p != NULL) {
>  		parent = *p;
>  		s = rb_entry(parent, struct symbol, rb_node);
> @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
> -	__symbols__insert(symbols, sym, false);
> +	__symbols__insert(symbols, sym);
>  }
>  
>  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
>  
>  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
>  {
> -	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> +	__symbols__insert(dso__symbols(dso), sym);
>  
>  	/* update the symbol cache if necessary */
>  	if (dso__last_find_result_addr(dso) >= sym->start &&
> @@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
>  	return err;
>  }
>  
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +	const char *name = a;
> +	const char *const *sym = b;
> +
> +	return strcmp(name, *sym);
> +}
> +
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__is_idle(const char *name)
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
>  {
> -	const char * const idle_symbols[] = {
> +	static const char * const idle_symbols[] = {
>  		"acpi_idle_do_entry",
>  		"acpi_processor_ffh_cstate_enter",
>  		"arch_cpu_idle",
>  		"cpu_idle",
>  		"cpu_startup_entry",
> -		"idle_cpu",
> -		"intel_idle",
> -		"intel_idle_ibrs",
>  		"default_idle",
> -		"native_safe_halt",
>  		"enter_idle",
>  		"exit_idle",
> -		"mwait_idle",
> -		"mwait_idle_with_hints",
> -		"mwait_idle_with_hints.constprop.0",
> +		"idle_cpu",
> +		"native_safe_halt",
>  		"poll_idle",
> -		"ppc64_runlatch_off",
>  		"pseries_dedicated_idle_sleep",
> -		"psw_idle",
> -		"psw_idle_exit",
> -		NULL
>  	};
> -	int i;
> -	static struct strlist *idle_symbols_list;
> +	const char *name = sym->name;
> +	uint16_t e_machine = env ? env->e_machine : EM_HOST;
>  
> -	if (idle_symbols_list)
> -		return strlist__has_entry(idle_symbols_list, name);
> +	if (sym->idle)
> +		return sym->idle == SYMBOL_IDLE__IDLE;
>  
> -	idle_symbols_list = strlist__new(NULL, NULL);
> +	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +		sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +		return false;
> +	}
>  
> -	for (i = 0; idle_symbols[i]; i++)
> -		strlist__add(idle_symbols_list, idle_symbols[i]);
> +	/*
> +	 * ppc64 uses function descriptors and appends a '.' to the
> +	 * start of every instruction address. Remove it.
> +	 */
> +	if (name[0] == '.')
> +		name++;
>  
> -	return strlist__has_entry(idle_symbols_list, name);
> +	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +		    sizeof(idle_symbols[0]), sym_name_cmp)) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_386 || e_machine == EM_X86_64) {

As said in another thread, intel_idle_irq was still there on my test
machine. I did a bit of debugging and found e_machine == 0, so it
couldn't run into this branch. After digging more, it looks like
deliver_event()->perf_session__find_machine() returns a struct machine
whose env->e_machine is 0. I'm too busy today to do more; I hope this
clue helps.

Thanks,
Honglei

> +		if (strstarts(name, "mwait_idle") ||
> +		    strstarts(name, "intel_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_S390) {
> +		int major = 0, minor = 0;
> +		const char *release = env && env->os_release
> +			? env->os_release : perf_version_string;
> +
> +		sscanf(release, "%d.%d", &major, &minor);
> +
> +		/* Before v6.10, s390 used psw_idle. */
> +		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +	return false;
>  }
>  
>  static int map__process_kallsym_symbol(void *arg, const char *name,
> @@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
>  	 * We will pass the symbols to the filter later, in
>  	 * map__split_kallsyms, when we have split the maps per module
>  	 */
> -	__symbols__insert(root, sym, !strchr(name, '['));
> +	__symbols__insert(root, sym);
>  
>  	return 0;
>  }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index c67814d6d6d6..f26f67bd7982 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -25,6 +25,7 @@ struct dso;
>  struct map;
>  struct maps;
>  struct option;
> +struct perf_env;
>  struct build_id;
>  
>  /*
> @@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>  			     GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>  
> +enum symbol_idle_kind {
> +	SYMBOL_IDLE__UNKNOWN = 0,
> +	SYMBOL_IDLE__NOT_IDLE = 1,
> +	SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -57,8 +64,8 @@ struct symbol {
>  	u8		type:4;
>  	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>  	u8		binding:4;
> -	/** Set true for kernel symbols of idle routines. */
> -	u8		idle:1;
> +	/** Cache for symbol__is_idle. */
> +	enum symbol_idle_kind idle:2;
>  	/** Resolvable but tools ignore it (e.g. idle routines). */
>  	u8		ignore:1;
>  	/** Symbol for an inlined function. */
> @@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
>  
>  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>  
> -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> -		       bool kernel);
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
>  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> @@ -286,5 +292,6 @@ enum {
>  };
>  
>  int symbol__validate_sym_arguments(void);
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
>  
>  #endif /* __PERF_SYMBOL */


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-26  7:20     ` Honglei Wang
@ 2026-03-26 15:11       ` Ian Rogers
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-26 15:11 UTC (permalink / raw)
  To: Honglei Wang
  Cc: acme, namhyung, tmricht, agordeev, gor, hca, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

On Thu, Mar 26, 2026 at 12:20 AM Honglei Wang <jameshongleiwang@126.com> wrote:
>
> Hi Ian,
>
> On 3/26/26 12:18 AM, Ian Rogers wrote:
> > Move the idle boolean to a helper symbol__is_idle function. In the
> > function lazily compute whether a symbol is an idle function taking
> > into consideration the kernel version and architecture of the
> > machine. As symbols__insert no longer needs to know if a symbol is for
> > the kernel, remove the argument.
> >
> > This change is inspired by mailing list discussion, particularly from
> > Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> > <hca@linux.ibm.com>:
> > https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> >
> > The change switches x86 matches to use strstarts which means
> > intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> > change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> > https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/
> > ---
> >  tools/perf/builtin-top.c     |   6 +-
> >  tools/perf/util/symbol-elf.c |   2 +-
> >  tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
> >  tools/perf/util/symbol.h     |  15 +++--
> >  4 files changed, 84 insertions(+), 44 deletions(-)
> >
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 37950efb28ac..bdc1c761cd61 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
> >  {
> >       struct perf_top *top = container_of(tool, struct perf_top, tool);
> >       struct addr_location al;
> > +     struct dso *dso = NULL;
> >
> >       if (!machine && perf_guest) {
> >               static struct intlist *seen;
> > @@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
> >               }
> >       }
> >
> > -     if (al.sym == NULL || !al.sym->idle) {
> > +     if (al.map)
> > +             dso = map__dso(al.map);
> > +
> > +     if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
> >               struct hists *hists = evsel__hists(evsel);
> >               struct hist_entry_iter iter = {
> >                       .evsel          = evsel,
> > diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> > index 3cd4e5a03cc5..9fabf5146d89 100644
> > --- a/tools/perf/util/symbol-elf.c
> > +++ b/tools/perf/util/symbol-elf.c
> > @@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
> >
> >               arch__sym_update(f, &sym);
> >
> > -             __symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> > +             __symbols__insert(dso__symbols(curr_dso), f);
> >               nr++;
> >       }
> >       dso__put(curr_dso);
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index ce9195717f44..1a357af93a0a 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -25,6 +25,8 @@
> >  #include "demangle-ocaml.h"
> >  #include "demangle-rust-v0.h"
> >  #include "dso.h"
> > +#include "dwarf-regs.h"
> > +#include "env.h"
> >  #include "util.h" // lsdir()
> >  #include "event.h"
> >  #include "machine.h"
> > @@ -50,7 +52,6 @@
> >
> >  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
> >  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> > -static bool symbol__is_idle(const char *name);
> >
> >  int vmlinux_path__nr_entries;
> >  char **vmlinux_path;
> > @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
> >       }
> >  }
> >
> > -void __symbols__insert(struct rb_root_cached *symbols,
> > -                    struct symbol *sym, bool kernel)
> > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
> >  {
> >       struct rb_node **p = &symbols->rb_root.rb_node;
> >       struct rb_node *parent = NULL;
> > @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
> >       struct symbol *s;
> >       bool leftmost = true;
> >
> > -     if (kernel) {
> > -             const char *name = sym->name;
> > -             /*
> > -              * ppc64 uses function descriptors and appends a '.' to the
> > -              * start of every instruction address. Remove it.
> > -              */
> > -             if (name[0] == '.')
> > -                     name++;
> > -             sym->idle = symbol__is_idle(name);
> > -     }
> > -
> >       while (*p != NULL) {
> >               parent = *p;
> >               s = rb_entry(parent, struct symbol, rb_node);
> > @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
> >
> >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
> >  {
> > -     __symbols__insert(symbols, sym, false);
> > +     __symbols__insert(symbols, sym);
> >  }
> >
> >  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> > @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
> >
> >  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
> >  {
> > -     __symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> > +     __symbols__insert(dso__symbols(dso), sym);
> >
> >       /* update the symbol cache if necessary */
> >       if (dso__last_find_result_addr(dso) >= sym->start &&
> > @@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
> >       return err;
> >  }
> >
> > +static int sym_name_cmp(const void *a, const void *b)
> > +{
> > +     const char *name = a;
> > +     const char *const *sym = b;
> > +
> > +     return strcmp(name, *sym);
> > +}
> > +
> >  /*
> >   * These are symbols in the kernel image, so make sure that
> >   * sym is from a kernel DSO.
> >   */
> > -static bool symbol__is_idle(const char *name)
> > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
> >  {
> > -     const char * const idle_symbols[] = {
> > +     static const char * const idle_symbols[] = {
> >               "acpi_idle_do_entry",
> >               "acpi_processor_ffh_cstate_enter",
> >               "arch_cpu_idle",
> >               "cpu_idle",
> >               "cpu_startup_entry",
> > -             "idle_cpu",
> > -             "intel_idle",
> > -             "intel_idle_ibrs",
> >               "default_idle",
> > -             "native_safe_halt",
> >               "enter_idle",
> >               "exit_idle",
> > -             "mwait_idle",
> > -             "mwait_idle_with_hints",
> > -             "mwait_idle_with_hints.constprop.0",
> > +             "idle_cpu",
> > +             "native_safe_halt",
> >               "poll_idle",
> > -             "ppc64_runlatch_off",
> >               "pseries_dedicated_idle_sleep",
> > -             "psw_idle",
> > -             "psw_idle_exit",
> > -             NULL
> >       };
> > -     int i;
> > -     static struct strlist *idle_symbols_list;
> > +     const char *name = sym->name;
> > +     uint16_t e_machine = env ? env->e_machine : EM_HOST;
> >
> > -     if (idle_symbols_list)
> > -             return strlist__has_entry(idle_symbols_list, name);
> > +     if (sym->idle)
> > +             return sym->idle == SYMBOL_IDLE__IDLE;
> >
> > -     idle_symbols_list = strlist__new(NULL, NULL);
> > +     if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> > +             sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +             return false;
> > +     }
> >
> > -     for (i = 0; idle_symbols[i]; i++)
> > -             strlist__add(idle_symbols_list, idle_symbols[i]);
> > +     /*
> > +      * ppc64 uses function descriptors and appends a '.' to the
> > +      * start of every instruction address. Remove it.
> > +      */
> > +     if (name[0] == '.')
> > +             name++;
> >
> > -     return strlist__has_entry(idle_symbols_list, name);
> > +     if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> > +                 sizeof(idle_symbols[0]), sym_name_cmp)) {
> > +             sym->idle = SYMBOL_IDLE__IDLE;
> > +             return true;
> > +     }
> > +
> > +     if (e_machine == EM_386 || e_machine == EM_X86_64) {
>
> As said in another thread, intel_idle_irq was still there on my test
> machine. I did a bit of debugging and found e_machine == 0, so it
> couldn't run into this branch. After digging more, it looks like
> deliver_event()->perf_session__find_machine() returns a struct machine
> whose env->e_machine is 0. I'm too busy today to do more; I hope this
> clue helps.

I can see this, the env's e_machine isn't being lazily initialized for
the host like the arch is. I'll add a patch for this.

Thanks,
Ian

> Thanks,
> Honglei
>
> > +             if (strstarts(name, "mwait_idle") ||
> > +                 strstarts(name, "intel_idle")) {
> > +                     sym->idle = SYMBOL_IDLE__IDLE;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> > +             sym->idle = SYMBOL_IDLE__IDLE;
> > +             return true;
> > +     }
> > +
> > +     if (e_machine == EM_S390) {
> > +             int major = 0, minor = 0;
> > +             const char *release = env && env->os_release
> > +                     ? env->os_release : perf_version_string;
> > +
> > +             sscanf(release, "%d.%d", &major, &minor);
> > +
> > +             /* Before v6.10, s390 used psw_idle. */
> > +             if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> > +                     sym->idle = SYMBOL_IDLE__IDLE;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +     return false;
> >  }
> >
> >  static int map__process_kallsym_symbol(void *arg, const char *name,
> > @@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
> >        * We will pass the symbols to the filter later, in
> >        * map__split_kallsyms, when we have split the maps per module
> >        */
> > -     __symbols__insert(root, sym, !strchr(name, '['));
> > +     __symbols__insert(root, sym);
> >
> >       return 0;
> >  }
> > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > index c67814d6d6d6..f26f67bd7982 100644
> > --- a/tools/perf/util/symbol.h
> > +++ b/tools/perf/util/symbol.h
> > @@ -25,6 +25,7 @@ struct dso;
> >  struct map;
> >  struct maps;
> >  struct option;
> > +struct perf_env;
> >  struct build_id;
> >
> >  /*
> > @@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
> >                            GElf_Shdr *shp, const char *name, size_t *idx);
> >  #endif
> >
> > +enum symbol_idle_kind {
> > +     SYMBOL_IDLE__UNKNOWN = 0,
> > +     SYMBOL_IDLE__NOT_IDLE = 1,
> > +     SYMBOL_IDLE__IDLE = 2,
> > +};
> > +
> >  /**
> >   * A symtab entry. When allocated this may be preceded by an annotation (see
> >   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> > @@ -57,8 +64,8 @@ struct symbol {
> >       u8              type:4;
> >       /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
> >       u8              binding:4;
> > -     /** Set true for kernel symbols of idle routines. */
> > -     u8              idle:1;
> > +     /** Cache for symbol__is_idle. */
> > +     enum symbol_idle_kind idle:2;
> >       /** Resolvable but tools ignore it (e.g. idle routines). */
> >       u8              ignore:1;
> >       /** Symbol for an inlined function. */
> > @@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
> >
> >  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> >
> > -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> > -                    bool kernel);
> > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
> >  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> > @@ -286,5 +292,6 @@ enum {
> >  };
> >
> >  int symbol__validate_sym_arguments(void);
> > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
> >
> >  #endif /* __PERF_SYMBOL */
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-03-26 15:11       ` Ian Rogers
@ 2026-03-26 17:45         ` Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                             ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Ian Rogers @ 2026-03-26 17:45 UTC (permalink / raw)
  To: irogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, namhyung, sumanthk, tmricht

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
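
As a rough illustration of the lazy scheme described above (a sketch,
not the patch itself), the idle state can be cached as a tri-state
value where zero means "not computed yet". The names toy_symbol and
toy_is_idle, the lookups counter and the single hard-coded idle name
are made up for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tri-state cache; zero means "not computed yet". */
enum toy_idle { TOY_UNKNOWN = 0, TOY_NOT_IDLE = 1, TOY_IDLE = 2 };

struct toy_symbol {
	const char *name;
	enum toy_idle idle;
	int lookups;	/* how many times the name was actually compared */
};

/* Compute the answer on first use and reuse the cached value afterwards. */
static bool toy_is_idle(struct toy_symbol *sym)
{
	if (sym->idle != TOY_UNKNOWN)
		return sym->idle == TOY_IDLE;

	sym->lookups++;
	sym->idle = strcmp(sym->name, "default_idle") == 0 ?
		    TOY_IDLE : TOY_NOT_IDLE;
	return sym->idle == TOY_IDLE;
}
```

Calling toy_is_idle() repeatedly on the same symbol performs the string
comparison only once; symbols that are never queried cost nothing.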

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (2):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/env.c        | 179 +++++++++++++++++++++++++++--------
 tools/perf/util/env.h        |   1 +
 tools/perf/util/session.c    |  14 +--
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 105 ++++++++++++--------
 tools/perf/util/symbol.h     |  15 ++-
 7 files changed, 235 insertions(+), 87 deletions(-)

-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-03-26 17:45           ` Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-03-26 17:45 UTC (permalink / raw)
  To: irogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, namhyung, sumanthk, tmricht

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
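
A minimal sketch of the prefix binary search idea, assuming a tiny
made-up table (toy_table) and made-up machine values (enum toy_machine)
rather than the real EM_* constants. The comparator only compares as
many characters as the table prefix has, and the table has to stay in
strcmp() order for bsearch() to be valid.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical machine ids, standing in for ELF EM_* values. */
enum toy_machine { TOY_NONE, TOY_ARM, TOY_ARM64, TOY_S390, TOY_SPARC, TOY_X86 };

struct toy_prefix {
	const char *prefix;
	enum toy_machine machine;
};

/* Must be kept sorted in strcmp() order, otherwise bsearch() may miss. */
static const struct toy_prefix toy_table[] = {
	{ "arm",   TOY_ARM },	/* also the prefix of "arm64" */
	{ "s390",  TOY_S390 },
	{ "sparc", TOY_SPARC },
	{ "x86",   TOY_X86 },
};

/* Compare the key against only the first strlen(prefix) characters. */
static int toy_cmp(const void *key, const void *elem)
{
	const struct toy_prefix *e = elem;

	return strncmp(key, e->prefix, strlen(e->prefix));
}

static enum toy_machine toy_lookup(const char *arch)
{
	const struct toy_prefix *hit;

	hit = bsearch(arch, toy_table,
		      sizeof(toy_table) / sizeof(toy_table[0]),
		      sizeof(toy_table[0]), toy_cmp);
	if (!hit)
		return TOY_NONE;
	/* Resolve a shared prefix after the search. */
	if (hit->machine == TOY_ARM && !strcmp(arch, "arm64"))
		return TOY_ARM64;
	return hit->machine;
}
```

With this table, a uname-style string such as "s390x" resolves through
the "s390" prefix, and "arm64" is disambiguated from "arm" in the
post-processing step.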

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 179 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/session.c |  14 +--
 3 files changed, 151 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 93d475a80f14..304bd8245485 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,154 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = EF_HOST;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch)
+		env->arch = strdup(e_machine_to_perf_arch(perf_env__e_machine(env, /*e_flags=*/NULL)));
 
-	return normalize_arch(arch_name);
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a4501cbca375..91ff252712f4 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b465abfa36c..dcc9bef303aa 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-03-26 17:45           ` Ian Rogers
  2026-03-27  6:56             ` Honglei Wang
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
  3 siblings, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-26 17:45 UTC (permalink / raw)
  To: irogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, namhyung, sumanthk, tmricht

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 84 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3cd4e5a03cc5..9fabf5146d89 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ce9195717f44..92bc28934f36 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
+
+		sscanf(release, "%d.%d", &major, &minor);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..65422c1c8fdb 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle. */
+	enum symbol_idle_kind idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-03-27  4:50           ` Ian Rogers
  2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
  3 siblings, 2 replies; 60+ messages in thread
From: Ian Rogers @ 2026-03-27  4:50 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
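
As a rough illustration of the lazy computation described above, a
minimal sketch of a tri-state cache is shown here. All demo_* names and
the IDLE_* values are hypothetical stand-ins rather than perf's actual
types; the point is only that 0 means "not yet computed", so the slow
lookup runs at most once per symbol.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the cached idle state; 0 means "not computed". */
enum demo_idle_kind { IDLE_UNKNOWN = 0, IDLE_NO = 1, IDLE_YES = 2 };

/* Hypothetical, heavily reduced symbol with a 2-bit cache field. */
struct demo_symbol {
	unsigned char idle : 2;
	const char *name;
};

/* Counts how often the slow name lookup actually runs. */
static int demo_lookups;

static bool demo_name_is_idle(const char *name)
{
	demo_lookups++;
	return !strcmp(name, "default_idle") || !strcmp(name, "poll_idle");
}

/* Compute on first use, then answer from the cached value. */
static bool demo_symbol_is_idle(struct demo_symbol *sym)
{
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_YES;

	sym->idle = demo_name_is_idle(sym->name) ? IDLE_YES : IDLE_NO;
	return sym->idle == IDLE_YES;
}
```

Symbols that are never queried never pay for the lookup, which is the
motivation for moving the work out of symbol insertion.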

v4: Fix Sashiko issues: an array element wasn't sorted properly, the
    e_flags weren't returned properly, the idle type is changed to a
    u8 rather than an enum value, and the s390 version check for
    psw_idle is slightly reordered and tweaked.
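
For readers following the s390 version check mentioned in the v4 notes,
a minimal sketch of a "release is older than major.minor" test is given
here. The function name demo_release_before is hypothetical, and
treating an unparsable release string as "old" is an assumption that
mirrors how the v4 patch orders its sscanf check.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: return true if the release string (for example
 * "6.9.0-rc1") is older than want_major.want_minor. A string that does
 * not start with "<major>.<minor>" is conservatively treated as old.
 */
static bool demo_release_before(const char *release, int want_major,
				int want_minor)
{
	int major = 0, minor = 0;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return true;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}
```

With this shape, demo_release_before(release, 6, 10) corresponds to the
"before v6.10" condition used for psw_idle.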

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
    https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (2):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/env.c        | 185 ++++++++++++++++++++++++++++-------
 tools/perf/util/env.h        |   1 +
 tools/perf/util/session.c    |  14 +--
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 104 +++++++++++++-------
 tools/perf/util/symbol.h     |  15 ++-
 7 files changed, 240 insertions(+), 87 deletions(-)

-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-03-27  4:50             ` Ian Rogers
  2026-04-06  5:05               ` Namhyung Kim
  2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  1 sibling, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-27  4:50 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being
uname derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
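
The prefix binary search with duplicate handling described above can be
sketched roughly as follows. This is a reduced illustration, not the
patch's table: the demo_* names and DEMO_* values are hypothetical, and
only the "arm" versus "arm64" conflict is resolved. The table must be
sorted by prefix for bsearch to be valid.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical machine ids standing in for ELF EM_* constants. */
enum demo_machine {
	DEMO_NONE = 0, DEMO_AARCH64, DEMO_ARM, DEMO_PPC, DEMO_S390, DEMO_X86,
};

struct demo_prefix {
	const char *prefix;
	enum demo_machine machine;
};

/* Sorted by prefix; "arm" also matches "arm64", fixed up after the search. */
static const struct demo_prefix demo_table[] = {
	{"aarch64", DEMO_AARCH64},
	{"arm", DEMO_ARM},
	{"ppc", DEMO_PPC},
	{"s390", DEMO_S390},
	{"x86", DEMO_X86},
};

/* Compare only as many characters as the table entry's prefix has. */
static int demo_cmp_prefix(const void *key, const void *element)
{
	const struct demo_prefix *entry = element;

	return strncmp(key, entry->prefix, strlen(entry->prefix));
}

static enum demo_machine demo_arch_to_machine(const char *arch)
{
	const struct demo_prefix *hit;

	hit = bsearch(arch, demo_table,
		      sizeof(demo_table) / sizeof(demo_table[0]),
		      sizeof(demo_table[0]), demo_cmp_prefix);
	if (!hit)
		return DEMO_NONE;

	/* Resolve the shared prefix: "arm64" is the 64-bit machine. */
	if (hit->machine == DEMO_ARM && !strcmp(arch, "arm64"))
		return DEMO_AARCH64;

	return hit->machine;
}
```

Matching on a prefix keeps the table short (one "s390" entry covers
"s390x"), at the cost of the post-search fix-up for names that share a
prefix.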

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/session.c |  14 +--
 3 files changed, 157 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 93d475a80f14..ae08178870d7 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch) {
+		/*
+		 * Lazily compute/allocate arch. The e_machine may have been
+		 * read from a data file and so may not be EM_HOST.
+		 */
+		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	return normalize_arch(arch_name);
+		env->arch = strdup(e_machine_to_perf_arch(e_machine));
+	}
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a4501cbca375..91ff252712f4 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b465abfa36c..dcc9bef303aa 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-03-27  4:50             ` Ian Rogers
  2026-04-06  5:10               ` Namhyung Kim
  1 sibling, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-27  4:50 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
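
To make the effect of the prefix matching concrete, a small sketch is
given here. demo_strstarts is a local stand-in written for this
example (assumed to behave like the strstarts helper the patch calls),
and demo_x86_idle_name is a hypothetical wrapper around the two x86
prefixes named in the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Local stand-in: true if str begins with prefix. */
static bool demo_strstarts(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Hypothetical wrapper for the two x86 idle prefixes used in the patch. */
static bool demo_x86_idle_name(const char *name)
{
	return demo_strstarts(name, "mwait_idle") ||
	       demo_strstarts(name, "intel_idle");
}
```

Variants such as intel_idle_irq, intel_idle_ibrs and
mwait_idle_with_hints.constprop.0 are all covered by the two prefixes,
so they no longer need individual table entries.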

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 104 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 83 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3cd4e5a03cc5..9fabf5146d89 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ce9195717f44..9ff709edeb88 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,86 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
+		    major < 6 || (major == 6 && minor < 10)) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +813,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..2f5f90f547aa 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                             ` (2 preceding siblings ...)
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-03-27  6:00           ` Ian Rogers
  2026-03-31  7:22             ` Namhyung Kim
  3 siblings, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-27  6:00 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Writing to the test output files in the current working directory can
fail in various contexts such as continual test. Other tests write to
a mktemp-ed file, make the "perf script task-analyzer tests" follow
this convention too. Currently this isn't possible for the perf.data
file due to a lack of perf script support, add a variable for when
this support is available.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/shell/test_task_analyzer.sh | 38 +++++++++++---------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/tools/perf/tests/shell/test_task_analyzer.sh b/tools/perf/tests/shell/test_task_analyzer.sh
index e194fcf61df3..b1a6a7e017e4 100755
--- a/tools/perf/tests/shell/test_task_analyzer.sh
+++ b/tools/perf/tests/shell/test_task_analyzer.sh
@@ -3,6 +3,11 @@
 # SPDX-License-Identifier: GPL-2.0
 
 tmpdir=$(mktemp -d /tmp/perf-script-task-analyzer-XXXXX)
+# TODO: perf script report only supports input from the CWD perf.data file, make
+# it support input from any file.
+perfdata="perf.data"
+csv="$tmpdir/csv"
+csvsummary="$tmpdir/csvsummary"
 err=0
 
 # set PERF_EXEC_PATH to find scripts in the source directory
@@ -15,11 +20,10 @@ fi
 export ASAN_OPTIONS=detect_leaks=0
 
 cleanup() {
-  rm -f perf.data
-  rm -f perf.data.old
-  rm -f csv
-  rm -f csvsummary
+  rm -f "${perfdata}"
+  rm -f "${perfdata}".old
   rm -rf "$tmpdir"
+
   trap - exit term int
 }
 
@@ -61,7 +65,7 @@ skip_no_probe_record_support() {
 
 prepare_perf_data() {
 	# 1s should be sufficient to catch at least some switches
-	perf record -e sched:sched_switch -a -- sleep 1 > /dev/null 2>&1
+	perf record -e sched:sched_switch -a -o "${perfdata}" -- sleep 1 > /dev/null 2>&1
 	# check if perf data file got created in above step.
 	if [ ! -e "perf.data" ]; then
 		printf "FAIL: perf record failed to create \"perf.data\" \n"
@@ -130,28 +134,28 @@ test_extended_times_summary_ns() {
 }
 
 test_csv() {
-	perf script report task-analyzer --csv csv > /dev/null
-	check_exec_0 "perf script report task-analyzer --csv csv"
-	find_str_or_fail "Comm;" csv "${FUNCNAME[0]}"
+	perf script report task-analyzer --csv "${csv}" > /dev/null
+	check_exec_0 "perf script report task-analyzer --csv ${csv}"
+	find_str_or_fail "Comm;" "${csv}" "${FUNCNAME[0]}"
 }
 
 test_csv_extended_times() {
-	perf script report task-analyzer --csv csv --extended-times > /dev/null
-	check_exec_0 "perf script report task-analyzer --csv csv --extended-times"
-	find_str_or_fail "Out-Out;" csv "${FUNCNAME[0]}"
+	perf script report task-analyzer --csv "${csv}" --extended-times > /dev/null
+	check_exec_0 "perf script report task-analyzer --csv ${csv} --extended-times"
+	find_str_or_fail "Out-Out;" "${csv}" "${FUNCNAME[0]}"
 }
 
 test_csvsummary() {
-	perf script report task-analyzer --csv-summary csvsummary > /dev/null
-	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary"
-	find_str_or_fail "Comm;" csvsummary "${FUNCNAME[0]}"
+	perf script report task-analyzer --csv-summary "${csvsummary}" > /dev/null
+	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary}"
+	find_str_or_fail "Comm;" "${csvsummary}" "${FUNCNAME[0]}"
 }
 
 test_csvsummary_extended() {
-	perf script report task-analyzer --csv-summary csvsummary --summary-extended \
+	perf script report task-analyzer --csv-summary "${csvsummary}" --summary-extended \
 	>/dev/null
-	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary --summary-extended"
-	find_str_or_fail "Out-Out;" csvsummary "${FUNCNAME[0]}"
+	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary} --summary-extended"
+	find_str_or_fail "Out-Out;" "${csvsummary}" "${FUNCNAME[0]}"
 }
 
 skip_no_probe_record_support
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-03-27  6:56             ` Honglei Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Honglei Wang @ 2026-03-27  6:56 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, namhyung, sumanthk, tmricht

Hi Ian,

FYI. It works on my icx machine with 'perf top'.

Thanks,
Honglei

On 3/27/26 1:45 AM, Ian Rogers wrote:
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
> 
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> 
> The change switches x86 matches to use strstarts which means
> intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-top.c     |   6 +-
>  tools/perf/util/symbol-elf.c |   2 +-
>  tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
>  tools/perf/util/symbol.h     |  15 +++--
>  4 files changed, 84 insertions(+), 44 deletions(-)
> 
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 37950efb28ac..bdc1c761cd61 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  {
>  	struct perf_top *top = container_of(tool, struct perf_top, tool);
>  	struct addr_location al;
> +	struct dso *dso = NULL;
>  
>  	if (!machine && perf_guest) {
>  		static struct intlist *seen;
> @@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  		}
>  	}
>  
> -	if (al.sym == NULL || !al.sym->idle) {
> +	if (al.map)
> +		dso = map__dso(al.map);
> +
> +	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
>  		struct hists *hists = evsel__hists(evsel);
>  		struct hist_entry_iter iter = {
>  			.evsel		= evsel,
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 3cd4e5a03cc5..9fabf5146d89 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
>  
>  		arch__sym_update(f, &sym);
>  
> -		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> +		__symbols__insert(dso__symbols(curr_dso), f);
>  		nr++;
>  	}
>  	dso__put(curr_dso);
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index ce9195717f44..92bc28934f36 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -25,6 +25,8 @@
>  #include "demangle-ocaml.h"
>  #include "demangle-rust-v0.h"
>  #include "dso.h"
> +#include "dwarf-regs.h"
> +#include "env.h"
>  #include "util.h" // lsdir()
>  #include "event.h"
>  #include "machine.h"
> @@ -50,7 +52,6 @@
>  
>  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> -static bool symbol__is_idle(const char *name);
>  
>  int vmlinux_path__nr_entries;
>  char **vmlinux_path;
> @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
>  	}
>  }
>  
> -void __symbols__insert(struct rb_root_cached *symbols,
> -		       struct symbol *sym, bool kernel)
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
>  	struct rb_node **p = &symbols->rb_root.rb_node;
>  	struct rb_node *parent = NULL;
> @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  	struct symbol *s;
>  	bool leftmost = true;
>  
> -	if (kernel) {
> -		const char *name = sym->name;
> -		/*
> -		 * ppc64 uses function descriptors and appends a '.' to the
> -		 * start of every instruction address. Remove it.
> -		 */
> -		if (name[0] == '.')
> -			name++;
> -		sym->idle = symbol__is_idle(name);
> -	}
> -
>  	while (*p != NULL) {
>  		parent = *p;
>  		s = rb_entry(parent, struct symbol, rb_node);
> @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
> -	__symbols__insert(symbols, sym, false);
> +	__symbols__insert(symbols, sym);
>  }
>  
>  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
>  
>  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
>  {
> -	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> +	__symbols__insert(dso__symbols(dso), sym);
>  
>  	/* update the symbol cache if necessary */
>  	if (dso__last_find_result_addr(dso) >= sym->start &&
> @@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
>  	return err;
>  }
>  
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +	const char *name = a;
> +	const char *const *sym = b;
> +
> +	return strcmp(name, *sym);
> +}
> +
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__is_idle(const char *name)
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
>  {
> -	const char * const idle_symbols[] = {
> +	static const char * const idle_symbols[] = {
>  		"acpi_idle_do_entry",
>  		"acpi_processor_ffh_cstate_enter",
>  		"arch_cpu_idle",
>  		"cpu_idle",
>  		"cpu_startup_entry",
> -		"idle_cpu",
> -		"intel_idle",
> -		"intel_idle_ibrs",
>  		"default_idle",
> -		"native_safe_halt",
>  		"enter_idle",
>  		"exit_idle",
> -		"mwait_idle",
> -		"mwait_idle_with_hints",
> -		"mwait_idle_with_hints.constprop.0",
> +		"idle_cpu",
> +		"native_safe_halt",
>  		"poll_idle",
> -		"ppc64_runlatch_off",
>  		"pseries_dedicated_idle_sleep",
> -		"psw_idle",
> -		"psw_idle_exit",
> -		NULL
>  	};
> -	int i;
> -	static struct strlist *idle_symbols_list;
> +	const char *name = sym->name;
> +	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
>  
> -	if (idle_symbols_list)
> -		return strlist__has_entry(idle_symbols_list, name);
> +	if (sym->idle)
> +		return sym->idle == SYMBOL_IDLE__IDLE;
>  
> -	idle_symbols_list = strlist__new(NULL, NULL);
> +	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +		sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +		return false;
> +	}
>  
> -	for (i = 0; idle_symbols[i]; i++)
> -		strlist__add(idle_symbols_list, idle_symbols[i]);
> +	/*
> +	 * ppc64 uses function descriptors and appends a '.' to the
> +	 * start of every instruction address. Remove it.
> +	 */
> +	if (name[0] == '.')
> +		name++;
>  
> -	return strlist__has_entry(idle_symbols_list, name);
> +	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +		    sizeof(idle_symbols[0]), sym_name_cmp)) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_386 || e_machine == EM_X86_64) {
> +		if (strstarts(name, "mwait_idle") ||
> +		    strstarts(name, "intel_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_S390) {
> +		int major = 0, minor = 0;
> +		const char *release = env && env->os_release
> +			? env->os_release : perf_version_string;
> +
> +		sscanf(release, "%d.%d", &major, &minor);
> +
> +		/* Before v6.10, s390 used psw_idle. */
> +		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +	return false;
>  }
>  
>  static int map__process_kallsym_symbol(void *arg, const char *name,
> @@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
>  	 * We will pass the symbols to the filter later, in
>  	 * map__split_kallsyms, when we have split the maps per module
>  	 */
> -	__symbols__insert(root, sym, !strchr(name, '['));
> +	__symbols__insert(root, sym);
>  
>  	return 0;
>  }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index c67814d6d6d6..65422c1c8fdb 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -25,6 +25,7 @@ struct dso;
>  struct map;
>  struct maps;
>  struct option;
> +struct perf_env;
>  struct build_id;
>  
>  /*
> @@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>  			     GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>  
> +enum symbol_idle_kind {
> +	SYMBOL_IDLE__UNKNOWN = 0,
> +	SYMBOL_IDLE__NOT_IDLE = 1,
> +	SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -57,8 +64,8 @@ struct symbol {
>  	u8		type:4;
>  	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>  	u8		binding:4;
> -	/** Set true for kernel symbols of idle routines. */
> -	u8		idle:1;
> +	/** Cache for symbol__is_idle. */
> +	enum symbol_idle_kind idle:2;
>  	/** Resolvable but tools ignore it (e.g. idle routines). */
>  	u8		ignore:1;
>  	/** Symbol for an inlined function. */
> @@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
>  
>  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>  
> -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> -		       bool kernel);
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
>  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> @@ -286,5 +292,6 @@ enum {
>  };
>  
>  int symbol__validate_sym_arguments(void);
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
>  
>  #endif /* __PERF_SYMBOL */
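
The lazy idle computation described in the quoted commit message can be sketched roughly as follows. This is a minimal standalone illustration, not the perf code: `sketch_symbol`, `sketch_is_idle`, `sketch_release_before` and the `is_s390` flag are hypothetical stand-ins for `struct symbol`, `symbol__is_idle`, the inline `sscanf` check and the `e_machine == EM_S390` test, and the idle list is cut down to a single name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Tri-state cache: zero-initialised symbols start out as "unknown". */
enum sketch_idle_kind {
	SKETCH_IDLE_UNKNOWN = 0,
	SKETCH_IDLE_NOT_IDLE = 1,
	SKETCH_IDLE_IDLE = 2,
};

/* Hypothetical stand-in for struct symbol. */
struct sketch_symbol {
	const char *name;
	unsigned int idle:2;
};

/*
 * True when the "major.minor" prefix of release is older than
 * want_major.want_minor. As in the quoted patch, a release string that
 * fails to parse leaves major/minor at 0 and so counts as "older".
 */
static bool sketch_release_before(const char *release, int want_major, int want_minor)
{
	int major = 0, minor = 0;

	sscanf(release, "%d.%d", &major, &minor);
	return major < want_major || (major == want_major && minor < want_minor);
}

/* Compute the answer on first use and cache it in the symbol. */
static bool sketch_is_idle(struct sketch_symbol *sym, bool is_s390, const char *release)
{
	const char *name = sym->name;

	assert(name != NULL);

	/* Already computed: return the cached answer. */
	if (sym->idle != SKETCH_IDLE_UNKNOWN)
		return sym->idle == SKETCH_IDLE_IDLE;

	/* Skip a leading '.' as the ppc64 handling in the patch does. */
	if (name[0] == '.')
		name++;

	if (!strcmp(name, "default_idle") ||
	    (is_s390 && sketch_release_before(release, 6, 10) &&
	     !strncmp(name, "psw_idle", strlen("psw_idle")))) {
		sym->idle = SKETCH_IDLE_IDLE;
		return true;
	}

	sym->idle = SKETCH_IDLE_NOT_IDLE;
	return false;
}
```

The two-bit field is what allows "not computed yet" to be told apart from "computed, not idle", so the name comparison runs at most once per symbol. One consequence of caching in the symbol is that a later call with a different release string still returns the first answer.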


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
@ 2026-03-31  7:22             ` Namhyung Kim
  2026-03-31 17:58               ` Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Namhyung Kim @ 2026-03-31  7:22 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

I'm curious why this patch is in the idle symbol thread.


On Thu, Mar 26, 2026 at 11:00:33PM -0700, Ian Rogers wrote:
> Writing to the test output files in the current working directory can
> fail in various contexts such as continual test. Other tests write to
> a mktemp-ed file, make the "perf script task-analyzer tests" follow
> this convention too. Currently this isn't possible for the perf.data
> file due to a lack of perf script support, add a variable for when
> this support is available.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/tests/shell/test_task_analyzer.sh | 38 +++++++++++---------
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/test_task_analyzer.sh b/tools/perf/tests/shell/test_task_analyzer.sh
> index e194fcf61df3..b1a6a7e017e4 100755
> --- a/tools/perf/tests/shell/test_task_analyzer.sh
> +++ b/tools/perf/tests/shell/test_task_analyzer.sh
> @@ -3,6 +3,11 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
>  tmpdir=$(mktemp -d /tmp/perf-script-task-analyzer-XXXXX)
> +# TODO: perf script report only supports input from the CWD perf.data file, make
> +# it support input from any file.
> +perfdata="perf.data"
> +csv="$tmpdir/csv"
> +csvsummary="$tmpdir/csvsummary"
>  err=0
>  
>  # set PERF_EXEC_PATH to find scripts in the source directory
> @@ -15,11 +20,10 @@ fi
>  export ASAN_OPTIONS=detect_leaks=0
>  
>  cleanup() {
> -  rm -f perf.data
> -  rm -f perf.data.old
> -  rm -f csv
> -  rm -f csvsummary
> +  rm -f "${perfdata}"
> +  rm -f "${perfdata}".old
>    rm -rf "$tmpdir"
> +
>    trap - exit term int
>  }
>  
> @@ -61,7 +65,7 @@ skip_no_probe_record_support() {
>  
>  prepare_perf_data() {
>  	# 1s should be sufficient to catch at least some switches
> -	perf record -e sched:sched_switch -a -- sleep 1 > /dev/null 2>&1
> +	perf record -e sched:sched_switch -a -o "${perfdata}" -- sleep 1 > /dev/null 2>&1
>  	# check if perf data file got created in above step.
>  	if [ ! -e "perf.data" ]; then
>  		printf "FAIL: perf record failed to create \"perf.data\" \n"

Please update this part too.

Thanks,
Namhyung


> @@ -130,28 +134,28 @@ test_extended_times_summary_ns() {
>  }
>  
>  test_csv() {
> -	perf script report task-analyzer --csv csv > /dev/null
> -	check_exec_0 "perf script report task-analyzer --csv csv"
> -	find_str_or_fail "Comm;" csv "${FUNCNAME[0]}"
> +	perf script report task-analyzer --csv "${csv}" > /dev/null
> +	check_exec_0 "perf script report task-analyzer --csv ${csv}"
> +	find_str_or_fail "Comm;" "${csv}" "${FUNCNAME[0]}"
>  }
>  
>  test_csv_extended_times() {
> -	perf script report task-analyzer --csv csv --extended-times > /dev/null
> -	check_exec_0 "perf script report task-analyzer --csv csv --extended-times"
> -	find_str_or_fail "Out-Out;" csv "${FUNCNAME[0]}"
> +	perf script report task-analyzer --csv "${csv}" --extended-times > /dev/null
> +	check_exec_0 "perf script report task-analyzer --csv ${csv} --extended-times"
> +	find_str_or_fail "Out-Out;" "${csv}" "${FUNCNAME[0]}"
>  }
>  
>  test_csvsummary() {
> -	perf script report task-analyzer --csv-summary csvsummary > /dev/null
> -	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary"
> -	find_str_or_fail "Comm;" csvsummary "${FUNCNAME[0]}"
> +	perf script report task-analyzer --csv-summary "${csvsummary}" > /dev/null
> +	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary}"
> +	find_str_or_fail "Comm;" "${csvsummary}" "${FUNCNAME[0]}"
>  }
>  
>  test_csvsummary_extended() {
> -	perf script report task-analyzer --csv-summary csvsummary --summary-extended \
> +	perf script report task-analyzer --csv-summary "${csvsummary}" --summary-extended \
>  	>/dev/null
> -	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary --summary-extended"
> -	find_str_or_fail "Out-Out;" csvsummary "${FUNCNAME[0]}"
> +	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary} --summary-extended"
> +	find_str_or_fail "Out-Out;" "${csvsummary}" "${FUNCNAME[0]}"
>  }
>  
>  skip_no_probe_record_support
> -- 
> 2.53.0.1018.g2bb0e51243-goog
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-31  7:22             ` Namhyung Kim
@ 2026-03-31 17:58               ` Ian Rogers
  2026-04-01  3:41                 ` Namhyung Kim
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-03-31 17:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

On Tue, Mar 31, 2026 at 12:22 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> I'm curious why this patch is in the idle symbol thread.

I'll separate it, I was gathering fixes. Same branch has the BPF
counters test fix in it:
https://lore.kernel.org/lkml/20260325171653.1091337-1-irogers@google.com/

> On Thu, Mar 26, 2026 at 11:00:33PM -0700, Ian Rogers wrote:
> > Writing to the test output files in the current working directory can
> > fail in various contexts such as continual test. Other tests write to
> > a mktemp-ed file, make the "perf script task-analyzer tests" follow
> > this convention too. Currently this isn't possible for the perf.data
> > file due to a lack of perf script support, add a variable for when
> > this support is available.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/tests/shell/test_task_analyzer.sh | 38 +++++++++++---------
> >  1 file changed, 21 insertions(+), 17 deletions(-)
> >
> > diff --git a/tools/perf/tests/shell/test_task_analyzer.sh b/tools/perf/tests/shell/test_task_analyzer.sh
> > index e194fcf61df3..b1a6a7e017e4 100755
> > --- a/tools/perf/tests/shell/test_task_analyzer.sh
> > +++ b/tools/perf/tests/shell/test_task_analyzer.sh
> > @@ -3,6 +3,11 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >
> >  tmpdir=$(mktemp -d /tmp/perf-script-task-analyzer-XXXXX)
> > +# TODO: perf script report only supports input from the CWD perf.data file, make
> > +# it support input from any file.
> > +perfdata="perf.data"
> > +csv="$tmpdir/csv"
> > +csvsummary="$tmpdir/csvsummary"
> >  err=0
> >
> >  # set PERF_EXEC_PATH to find scripts in the source directory
> > @@ -15,11 +20,10 @@ fi
> >  export ASAN_OPTIONS=detect_leaks=0
> >
> >  cleanup() {
> > -  rm -f perf.data
> > -  rm -f perf.data.old
> > -  rm -f csv
> > -  rm -f csvsummary
> > +  rm -f "${perfdata}"
> > +  rm -f "${perfdata}".old
> >    rm -rf "$tmpdir"
> > +
> >    trap - exit term int
> >  }
> >
> > @@ -61,7 +65,7 @@ skip_no_probe_record_support() {
> >
> >  prepare_perf_data() {
> >       # 1s should be sufficient to catch at least some switches
> > -     perf record -e sched:sched_switch -a -- sleep 1 > /dev/null 2>&1
> > +     perf record -e sched:sched_switch -a -o "${perfdata}" -- sleep 1 > /dev/null 2>&1
> >       # check if perf data file got created in above step.
> >       if [ ! -e "perf.data" ]; then
> >               printf "FAIL: perf record failed to create \"perf.data\" \n"
>
> Please update this part too.

Done.

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > @@ -130,28 +134,28 @@ test_extended_times_summary_ns() {
> >  }
> >
> >  test_csv() {
> > -     perf script report task-analyzer --csv csv > /dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv csv"
> > -     find_str_or_fail "Comm;" csv "${FUNCNAME[0]}"
> > +     perf script report task-analyzer --csv "${csv}" > /dev/null
> > +     check_exec_0 "perf script report task-analyzer --csv ${csv}"
> > +     find_str_or_fail "Comm;" "${csv}" "${FUNCNAME[0]}"
> >  }
> >
> >  test_csv_extended_times() {
> > -     perf script report task-analyzer --csv csv --extended-times > /dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv csv --extended-times"
> > -     find_str_or_fail "Out-Out;" csv "${FUNCNAME[0]}"
> > +     perf script report task-analyzer --csv "${csv}" --extended-times > /dev/null
> > +     check_exec_0 "perf script report task-analyzer --csv ${csv} --extended-times"
> > +     find_str_or_fail "Out-Out;" "${csv}" "${FUNCNAME[0]}"
> >  }
> >
> >  test_csvsummary() {
> > -     perf script report task-analyzer --csv-summary csvsummary > /dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv-summary csvsummary"
> > -     find_str_or_fail "Comm;" csvsummary "${FUNCNAME[0]}"
> > +     perf script report task-analyzer --csv-summary "${csvsummary}" > /dev/null
> > +     check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary}"
> > +     find_str_or_fail "Comm;" "${csvsummary}" "${FUNCNAME[0]}"
> >  }
> >
> >  test_csvsummary_extended() {
> > -     perf script report task-analyzer --csv-summary csvsummary --summary-extended \
> > +     perf script report task-analyzer --csv-summary "${csvsummary}" --summary-extended \
> >       >/dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv-summary csvsummary --summary-extended"
> > -     find_str_or_fail "Out-Out;" csvsummary "${FUNCNAME[0]}"
> > +     check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary} --summary-extended"
> > +     find_str_or_fail "Out-Out;" "${csvsummary}" "${FUNCNAME[0]}"
> >  }
> >
> >  skip_no_probe_record_support
> > --
> > 2.53.0.1018.g2bb0e51243-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-31 17:58               ` Ian Rogers
@ 2026-04-01  3:41                 ` Namhyung Kim
  0 siblings, 0 replies; 60+ messages in thread
From: Namhyung Kim @ 2026-04-01  3:41 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

On Tue, Mar 31, 2026 at 10:58:55AM -0700, Ian Rogers wrote:
> On Tue, Mar 31, 2026 at 12:22 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > I'm curious why this patch is in the idle symbol thread.
> 
> I'll separate it, I was gathering fixes. Same branch has the BPF
> counters test fix in it:
> https://lore.kernel.org/lkml/20260325171653.1091337-1-irogers@google.com/

Ok, I'll test and process it.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-06  5:05               ` Namhyung Kim
  2026-04-06 15:36                 ` Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Namhyung Kim @ 2026-04-06  5:05 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Thu, Mar 26, 2026 at 09:50:24PM -0700, Ian Rogers wrote:
> Add a helper that lazily computes the e_machine and falls back to
> EM_HOST. Use the perf_env's arch to compute the e_machine if
> available. Use a binary search for some efficiency in this, but handle
> somewhat complex duplicate rules. Switch perf_env__arch to be derived
> from the e_machine for consistency. This switches arch from being uname
> derived to matching that of the perf binary (via EM_HOST). Update
> session to use the helper, which may mean using EM_HOST when no
> threads are available. This also updates the perf data file header
> that gets the e_machine/e_flags from the session.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
>  tools/perf/util/env.h     |   1 +
>  tools/perf/util/session.c |  14 +--
>  3 files changed, 157 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 93d475a80f14..ae08178870d7 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -1,10 +1,12 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include "cpumap.h"
> +#include "dwarf-regs.h"
>  #include "debug.h"
>  #include "env.h"
>  #include "util/header.h"
>  #include "util/rwsem.h"
>  #include <linux/compiler.h>
> +#include <linux/kernel.h>
>  #include <linux/ctype.h>
>  #include <linux/rbtree.h>
>  #include <linux/string.h>
> @@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
>  	zfree(&cache->size);
>  }
>  
> +struct arch_to_e_machine {
> +	const char *prefix;
> +	uint16_t e_machine;
> +};
> +
>  /*
> - * Return architecture name in a normalized form.
> - * The conversion logic comes from the Makefile.
> + * A mapping from an arch prefix string to an ELF machine that can be used in a
> + * bsearch. Some arch prefixes are shared and need additional processing as
> + * marked next to the architecture. The prefixes handle both perf's architecture
> + * naming and those from uname.
>   */
> -static const char *normalize_arch(char *arch)
> -{
> -	if (!strcmp(arch, "x86_64"))
> -		return "x86";
> -	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
> -		return "x86";
> -	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
> -		return "sparc";
> -	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
> -		return "arm64";
> -	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
> -		return "arm";
> -	if (!strncmp(arch, "s390", 4))
> -		return "s390";
> -	if (!strncmp(arch, "parisc", 6))
> -		return "parisc";
> -	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
> -		return "powerpc";
> -	if (!strncmp(arch, "mips", 4))
> -		return "mips";
> -	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
> -		return "sh";
> -	if (!strncmp(arch, "loongarch", 9))
> -		return "loongarch";
> -
> -	return arch;
> +static const struct arch_to_e_machine prefix_to_e_machine[] = {
> +	{"aarch64", EM_AARCH64},
> +	{"alpha", EM_ALPHA},
> +	{"arc", EM_ARC},
> +	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
> +	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
> +	{"bfin", EM_BLACKFIN},
> +	{"blackfin", EM_BLACKFIN},
> +	{"cris", EM_CRIS},
> +	{"csky", EM_CSKY},
> +	{"hppa", EM_PARISC},
> +	{"i386", EM_386},
> +	{"i486", EM_386},
> +	{"i586", EM_386},
> +	{"i686", EM_386},
> +	{"loongarch", EM_LOONGARCH},
> +	{"m32r", EM_M32R},
> +	{"m68k", EM_68K},
> +	{"microblaze", EM_MICROBLAZE},
> +	{"mips", EM_MIPS},
> +	{"msp430", EM_MSP430},
> +	{"parisc", EM_PARISC},
> +	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
> +	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
> +	{"riscv", EM_RISCV},
> +	{"s390", EM_S390},
> +	{"sa110", EM_ARM},
> +	{"sh", EM_SH},
> +	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
> +	{"sun4u", EM_SPARC},
> +	{"x86", EM_X86_64}, /* Check also for EM_386. */
> +	{"xtensa", EM_XTENSA},
> +};
> +
> +static int compare_prefix(const void *key, const void *element)
> +{
> +	const char *search_key = key;
> +	const struct arch_to_e_machine *map_element = element;
> +	size_t prefix_len = strlen(map_element->prefix);
> +
> +	return strncmp(search_key, map_element->prefix, prefix_len);
> +}
> +
> +static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
> +{
> +	/* Binary search for a matching prefix. */
> +	const struct arch_to_e_machine *result;
> +
> +	if (!perf_arch)
> +		return EM_HOST;
> +
> +	result = bsearch(perf_arch,
> +			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
> +			 sizeof(prefix_to_e_machine[0]),
> +			 compare_prefix);
> +
> +	if (!result) {
> +		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
> +		return EM_NONE;
> +	}
> +
> +	/* Handle conflicting prefixes. */
> +	switch (result->e_machine) {
> +	case EM_ARM:
> +		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
> +	case EM_AVR:
> +		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
> +	case EM_PPC:
> +		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;

I'm curious what name `uname -m` returns for PPC64.  Is
"powerpc64" possible?
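
To make the question concrete, here is a minimal standalone sketch of the prefix lookup in the quoted hunk, assuming a cut-down table. The `PFX_EM_*` constants are local stand-ins carrying the usual ELF values, and `pfx_table`, `pfx_compare` and `pfx_arch_to_e_machine` are hypothetical names, not the perf ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the <elf.h> values so the sketch builds anywhere. */
enum { PFX_EM_NONE = 0, PFX_EM_PPC = 20, PFX_EM_PPC64 = 21, PFX_EM_S390 = 22 };

struct pfx_map {
	const char *prefix;
	int e_machine;
};

/* Must stay sorted by prefix, since bsearch relies on the ordering. */
static const struct pfx_map pfx_table[] = {
	{"powerpc", PFX_EM_PPC},
	{"ppc", PFX_EM_PPC},
	{"s390", PFX_EM_S390},
};

/* Compare only the first strlen(prefix) characters of the key. */
static int pfx_compare(const void *key, const void *element)
{
	const char *arch = key;
	const struct pfx_map *entry = element;

	return strncmp(arch, entry->prefix, strlen(entry->prefix));
}

static int pfx_arch_to_e_machine(const char *arch, bool is_64_bit)
{
	const struct pfx_map *hit;

	assert(arch != NULL);

	hit = bsearch(arch, pfx_table,
		      sizeof(pfx_table) / sizeof(pfx_table[0]),
		      sizeof(pfx_table[0]), pfx_compare);
	if (!hit)
		return PFX_EM_NONE;

	/* Shared prefix: decide 32-bit vs 64-bit as the quoted code does. */
	if (hit->e_machine == PFX_EM_PPC) {
		bool ppc64 = strncmp(arch, "ppc64", strlen("ppc64")) == 0;

		return (is_64_bit || ppc64) ? PFX_EM_PPC64 : PFX_EM_PPC;
	}
	return hit->e_machine;
}
```

Under these assumptions a string such as "powerpc64" matches the "powerpc" prefix but does not start with "ppc64", so it only resolves to the 64-bit machine when the `is_64_bit` hint is set, whereas "ppc64le" resolves to it either way.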


> +	case EM_SPARC:
> +		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
> +	case EM_X86_64:
> +		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
> +	default:
> +		return result->e_machine;
> +	}
> +}
> +
> +static const char *e_machine_to_perf_arch(uint16_t e_machine)
> +{
> +	/*
> +	 * Table for if either the perf arch string differs from uname or there
> +	 * are >1 ELF machine with the prefix.
> +	 */
> +	static const struct arch_to_e_machine extras[] = {
> +		{"arm64", EM_AARCH64},
> +		{"avr32", EM_AVR32},
> +		{"powerpc", EM_PPC},
> +		{"powerpc", EM_PPC64},

Here it returns powerpc for both.
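
The many-to-one direction can be sketched in isolation like this. It is a reduced illustration with hypothetical names (`rev_names`, `rev_e_machine_to_arch`, `rev_same_arch`) and local `REV_EM_*` stand-ins for the ELF constants, not the table from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the <elf.h> values so the sketch builds anywhere. */
enum { REV_EM_NONE = 0, REV_EM_PPC = 20, REV_EM_PPC64 = 21, REV_EM_S390 = 22 };

struct rev_name {
	const char *name;
	int e_machine;
};

/* Two ELF machines deliberately share one perf arch name. */
static const struct rev_name rev_names[] = {
	{"powerpc", REV_EM_PPC},
	{"powerpc", REV_EM_PPC64},
	{"s390", REV_EM_S390},
	{"none", REV_EM_NONE},
};

/* Linear scan: the table is tiny and not sorted by e_machine. */
static const char *rev_e_machine_to_arch(int e_machine)
{
	for (size_t i = 0; i < sizeof(rev_names) / sizeof(rev_names[0]); i++) {
		if (rev_names[i].e_machine == e_machine)
			return rev_names[i].name;
	}
	return "unknown";
}

/* True when both machines collapse to the same perf arch name. */
static bool rev_same_arch(int a, int b)
{
	const char *name_a = rev_e_machine_to_arch(a);

	assert(name_a != NULL);
	return strcmp(name_a, rev_e_machine_to_arch(b)) == 0;
}
```

Because both machines collapse to one name, the name alone cannot be mapped back to the 64-bit machine; in the quoted patch that distinction has to come from the `is_64_bit` argument or from a "ppc64" prefix in the input string.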


> +		{"sparc", EM_SPARCV9},
> +		{"x86", EM_386},
> +		{"x86", EM_X86_64},
> +		{"none", EM_NONE},
> +	};
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
> +		if (extras[i].e_machine == e_machine)
> +			return extras[i].prefix;
> +	}
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
> +		if (prefix_to_e_machine[i].e_machine == e_machine)
> +			return prefix_to_e_machine[i].prefix;
> +
> +	}
> +	return "unknown";
> +}
> +
> +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
> +{
> +	if (!env) {
> +		if (e_flags)
> +			*e_flags = EF_HOST;
> +
> +		return EM_HOST;
> +	}
> +	if (env->e_machine == EM_NONE) {
> +		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
> +
> +		if (env->e_machine == EM_HOST)
> +			env->e_flags = EF_HOST;
> +	}
> +	if (e_flags)
> +		*e_flags = env->e_flags;
> +
> +	return env->e_machine;
>  }
>  
>  const char *perf_env__arch(struct perf_env *env)
>  {
> -	char *arch_name;
> +	if (!env)
> +		return e_machine_to_perf_arch(EM_HOST);
>  
> -	if (!env || !env->arch) { /* Assume local operation */
> -		static struct utsname uts = { .machine[0] = '\0', };
> -		if (uts.machine[0] == '\0' && uname(&uts) < 0)
> -			return NULL;
> -		arch_name = uts.machine;
> -	} else
> -		arch_name = env->arch;
> +	if (!env->arch) {
> +		/*
> +		 * Lazily compute/allocate arch. The e_machine may have been
> +		 * read from a data file and so may not be EM_HOST.
> +		 */
> +		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
>  
> -	return normalize_arch(arch_name);
> +		env->arch = strdup(e_machine_to_perf_arch(e_machine));
> +	}
> +	return env->arch;
>  }
>  
>  #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index a4501cbca375..91ff252712f4 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
>  
>  void cpu_cache_level__free(struct cpu_cache_level *cache);
>  
> +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
>  const char *perf_env__arch(struct perf_env *env);
>  const char *perf_env__arch_strerrno(struct perf_env *env, int err);
>  const char *perf_env__cpuid(struct perf_env *env);
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4b465abfa36c..dcc9bef303aa 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
>  		return EM_HOST;
>  	}
>  
> +	/* Is the env caching an e_machine? */
>  	env = perf_session__env(session);
> -	if (env && env->e_machine != EM_NONE) {
> -		if (e_flags)
> -			*e_flags = env->e_flags;
> -
> -		return env->e_machine;
> -	}
> +	if (env && env->e_machine != EM_NONE)
> +		return perf_env__e_machine(env, e_flags);
>  
> +	/*
> +	 * Compute from threads, note this is more accurate than
> +	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
> +	 * mixed 32-bit and 64-bit threads.
> +	 */
>  	machines__for_each_thread(&session->machines,
>  				  perf_session__e_machine_cb,
>  				  &args);
> -- 
> 2.53.0.1018.g2bb0e51243-goog
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-04-06  5:10               ` Namhyung Kim
  2026-04-06 16:11                 ` Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Namhyung Kim @ 2026-04-06  5:10 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Thu, Mar 26, 2026 at 09:50:25PM -0700, Ian Rogers wrote:
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
> 
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> 
> The change switches x86 matches to use strstarts which means
> intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
[SNIP]
> +	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
> +		int major = 0, minor = 0;
> +		const char *release = env && env->os_release
> +			? env->os_release : perf_version_string;

I think Sashiko's review is right.  You need to check the kernel version
instead of perf.

Thanks,
Namhyung

> +
> +		/* Before v6.10, s390 used psw_idle. */
> +		if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
> +		    major < 6 || (major == 6 && minor < 10)) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +	return false;
>  }

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-06  5:05               ` Namhyung Kim
@ 2026-04-06 15:36                 ` Ian Rogers
  0 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-06 15:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Sun, Apr 5, 2026 at 10:05 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Mar 26, 2026 at 09:50:24PM -0700, Ian Rogers wrote:
> > Add a helper that lazily computes the e_machine and falls back on
> > EM_HOST. Use the perf_env's arch to compute the e_machine if
> > available. Use a binary search for some efficiency in this, but handle
> > somewhat complex duplicate rules. Switch perf_env__arch to be derived
> > from the e_machine for consistency. This switches arch from being uname
> > derived to matching that of the perf binary (via EM_HOST). Update
> > session to use the helper, which may mean using EM_HOST when no
> > threads are available. This also updates the perf data file header
> > that gets the e_machine/e_flags from the session.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
> >  tools/perf/util/env.h     |   1 +
> >  tools/perf/util/session.c |  14 +--
> >  3 files changed, 157 insertions(+), 43 deletions(-)
> >
> > diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> > index 93d475a80f14..ae08178870d7 100644
> > --- a/tools/perf/util/env.c
> > +++ b/tools/perf/util/env.c
> > @@ -1,10 +1,12 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include "cpumap.h"
> > +#include "dwarf-regs.h"
> >  #include "debug.h"
> >  #include "env.h"
> >  #include "util/header.h"
> >  #include "util/rwsem.h"
> >  #include <linux/compiler.h>
> > +#include <linux/kernel.h>
> >  #include <linux/ctype.h>
> >  #include <linux/rbtree.h>
> >  #include <linux/string.h>
> > @@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
> >       zfree(&cache->size);
> >  }
> >
> > +struct arch_to_e_machine {
> > +     const char *prefix;
> > +     uint16_t e_machine;
> > +};
> > +
> >  /*
> > - * Return architecture name in a normalized form.
> > - * The conversion logic comes from the Makefile.
> > + * A mapping from an arch prefix string to an ELF machine that can be used in a
> > > + * bsearch. Some arch prefixes are shared and need additional processing as
> > + * marked next to the architecture. The prefixes handle both perf's architecture
> > + * naming and those from uname.
> >   */
> > -static const char *normalize_arch(char *arch)
> > -{
> > -     if (!strcmp(arch, "x86_64"))
> > -             return "x86";
> > -     if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
> > -             return "x86";
> > -     if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
> > -             return "sparc";
> > -     if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
> > -             return "arm64";
> > -     if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
> > -             return "arm";
> > -     if (!strncmp(arch, "s390", 4))
> > -             return "s390";
> > -     if (!strncmp(arch, "parisc", 6))
> > -             return "parisc";
> > -     if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
> > -             return "powerpc";
> > -     if (!strncmp(arch, "mips", 4))
> > -             return "mips";
> > -     if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
> > -             return "sh";
> > -     if (!strncmp(arch, "loongarch", 9))
> > -             return "loongarch";
> > -
> > -     return arch;
> > +static const struct arch_to_e_machine prefix_to_e_machine[] = {
> > +     {"aarch64", EM_AARCH64},
> > +     {"alpha", EM_ALPHA},
> > +     {"arc", EM_ARC},
> > +     {"arm", EM_ARM}, /* Check also for EM_AARCH64. */
> > +     {"avr", EM_AVR},  /* Check also for EM_AVR32. */
> > +     {"bfin", EM_BLACKFIN},
> > +     {"blackfin", EM_BLACKFIN},
> > +     {"cris", EM_CRIS},
> > +     {"csky", EM_CSKY},
> > +     {"hppa", EM_PARISC},
> > +     {"i386", EM_386},
> > +     {"i486", EM_386},
> > +     {"i586", EM_386},
> > +     {"i686", EM_386},
> > +     {"loongarch", EM_LOONGARCH},
> > +     {"m32r", EM_M32R},
> > +     {"m68k", EM_68K},
> > +     {"microblaze", EM_MICROBLAZE},
> > +     {"mips", EM_MIPS},
> > +     {"msp430", EM_MSP430},
> > +     {"parisc", EM_PARISC},
> > +     {"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
> > +     {"ppc", EM_PPC}, /* Check also for EM_PPC64. */
> > +     {"riscv", EM_RISCV},
> > +     {"s390", EM_S390},
> > +     {"sa110", EM_ARM},
> > +     {"sh", EM_SH},
> > +     {"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
> > +     {"sun4u", EM_SPARC},
> > +     {"x86", EM_X86_64}, /* Check also for EM_386. */
> > +     {"xtensa", EM_XTENSA},
> > +};
> > +
> > +static int compare_prefix(const void *key, const void *element)
> > +{
> > +     const char *search_key = key;
> > +     const struct arch_to_e_machine *map_element = element;
> > +     size_t prefix_len = strlen(map_element->prefix);
> > +
> > +     return strncmp(search_key, map_element->prefix, prefix_len);
> > +}
> > +
> > +static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
> > +{
> > +     /* Binary search for a matching prefix. */
> > +     const struct arch_to_e_machine *result;
> > +
> > +     if (!perf_arch)
> > +             return EM_HOST;
> > +
> > +     result = bsearch(perf_arch,
> > +                      prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
> > +                      sizeof(prefix_to_e_machine[0]),
> > +                      compare_prefix);
> > +
> > +     if (!result) {
> > +             pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
> > +             return EM_NONE;
> > +     }
> > +
> > +     /* Handle conflicting prefixes. */
> > +     switch (result->e_machine) {
> > +     case EM_ARM:
> > +             return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
> > +     case EM_AVR:
> > +             return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
> > +     case EM_PPC:
> > +             return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
>
> I'm curious what's the name `uname -m` returns for PPC64.  Is
> "powerpc64" possible?

It is.

> > +     case EM_SPARC:
> > +             return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
> > +     case EM_X86_64:
> > +             return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
> > +     default:
> > +             return result->e_machine;
> > +     }
> > +}
> > +
> > +static const char *e_machine_to_perf_arch(uint16_t e_machine)
> > +{
> > +     /*
> > +      * Table for if either the perf arch string differs from uname or there
> > +      * are >1 ELF machine with the prefix.
> > +      */
> > +     static const struct arch_to_e_machine extras[] = {
> > +             {"arm64", EM_AARCH64},
> > +             {"avr32", EM_AVR32},
> > +             {"powerpc", EM_PPC},
> > +             {"powerpc", EM_PPC64},
>
> Here it returns powerpc for both.

Yep. This is 100% intentional as the existing code does the same:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.c?h=perf-tools-next#n611
```
static const char *normalize_arch(char *arch)
...
if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
return "powerpc";
```
The strncmp is limited to just the prefix of the uname string,
ignoring the 64. So the arch "powerpc" can be 32-bit or 64-bit, just
as "x86" can be 32-bit or 64-bit. To determine which case applies, the
code should really check `struct perf_env`'s `kernel_is_64_bit`. I
think this is generally much more painful than just using the
e_machine - especially since you need to strcmp the name. For the
e_machine, the problem is that on x86 we have 32-bit, x32 and x86_64.
There is then also an ABI question regarding the use of SIMD registers
and the newer APX registers. If there are no samples and no DSOs in
play, making a choice of e_machine to set up variables with is
somewhat arbitrary. I think EM_HOST, the e_machine of the current perf
binary, is a good choice.
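
As a minimal standalone sketch of that prefix behavior (this is not the
perf code itself; `starts_with` and `normalize_ppc` are hypothetical names
used only for illustration):
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if str begins with prefix. */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical reduction of the powerpc case: any uname string starting
 * with "powerpc" or "ppc" maps to "powerpc", whatever follows the prefix
 * (for example "64" or "64le"). Returns NULL for anything else.
 */
static const char *normalize_ppc(const char *uname_machine)
{
	if (starts_with(uname_machine, "powerpc") ||
	    starts_with(uname_machine, "ppc"))
		return "powerpc";
	return NULL;
}
```
Under these assumptions both "ppc64le" and "powerpc64" come back as
"powerpc", so the arch string alone cannot tell you the bitness.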

Thanks,
Ian

> > +             {"sparc", EM_SPARCV9},
> > +             {"x86", EM_386},
> > +             {"x86", EM_X86_64},
> > +             {"none", EM_NONE},
> > +     };
> > +
> > +     for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
> > +             if (extras[i].e_machine == e_machine)
> > +                     return extras[i].prefix;
> > +     }
> > +
> > +     for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
> > +             if (prefix_to_e_machine[i].e_machine == e_machine)
> > +                     return prefix_to_e_machine[i].prefix;
> > +
> > +     }
> > +     return "unknown";
> > +}
> > +
> > +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
> > +{
> > +     if (!env) {
> > +             if (e_flags)
> > +                     *e_flags = EF_HOST;
> > +
> > +             return EM_HOST;
> > +     }
> > +     if (env->e_machine == EM_NONE) {
> > +             env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
> > +
> > +             if (env->e_machine == EM_HOST)
> > +                     env->e_flags = EF_HOST;
> > +     }
> > +     if (e_flags)
> > +             *e_flags = env->e_flags;
> > +
> > +     return env->e_machine;
> >  }
> >
> >  const char *perf_env__arch(struct perf_env *env)
> >  {
> > -     char *arch_name;
> > +     if (!env)
> > +             return e_machine_to_perf_arch(EM_HOST);
> >
> > -     if (!env || !env->arch) { /* Assume local operation */
> > -             static struct utsname uts = { .machine[0] = '\0', };
> > -             if (uts.machine[0] == '\0' && uname(&uts) < 0)
> > -                     return NULL;
> > -             arch_name = uts.machine;
> > -     } else
> > -             arch_name = env->arch;
> > +     if (!env->arch) {
> > +             /*
> > +              * Lazily compute/allocate arch. The e_machine may have been
> > +              * read from a data file and so may not be EM_HOST.
> > +              */
> > +             uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
> >
> > -     return normalize_arch(arch_name);
> > +             env->arch = strdup(e_machine_to_perf_arch(e_machine));
> > +     }
> > +     return env->arch;
> >  }
> >
> >  #if defined(HAVE_LIBTRACEEVENT)
> > diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> > index a4501cbca375..91ff252712f4 100644
> > --- a/tools/perf/util/env.h
> > +++ b/tools/perf/util/env.h
> > @@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
> >
> >  void cpu_cache_level__free(struct cpu_cache_level *cache);
> >
> > +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
> >  const char *perf_env__arch(struct perf_env *env);
> >  const char *perf_env__arch_strerrno(struct perf_env *env, int err);
> >  const char *perf_env__cpuid(struct perf_env *env);
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 4b465abfa36c..dcc9bef303aa 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
> >               return EM_HOST;
> >       }
> >
> > +     /* Is the env caching an e_machine? */
> >       env = perf_session__env(session);
> > -     if (env && env->e_machine != EM_NONE) {
> > -             if (e_flags)
> > -                     *e_flags = env->e_flags;
> > -
> > -             return env->e_machine;
> > -     }
> > +     if (env && env->e_machine != EM_NONE)
> > +             return perf_env__e_machine(env, e_flags);
> >
> > +     /*
> > +      * Compute from threads, note this is more accurate than
> > +      * perf_env__e_machine that falls back on EM_HOST and doesn't consider
> > +      * mixed 32-bit and 64-bit threads.
> > +      */
> >       machines__for_each_thread(&session->machines,
> >                                 perf_session__e_machine_cb,
> >                                 &args);
> > --
> > 2.53.0.1018.g2bb0e51243-goog
> >

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-04-06  5:10               ` Namhyung Kim
@ 2026-04-06 16:11                 ` Ian Rogers
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Rogers @ 2026-04-06 16:11 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Sun, Apr 5, 2026 at 10:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Mar 26, 2026 at 09:50:25PM -0700, Ian Rogers wrote:
> > Move the idle boolean to a helper symbol__is_idle function. In the
> > function lazily compute whether a symbol is an idle function taking
> > into consideration the kernel version and architecture of the
> > machine. As symbols__insert no longer needs to know if a symbol is for
> > the kernel, remove the argument.
> >
> > This change is inspired by mailing list discussion, particularly from
> > Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> > <hca@linux.ibm.com>:
> > https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> >
> > The change switches x86 matches to use strstarts which means
> > intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> > change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> > https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> [SNIP]
> > +     if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
> > +             int major = 0, minor = 0;
> > +             const char *release = env && env->os_release
> > +                     ? env->os_release : perf_version_string;
>
> I think Sashiko's review is right.  You need to check the kernel version
> instead of perf.

Doing this can create more problems and complexity than it solves. If
we state that `os_release` can be NULL at this point, we recompute it
using `uname`:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/header.c?h=perf-tools-next#n378
then do we cache the value in env? What happens if a data/pipe file
assigns to the env later? Ad-hoc users of env->os_release
recomputing it shouldn't happen; instead, in 'live' mode, we should
assign os_release using uname either when the perf_env is created or
lazily with a helper function. I dislike that with a helper we could
potentially have multiple notions of os_release.

I'll add a patch to refactor the use of os_release, but can we be
mindful that this is clear feature creep with little benefit? We will
still fall back on `perf_version_string` if uname fails and for all
practical purposes, `perf_version_string` will differ little from
uname in this case. I'm only going to add the patch because checking
other uses of os_release suggests the change is benign.
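
To make the lazy helper option concrete, here is a rough sketch. It
assumes a hypothetical `struct demo_env` and `demo_env__os_release`, which
are for illustration only and are not the real perf_env API:
```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for the relevant part of struct perf_env. */
struct demo_env {
	char *os_release; /* NULL until read from a data file or computed. */
};

/*
 * Hypothetical lazy accessor: if nothing has set os_release yet, assume
 * live mode and cache the running kernel's release from uname(). Returns
 * NULL if uname() or the allocation fails, so the caller picks a fallback.
 */
static const char *demo_env__os_release(struct demo_env *env)
{
	struct utsname uts;

	if (!env)
		return NULL;
	if (!env->os_release && uname(&uts) >= 0)
		env->os_release = strdup(uts.release);
	return env->os_release;
}
```
Because the value is cached in the env on first use, every later caller
sees the same string, which is the single notion of os_release I'd want.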

Thanks,
Ian

> Thanks,
> Namhyung
>
> > +
> > +             /* Before v6.10, s390 used psw_idle. */
> > +             if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
> > +                 major < 6 || (major == 6 && minor < 10)) {
> > +                     sym->idle = SYMBOL_IDLE__IDLE;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +     return false;
> >  }

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-06 16:11                 ` Ian Rogers
@ 2026-04-06 17:09                   ` Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                       ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
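
As a rough illustration of the lazy, cached computation described above,
here is a standalone sketch. The `demo_` names, the `is_s390` flag and the
tri-state constants are hypothetical and only mirror the shape of the
series, not its actual code:
```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tri-state: unknown until first queried. */
enum { DEMO_IDLE__UNKNOWN = 0, DEMO_IDLE__NOT_IDLE, DEMO_IDLE__IDLE };

struct demo_symbol {
	const char *name;
	uint8_t idle; /* One of DEMO_IDLE__*. */
};

/*
 * True if release is older than v6.10. A string that does not parse as
 * "major.minor" is treated as old, as in the quoted version check.
 */
static bool demo_release_before_6_10(const char *release)
{
	int major = 0, minor = 0;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return true;
	return major < 6 || (major == 6 && minor < 10);
}

/* Compute the idle state on first use, then serve it from the cache. */
static bool demo_symbol__is_idle(struct demo_symbol *sym, bool is_s390,
				 const char *release)
{
	if (sym->idle == DEMO_IDLE__UNKNOWN) {
		bool idle = is_s390 &&
			    !strncmp(sym->name, "psw_idle", strlen("psw_idle")) &&
			    demo_release_before_6_10(release);

		sym->idle = idle ? DEMO_IDLE__IDLE : DEMO_IDLE__NOT_IDLE;
	}
	return sym->idle == DEMO_IDLE__IDLE;
}
```
Only symbols that are actually queried pay for the string compares and
the version parse; everything else stays in the unknown state.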

v5: Add perf_env os_release helper (Namhyung/Sashiko)

v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
    https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
    https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (3):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c          |   6 +-
 tools/perf/util/data-convert-bt.c |   2 +-
 tools/perf/util/env.c             | 206 ++++++++++++++++++++++++------
 tools/perf/util/env.h             |   2 +
 tools/perf/util/session.c         |  14 +-
 tools/perf/util/symbol-elf.c      |   2 +-
 tools/perf/util/symbol.c          | 107 ++++++++++------
 tools/perf/util/symbol.h          |  15 ++-
 8 files changed, 264 insertions(+), 90 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-04-06 17:09                     ` Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper that lazily computes the e_machine and falls back on
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/session.c |  14 +--
 3 files changed, 157 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..339d62ca37bb 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch) {
+		/*
+		 * Lazily compute/allocate arch. The e_machine may have been
+		 * read from a data file and so may not be EM_HOST.
+		 */
+		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	return normalize_arch(arch_name);
+		env->arch = strdup(e_machine_to_perf_arch(e_machine));
+	}
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a4501cbca375..91ff252712f4 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 3a911c70cd0e..070dd78772f2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3009,14 +3009,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-06 17:09                     ` Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

In live mode the os_release isn't initialized. Add a lazy
initialization helper that assumes live mode when the os_release
hasn't been initialized.
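
For readers skimming the thread, the lazy pattern can be sketched
standalone as below. This is only an illustration, not the patch:
struct demo_env, demo_env__os_release() and the fallback parameter are
hypothetical names, and only uname() and strdup() are real library calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for the relevant part of struct perf_env. */
struct demo_env {
	char *os_release;	/* NULL until first requested */
};

/*
 * Return the cached release string, filling it from uname() on first use.
 * 'fallback' is returned when there is no env, and is what gets cached
 * when uname() fails.
 */
static const char *demo_env__os_release(struct demo_env *env, const char *fallback)
{
	struct utsname uts;

	if (!env)
		return fallback;

	if (!env->os_release) {
		int ret = uname(&uts);

		env->os_release = strdup(ret < 0 ? fallback : uts.release);
	}
	/* strdup() may fail; never hand a NULL back to the caller. */
	return env->os_release ? env->os_release : fallback;
}
```

The cached string is owned by the env, so whoever tears the env down is
expected to free it.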

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/symbol.c          |  4 ++--
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index bece77cbc493..bc5805183100 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 339d62ca37bb..34b737950f73 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 91ff252712f4..bf30a02dccf7 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -174,6 +174,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index b4b30675688d..ea7d2f2dbcb7 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2208,7 +2208,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2225,7 +2225,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-04-06 17:09                     ` Ian Rogers
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.
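
A minimal sketch of the tri-state cache idea, assuming a cut-down symbol
with just a name and a 2-bit field. All demo_ and DEMO_ names are
hypothetical, and the single strcmp() stands in for the real
classification:

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors the three states in the patch; zero means "not computed yet". */
enum demo_idle_kind {
	DEMO_IDLE__UNKNOWN = 0,
	DEMO_IDLE__NOT_IDLE = 1,
	DEMO_IDLE__IDLE = 2,
};

/* Hypothetical, cut-down symbol: a name and a 2-bit cache. */
struct demo_symbol {
	const char *name;
	unsigned int idle:2;
};

static int demo_classify_calls;	/* counts slow-path runs, for illustration */

static bool demo_symbol__is_idle(struct demo_symbol *sym, bool is_kernel)
{
	/* Fast path: an earlier call already decided. */
	if (sym->idle != DEMO_IDLE__UNKNOWN)
		return sym->idle == DEMO_IDLE__IDLE;

	/* Slow path: classify once and remember the answer. */
	demo_classify_calls++;
	if (is_kernel && !strcmp(sym->name, "default_idle")) {
		sym->idle = DEMO_IDLE__IDLE;
		return true;
	}
	sym->idle = DEMO_IDLE__NOT_IDLE;
	return false;
}
```

Because zero is the "unknown" state, a zero-initialized symbol needs no
setup at insert time, which is what lets the kernel argument go away.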

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
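
To make the effect of the prefix match concrete, here is a small
standalone sketch. demo_strstarts() is a local stand-in assumed to behave
like the tools' strstarts(), and demo_x86_idle_name() is a hypothetical
wrapper around the two prefixes used in the patch:

```c
#include <stdbool.h>
#include <string.h>

/* Local stand-in with the same meaning as strstarts(). */
static bool demo_strstarts(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* True for any name beginning with one of the two x86 idle prefixes. */
static bool demo_x86_idle_name(const char *name)
{
	return demo_strstarts(name, "mwait_idle") ||
	       demo_strstarts(name, "intel_idle");
}
```

The trade-off is that every symbol sharing the prefix is treated as idle,
not only the names that were listed explicitly before.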

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 103 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ea7d2f2dbcb7..8c23802b39ad 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,85 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
+		    major < 6 || (major == 6 && minor < 10)) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +812,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..2f5f90f547aa 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                       ` (2 preceding siblings ...)
  2026-04-06 17:09                     ` [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-04-09 23:06                     ` Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                         ` (2 more replies)
  3 siblings, 3 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
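
As a rough sketch of the kernel version dependence, the check below
parses "major.minor" from a release string. demo_release_before() is a
hypothetical helper; treating an unparsable string as "old" follows what
patch 3 does for the s390 psw_idle case:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true when 'release' (e.g. "6.9.0-rc1") is older than
 * want_major.want_minor, or when it cannot be parsed at all.
 */
static bool demo_release_before(const char *release, int want_major, int want_minor)
{
	int major = 0, minor = 0;

	if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
		return true;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}
```

With a 6.10 threshold, "6.9.0-rc1" counts as old and "6.10.0" does not,
since the minor number is compared as an integer and not as text.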

v6: Ensure arch is canonical by going to e_machine and back (Sashiko)

v5: Add perf_env os_release helper (Namhyung/Sashiko)
    https://lore.kernel.org/lkml/20260406170905.2614260-1-irogers@google.com/

v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
    https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
    https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (3):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c          |   6 +-
 tools/perf/util/data-convert-bt.c |   2 +-
 tools/perf/util/env.c             | 206 ++++++++++++++++++++++++------
 tools/perf/util/env.h             |   2 +
 tools/perf/util/header.c          |  60 ++++++---
 tools/perf/util/session.c         |  14 +-
 tools/perf/util/symbol-elf.c      |   2 +-
 tools/perf/util/symbol.c          | 107 ++++++++++------
 tools/perf/util/symbol.h          |  15 ++-
 9 files changed, 309 insertions(+), 105 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-04-09 23:06                       ` Ian Rogers
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 2 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
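
The prefix search plus the duplicate handling can be sketched on a
four-entry table as follows. This is an illustration under assumptions:
the DEMO_EM_ values are the ELF machine numbers as found in <elf.h>, and
the demo_ names and the reduced table are hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* ELF machine numbers copied from <elf.h>; the DEMO_ names are local. */
enum {
	DEMO_EM_NONE = 0,
	DEMO_EM_386 = 3,
	DEMO_EM_S390 = 22,
	DEMO_EM_ARM = 40,
	DEMO_EM_X86_64 = 62,
	DEMO_EM_AARCH64 = 183,
};

struct demo_prefix {
	const char *prefix;
	int e_machine;
};

/* Must stay sorted by prefix for bsearch(). */
static const struct demo_prefix demo_prefixes[] = {
	{ "aarch64", DEMO_EM_AARCH64 },
	{ "arm", DEMO_EM_ARM },		/* also the prefix of "arm64" */
	{ "s390", DEMO_EM_S390 },
	{ "x86", DEMO_EM_X86_64 },	/* may really be 32-bit */
};

/* Compare only as many characters as the table entry's prefix has. */
static int demo_cmp_prefix(const void *key, const void *elem)
{
	const struct demo_prefix *p = elem;

	return strncmp(key, p->prefix, strlen(p->prefix));
}

static int demo_arch_to_e_machine(const char *arch, bool is_64_bit)
{
	const struct demo_prefix *hit;

	hit = bsearch(arch, demo_prefixes,
		      sizeof(demo_prefixes) / sizeof(demo_prefixes[0]),
		      sizeof(demo_prefixes[0]), demo_cmp_prefix);
	if (!hit)
		return DEMO_EM_NONE;

	/* Second step: resolve prefixes shared by more than one machine. */
	switch (hit->e_machine) {
	case DEMO_EM_ARM:
		return !strcmp(arch, "arm64") ? DEMO_EM_AARCH64 : DEMO_EM_ARM;
	case DEMO_EM_X86_64:
		return is_64_bit || !strcmp(arch, "x86_64") ?
			DEMO_EM_X86_64 : DEMO_EM_386;
	default:
		return hit->e_machine;
	}
}
```

The search only narrows to a prefix, so uname strings such as "armv7l" or
"s390x" still land on the right entry; anything ambiguous is settled in
the switch.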

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/header.c  |  44 ++++++---
 tools/perf/util/session.c |  14 +--
 4 files changed, 191 insertions(+), 53 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..339d62ca37bb 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch) {
+		/*
+		 * Lazily compute/allocate arch. The e_machine may have been
+		 * read from a data file and so may not be EM_HOST.
+		 */
+		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	return normalize_arch(arch_name);
+		env->arch = strdup(e_machine_to_perf_arch(e_machine));
+	}
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..d36a0fb2cd04 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c6efddb70aee..9bb4a271b4f8 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,21 +370,25 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		arch = perf_env__arch(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.machine);
+	if (!arch) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2675,10 +2679,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	uint16_t saved_e_machine = ff->ph->env.e_machine;
+
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	/*
+	 * Make the arch string canonical by computing the e_machine from it,
+	 * then turning the e_machine back into an arch string.
+	 */
+	ff->ph->env.e_machine = EM_NONE;
+	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
+		zfree(&ff->ph->env.arch);
+		perf_env__arch(&ff->ph->env);
+	}
+	ff->ph->env.e_machine = saved_e_machine;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..726568b88803 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-09 23:06                       ` Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

In live mode the os_release isn't initialized. Add a lazy
initialization helper that assumes live mode when the os_release
hasn't been initialized.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 339d62ca37bb..34b737950f73 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d36a0fb2cd04..56020f4381cd 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -175,6 +175,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 9bb4a271b4f8..89115134f1d2 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -361,13 +361,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..fd332db56157 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2209,7 +2209,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-04-09 23:06                       ` Ian Rogers
  2 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
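Note for readers, not part of the patch: a minimal standalone sketch
of the lazy tri-state idle cache and the "older than v6.10" release
check described above. All toy_* names are hypothetical, the name
table is a shortened stand-in for the real one, and the sketch
ignores the dso and e_machine checks that the patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 0 means "not computed yet", so zero-initialized symbols start unknown. */
enum toy_idle_kind { TOY_IDLE_UNKNOWN = 0, TOY_IDLE_NO = 1, TOY_IDLE_YES = 2 };

struct toy_symbol {
	unsigned char idle:2;	/* holds an enum toy_idle_kind value */
	const char *name;
};

/* Must stay sorted in strcmp() order for bsearch(). */
static const char *const toy_idle_names[] = {
	"arch_cpu_idle",
	"default_idle",
	"poll_idle",
};

static int toy_name_cmp(const void *key, const void *elem)
{
	/* key is the name itself, elem points at a table slot. */
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* True only if release parses as "major.minor..." and is older than the wanted version. */
static bool toy_release_is_before(const char *release, int want_major, int want_minor)
{
	int major = 0, minor = 0;

	if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;
	return major < want_major || (major == want_major && minor < want_minor);
}

static bool toy_symbol_is_idle(struct toy_symbol *sym, const char *release)
{
	size_t n = sizeof(toy_idle_names) / sizeof(toy_idle_names[0]);
	bool idle;

	assert(sym && sym->name);
	/* Cached answer: later calls skip the search and the parse. */
	if (sym->idle != TOY_IDLE_UNKNOWN)
		return sym->idle == TOY_IDLE_YES;

	idle = bsearch(sym->name, toy_idle_names, n,
		       sizeof(toy_idle_names[0]), toy_name_cmp) != NULL;
	/* Version-gated prefix, mirroring the psw_idle handling. */
	if (!idle && !strncmp(sym->name, "psw_idle", strlen("psw_idle")))
		idle = toy_release_is_before(release, 6, 10);

	sym->idle = idle ? TOY_IDLE_YES : TOY_IDLE_NO;
	return idle;
}
```

Because the result is cached in the symbol, a later call with a
different release string returns the first answer. The sketch assumes
one release per symbol lifetime.
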
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 103 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f6eb543de537..95fa3a03e62d 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fd332db56157..482fd47bead2 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -358,8 +359,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -367,17 +367,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -394,7 +383,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -555,7 +544,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -717,47 +706,85 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -786,7 +813,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..7e0036f80185 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -26,6 +26,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -43,6 +44,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -58,8 +65,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -194,8 +201,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -278,5 +284,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-01 18:20                         ` Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                             ` (3 more replies)
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  1 sibling, 4 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.
  
Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. The current only user is
perf top. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
  
To avoid concurrent update issues with bitfields sharing a byte in
struct symbol due to the lazy computation, introduce a global lock for
updates to these fields and use setter functions. The reads remain
lockless.
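
As a rough illustration of the setter idea (not the code in patch 3/4,
whose names and details may differ), the sketch below serializes
writes to bitfields that share a byte behind one global mutex. The
toy_* names are hypothetical. It only shows why writers need
serializing: a bitfield store is a read-modify-write of the whole
byte, so two unserialized writers to different fields can lose an
update.

```c
#include <assert.h>
#include <pthread.h>

/* Three bitfields packed into one byte, as in the struct symbol case. */
struct toy_symbol {
	unsigned char idle:2;
	unsigned char ignore:1;
	unsigned char inlined:1;
};

/* One global lock for all writers; cheap if updates are rare. */
static pthread_mutex_t toy_symbol_bits_lock = PTHREAD_MUTEX_INITIALIZER;

static void toy_symbol__set_idle(struct toy_symbol *sym, unsigned int kind)
{
	assert(kind <= 2);
	pthread_mutex_lock(&toy_symbol_bits_lock);
	sym->idle = kind;	/* read-modify-write of the shared byte */
	pthread_mutex_unlock(&toy_symbol_bits_lock);
}

static void toy_symbol__set_ignore(struct toy_symbol *sym, int ignore)
{
	pthread_mutex_lock(&toy_symbol_bits_lock);
	sym->ignore = ignore ? 1 : 0;
	pthread_mutex_unlock(&toy_symbol_bits_lock);
}
```

A per-symbol lock would avoid contention but would grow every symbol,
which is presumably why a single global lock is attractive here.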
  
v7:
 - Address better handling of strdup failures with arch in the header/env.
 - Address concurrent update issues in struct symbol bitfields by
   introducing a global lock for writes.
  
v6: Ensure arch is canonical by going to e_machine and back (Sashiko)
https://lore.kernel.org/linux-perf-users/20260409230620.4176210-1-irogers@google.com/

v5: Add perf_env os_release helper (Namhyung/Sashiko)
https://lore.kernel.org/lkml/20260406170905.2614260-1-irogers@google.com/
  
v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/
  
v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/
  
v2: Some minor white space clean up:
https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/
  
v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (4):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Add setters for bitfields sharing a byte to avoid
    concurrent update issues
  perf symbol: Lazily compute idle and use a global lock for updates

 tools/perf/builtin-kwork.c        |   2 +-
 tools/perf/builtin-sched.c        |   2 +-
 tools/perf/util/annotate.c        |   2 +-
 tools/perf/util/data-convert-bt.c |   2 +-
 tools/perf/util/env.c             | 218 +++++++++++++++++++++++++-----
 tools/perf/util/env.h             |   2 +
 tools/perf/util/header.c          |  63 +++++++--
 tools/perf/util/session.c         |  25 ++--
 tools/perf/util/symbol-elf.c      |   2 +-
 tools/perf/util/symbol.c          | 134 ++++++++++++------
 tools/perf/util/symbol.h          |  17 ++-
 11 files changed, 357 insertions(+), 112 deletions(-)

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being
uname derived to matching that of the perf binary (via EM_HOST).
Update session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.

Signed-off-by: Ian Rogers <irogers@google.com>
---
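Note for readers, not part of the patch: a minimal standalone sketch
of a prefix lookup via bsearch, the technique the commit message
describes. The table, the ids and the toy_* names are hypothetical
stand-ins. The real table maps to EM_* values and then post-processes
shared prefixes such as "arm" versus "arm64".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_prefix {
	const char *prefix;
	int id;
};

/* Sorted by prefix so bsearch() can be used. */
static const struct toy_prefix toy_prefixes[] = {
	{"arm", 1},
	{"mips", 2},
	{"s390", 3},
	{"x86", 4},
};

static int toy_prefix_cmp(const void *key, const void *elem)
{
	const struct toy_prefix *e = elem;

	/* Compare only the first strlen(prefix) bytes of the key. */
	return strncmp(key, e->prefix, strlen(e->prefix));
}

static const struct toy_prefix *toy_prefix_lookup(const char *arch)
{
	assert(arch);
	return bsearch(arch, toy_prefixes,
		       sizeof(toy_prefixes) / sizeof(toy_prefixes[0]),
		       sizeof(toy_prefixes[0]), toy_prefix_cmp);
}
```

With this table, "x86_64" and "armv7l" resolve to the "x86" and "arm"
entries, while an unlisted string such as "riscv64" yields NULL. The
comparator returns 0 for any key that starts with an entry's prefix,
so ambiguous matches have to be resolved after the search.
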
 tools/perf/util/env.c     | 197 +++++++++++++++++++++++++++++++-------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/header.c  |  47 +++++++--
 tools/perf/util/session.c |  25 ++---
 4 files changed, 212 insertions(+), 58 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..1671769d4441 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,172 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, int is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/*
+	 * Handle conflicting prefixes. If the is_64_bit is unknown (-1) then
+	 * assume 64-bit. We can't use perf_env__kernel_is_64_bit as that
+	 * depends on the arch string.
+	 */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return (is_64_bit != 0) || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	uint16_t e_machine;
+	const char *arch;
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
+
+	if (env->arch)
+		return env->arch;
 
-	return normalize_arch(arch_name);
+	/*
+	 * Lazily compute/allocate arch. The e_machine may have been
+	 * read from a data file and so may not be EM_HOST.
+	 */
+	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	arch = e_machine_to_perf_arch(e_machine);
+	env->arch = strdup(arch);
+	/*
+	 * Avoid potential crashes on the arch string if memory allocation in
+	 * strdup fails and NULL were to be returned.
+	 */
+	return env->arch ?: arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..d36a0fb2cd04 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..8d5152bde25d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -379,21 +379,28 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session) {
+		/* Force the computation in the perf_env of the e_machine of the threads. */
+		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
+		arch = perf_env__arch(perf_session__env(evlist->session));
+	}
+
+	if (!arch) {
+		int ret = uname(&uts);
 
-	return do_write_string(ff, uts.machine);
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2684,10 +2691,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	uint16_t saved_e_machine = ff->ph->env.e_machine;
+
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	/*
+	 * Make the arch string canonical by computing the e_machine from it,
+	 * then turning the e_machine back into an arch string.
+	 */
+	ff->ph->env.e_machine = EM_NONE;
+	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
+		zfree(&ff->ph->env.arch);
+		perf_env__arch(&ff->ph->env);
+	}
+	ff->ph->env.e_machine = saved_e_machine;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..bc7add02a2de 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,19 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/*
+	 * Is the env caching an e_machine? If not we want to compute from the
+	 * more accurate threads.
+	 */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
@@ -3048,10 +3053,8 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 
 	/*
 	 * Couldn't determine from the perf_env or current set of
-	 * threads. Default to the host.
+	 * threads. Potentially use logic that uses the arch string otherwise
+	 * default to the host.
 	 */
-	if (e_flags)
-		*e_flags = EF_HOST;
-
-	return EM_HOST;
+	return perf_env__e_machine(env, e_flags);
 }
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

In live mode the os_release isn't initialized. Add a lazy
initialization helper that, when the os_release isn't initialized,
assumes live mode and takes the release from uname.

Signed-off-by: Ian Rogers <irogers@google.com>
---
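Note for readers, not part of the patch: a minimal standalone sketch
of the lazy initialization pattern, assuming a toy environment struct
and a made-up fallback string in place of perf_version_string. The
toy_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

struct toy_env {
	char *os_release;	/* NULL until first use or until read from a file */
};

/* Stand-in for the tool's own version string. */
static const char toy_fallback[] = "unknown";

static const char *toy_env__os_release(struct toy_env *env)
{
	struct utsname uts;
	const char *release;

	if (!env)
		return toy_fallback;

	if (!env->os_release) {
		/* Not set from a data file: assume live mode and ask the kernel. */
		int ret = uname(&uts);

		env->os_release = strdup(ret < 0 ? toy_fallback : uts.release);
	}
	/* If strdup() failed, still hand back a usable string. */
	release = env->os_release ? env->os_release : toy_fallback;
	assert(release != NULL);
	return release;
}
```

The second and later calls return the cached pointer, so callers may
compare or keep it for as long as the environment lives.
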
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1671769d4441..c3e464c6de2f 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname or strdup fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release ?: perf_version_string;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d36a0fb2cd04..56020f4381cd 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -175,6 +175,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 8d5152bde25d..cfafed3cc69f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,13 +370,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..fd332db56157 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2209,7 +2209,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

A problem with putting bitfields into struct symbol is that other bits in
the symbol could be updated concurrently and only one update to the
underlying storage unit happens, leading to lost updates.

To avoid this, introduce a global lock `symbol_bits_lock` in `symbol.c`
and helper functions to update the bits sharing a byte:
`symbol__set_ignore` and `symbol__set_annotate2`.

`inlined` is not given a setter as it is only initialized in
`new_inline_sym` when the symbol is under construction and not shared.
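
A minimal standalone sketch of the setter pattern, assuming POSIX
threads rather than perf's own `struct mutex` wrapper. `struct
demo_symbol`, `demo_bits_lock` and the `demo_symbol__set_*` helpers are
hypothetical names used only for illustration:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical reduced symbol: flags packed into one storage unit. */
struct demo_symbol {
	unsigned int ignore:1;
	unsigned int inlined:1;
	unsigned int annotate2:1;
};

/* One global lock serializes every read-modify-write of the flags. */
static pthread_mutex_t demo_bits_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_symbol__set_ignore(struct demo_symbol *sym, bool ignore)
{
	pthread_mutex_lock(&demo_bits_lock);
	sym->ignore = ignore;	/* compiles to load, mask, store */
	pthread_mutex_unlock(&demo_bits_lock);
}

static void demo_symbol__set_annotate2(struct demo_symbol *sym, bool annotate2)
{
	pthread_mutex_lock(&demo_bits_lock);
	sym->annotate2 = annotate2;
	pthread_mutex_unlock(&demo_bits_lock);
}
```

Without the lock, two threads writing different flags could each load
the old storage unit and the later store would overwrite the earlier
one. A single global lock keeps `struct symbol` from growing, at the
cost of serializing all flag writers.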

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-kwork.c |  2 +-
 tools/perf/builtin-sched.c |  2 +-
 tools/perf/util/annotate.c |  2 +-
 tools/perf/util/symbol.c   | 22 ++++++++++++++++++++++
 tools/perf/util/symbol.h   |  3 +++
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 9d3a4c779a41..7337ee956dc9 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -725,7 +725,7 @@ static void timehist_save_callchain(struct perf_kwork *kwork,
 		if (sym) {
 			if (!strcmp(sym->name, "__softirqentry_text_start") ||
 			    !strcmp(sym->name, "__do_softirq"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 555247568e7a..655e95f660c2 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2371,7 +2371,7 @@ static void save_task_callchain(struct perf_sched *sched,
 			if (!strcmp(sym->name, "schedule") ||
 			    !strcmp(sym->name, "__schedule") ||
 			    !strcmp(sym->name, "preempt_schedule"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..d550a0061159 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2224,7 +2224,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 
 	annotation__init_column_widths(notes, sym);
 	annotation__update_column_widths(notes);
-	sym->annotate2 = 1;
+	symbol__set_annotate2(sym, true);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fd332db56157..e6a1f23634ec 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -31,6 +31,7 @@
 #include "map.h"
 #include "symbol.h"
 #include "map_symbol.h"
+#include "mutex.h"
 #include "mem-events.h"
 #include "mem-info.h"
 #include "symsrc.h"
@@ -52,6 +53,8 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
 static bool symbol__is_idle(const char *name);
 
+static struct mutex symbol_bits_lock;
+
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
@@ -345,6 +348,20 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
+void symbol__set_ignore(struct symbol *sym, bool ignore)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->ignore = ignore;
+	mutex_unlock(&symbol_bits_lock);
+}
+
+void symbol__set_annotate2(struct symbol *sym, bool annotate2)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->annotate2 = annotate2;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -2398,6 +2415,8 @@ int symbol__init(struct perf_env *env)
 	if (symbol_conf.initialized)
 		return 0;
 
+	mutex_init(&symbol_bits_lock);
+
 	symbol_conf.priv_size = PERF_ALIGN(symbol_conf.priv_size, sizeof(u64));
 
 	symbol__elf_init();
@@ -2476,6 +2495,9 @@ void symbol__exit(void)
 {
 	if (!symbol_conf.initialized)
 		return;
+
+	mutex_destroy(&symbol_bits_lock);
+
 	strlist__delete(symbol_conf.bt_stop_list);
 	strlist__delete(symbol_conf.sym_list);
 	strlist__delete(symbol_conf.dso_list);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..5d98d7e84d57 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -77,6 +77,9 @@ struct symbol {
 void symbol__delete(struct symbol *sym);
 void symbols__delete(struct rb_root_cached *symbols);
 
+void symbol__set_ignore(struct symbol *sym, bool ignore);
+void symbol__set_annotate2(struct symbol *sym, bool annotate2);
+
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
  * @symbols: the rb_root of symbols
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (2 preceding siblings ...)
  2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  3 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

To avoid concurrent update issues with other bitfields in `struct symbol`,
this change uses the global lock `symbol_bits_lock` (introduced in a
previous commit) for updates to the `idle` field. A static helper
`symbol__set_idle` taking a boolean is used to encapsulate the lock and
mapping to `enum symbol_idle_kind`.
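
The caching scheme can be sketched in standalone C as below. This is an
illustration under simplifying assumptions: locking is omitted, the
symbol table is a small subset, and the architecture is reduced to a
boolean. All `demo_`/`DEMO_` names are hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Zero means "not computed yet", so zeroed symbols start as unknown. */
enum demo_idle_kind {
	DEMO_IDLE__UNKNOWN = 0,
	DEMO_IDLE__NOT_IDLE = 1,
	DEMO_IDLE__IDLE = 2,
};

struct demo_symbol {
	unsigned int idle:2;	/* caches an enum demo_idle_kind value */
	const char *name;
};

/* Must stay sorted for bsearch(). Illustrative subset only. */
static const char * const demo_idle_symbols[] = {
	"arch_cpu_idle",
	"default_idle",
	"poll_idle",
};

static int demo_name_cmp(const void *key, const void *elem)
{
	return strcmp(key, *(const char * const *)elem);
}

static bool demo_symbol__is_idle(struct demo_symbol *sym, bool is_x86)
{
	bool idle;

	/* Fast path: answer was computed on an earlier call. */
	if (sym->idle != DEMO_IDLE__UNKNOWN)
		return sym->idle == DEMO_IDLE__IDLE;

	/* Exact match against the architecture independent names. */
	idle = bsearch(sym->name, demo_idle_symbols,
		       sizeof(demo_idle_symbols) / sizeof(demo_idle_symbols[0]),
		       sizeof(demo_idle_symbols[0]), demo_name_cmp) != NULL;

	/* Prefix match, so intel_idle_irq and intel_idle_ibrs both hit. */
	if (!idle && is_x86)
		idle = !strncmp(sym->name, "intel_idle", strlen("intel_idle"));

	sym->idle = idle ? DEMO_IDLE__IDLE : DEMO_IDLE__NOT_IDLE;
	return idle;
}
```

Two bits are needed because "not idle" has to be distinguishable from
"not yet computed"; a one-bit flag would recompute every non-idle
symbol on each query.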

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 108 +++++++++++++++++++++++------------
 tools/perf/util/symbol.h     |  14 +++--
 3 files changed, 81 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index e6a1f23634ec..8ec4b2836b44 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -51,7 +51,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 static struct mutex symbol_bits_lock;
 
@@ -362,6 +361,13 @@ void symbol__set_annotate2(struct symbol *sym, bool annotate2)
 	mutex_unlock(&symbol_bits_lock);
 }
 
+static void symbol__set_idle(struct symbol *sym, bool idle)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->idle = idle ? SYMBOL_IDLE__IDLE : SYMBOL_IDLE__NOT_IDLE;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -375,8 +381,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -384,17 +389,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -411,7 +405,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -572,7 +566,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -738,43 +732,81 @@ int modules__parse(const char *filename, void *arg,
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+static int sym_name_cmp(const void *a, const void *b)
 {
-	const char * const idle_symbols[] = {
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
+{
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		symbol__set_idle(sym, /*idle=*/false);
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
+
+	symbol__set_idle(sym, /*idle=*/false);
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -803,7 +835,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 5d98d7e84d57..717d2f876d58 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -43,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -58,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -197,8 +203,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -281,5 +286,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-02  6:59                         ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                             ` (16 more replies)
  1 sibling, 17 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is EM_NONE.
Derive the value from the arch string if available. Similarly derive
the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.
  
Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
  
To avoid concurrent update issues with bitfields sharing a byte in
`struct symbol` due to the lazy computation, introduce a global lock
for updates to these fields and use setter functions. The reads remain
lockless.
  
v8:
 - Address Sashiko AI review feedback for Patch 1:
   - Switch all code dependent on the arch string to use `e_machine`
     instead (e.g., in `perf c2c`, `perf lock-contention`, `perf
     header`, `perf arch common`, `tests/topology.c`,
     `perf_env__init_kernel_mode`).
   - Update `machine__is` and `machine__normalized_is` to take
     `e_machine` integers instead of strings.
   - Refactor `arch_syscalls__strerrno_function` (generated via
     `arch_errno_names.sh`) to take an `e_machine` instead of an arch
     string.
   - Avoid premature caching of the host architecture in
     `perf_session__e_machine` by using a non-caching helper when
     threads are not yet available.
  
v7:
 - Address better handling of strdup failures with arch in the
   header/env.
 - Address concurrent update issues in `struct symbol` bitfields by
   introducing a global lock for writes.
https://lore.kernel.org/linux-perf-users/20260501182021.3651851-1-irogers@google.com/

v6: Ensure arch is canonical by going to e_machine and back (Sashiko)
https://lore.kernel.org/linux-perf-users/20260409230620.4176210-1-irogers@google.com/
  
v5: Add perf_env os_release helper (Namhyung/Sashiko)
https://lore.kernel.org/lkml/20260406170905.2614260-1-irogers@google.com/
  
v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/
  
v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/
  
v2: Some minor white space clean up:
https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/
  
v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (17):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf tests topology: Switch env->arch use to env->e_machine
  perf capstone: Determine architecture from e_machine
  perf print_insn: Use e_machine for fallback IP length check
  perf machine: Use perf_env e_machine rather than arch
  perf sample-raw: Use perf_env e_machine rather than arch
  perf sort: Use perf_env e_machine rather than arch
  perf symbol: Avoid use of machine__is
  perf arch common: Use perf_env e_machine rather than arch
  perf header: In print_pmu_caps use perf_env e_machine
  perf c2c: Use perf_env e_machine rather than arch
  perf lock-contention: Use perf_env e_machine rather than arch
  perf env: Refactor perf_env__arch_strerrno
  perf env: Remove unused perf_env__raw_arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Add setters for bitfields sharing a byte to avoid
    concurrent update issues
  perf symbol: Lazily compute idle and use a global lock for updates

 tools/perf/arch/common.c                    |  55 ++--
 tools/perf/builtin-c2c.c                    |   2 +-
 tools/perf/builtin-kwork.c                  |   2 +-
 tools/perf/builtin-sched.c                  |   2 +-
 tools/perf/builtin-trace.c                  |   5 +-
 tools/perf/tests/topology.c                 |   8 +-
 tools/perf/trace/beauty/arch_errno_names.sh |  40 ++-
 tools/perf/util/annotate.c                  |   2 +-
 tools/perf/util/capstone.c                  | 115 +++++---
 tools/perf/util/data-convert-bt.c           |   2 +-
 tools/perf/util/env.c                       | 283 +++++++++++++++-----
 tools/perf/util/env.h                       |  11 +-
 tools/perf/util/header.c                    |  70 +++--
 tools/perf/util/lock-contention.c           |   6 +-
 tools/perf/util/machine.c                   |  25 +-
 tools/perf/util/machine.h                   |   2 -
 tools/perf/util/print_insn.c                |   8 +-
 tools/perf/util/sample-raw.c                |  18 +-
 tools/perf/util/session.c                   |  26 +-
 tools/perf/util/sort.c                      |  12 +-
 tools/perf/util/symbol-elf.c                |   2 +-
 tools/perf/util/symbol.c                    | 163 +++++++----
 tools/perf/util/symbol.h                    |  17 +-
 23 files changed, 612 insertions(+), 264 deletions(-)

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
                                             ` (15 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
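
As an illustration of the prefix lookup (a sketch, not the code in this
patch), the following standalone C uses a hypothetical `enum
demo_machine` in place of the real ELF `EM_*` constants and a reduced
table. All `demo_`/`DEMO_` names are assumptions for the example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical machine IDs standing in for ELF EM_* values. */
enum demo_machine {
	DEMO_EM_NONE,
	DEMO_EM_386,
	DEMO_EM_AARCH64,
	DEMO_EM_ARM,
	DEMO_EM_S390,
	DEMO_EM_X86_64,
};

struct demo_prefix {
	const char *prefix;
	enum demo_machine machine;
};

/* Sorted by prefix so bsearch() can be used. Illustrative subset only. */
static const struct demo_prefix demo_prefixes[] = {
	{"aarch64", DEMO_EM_AARCH64},
	{"arm", DEMO_EM_ARM},	/* "arm64" shares this prefix */
	{"i386", DEMO_EM_386},
	{"s390", DEMO_EM_S390},
	{"x86", DEMO_EM_X86_64},
};

/* Compare only as many characters as the table prefix has. */
static int demo_prefix_cmp(const void *key, const void *elem)
{
	const struct demo_prefix *p = elem;

	return strncmp(key, p->prefix, strlen(p->prefix));
}

static enum demo_machine demo_arch_to_machine(const char *arch)
{
	const struct demo_prefix *hit;

	hit = bsearch(arch, demo_prefixes,
		      sizeof(demo_prefixes) / sizeof(demo_prefixes[0]),
		      sizeof(demo_prefixes[0]), demo_prefix_cmp);
	if (!hit)
		return DEMO_EM_NONE;

	/* Resolve the shared "arm" prefix after the search. */
	if (hit->machine == DEMO_EM_ARM && !strcmp(arch, "arm64"))
		return DEMO_EM_AARCH64;

	return hit->machine;
}
```

Because the comparison is prefix based, uname-style strings such as
"s390x" or "armv7l" land on the same entry as perf's shorter names,
and only the few shared prefixes need a follow-up check.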

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 231 +++++++++++++++++++++++++++++++-------
 tools/perf/util/env.h     |   2 +
 tools/perf/util/header.c  |  47 ++++++--
 tools/perf/util/session.c |  26 +++--
 4 files changed, 243 insertions(+), 63 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..4ff4caab3b32 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -309,12 +311,21 @@ void perf_env__init(struct perf_env *env)
 
 static void perf_env__init_kernel_mode(struct perf_env *env)
 {
-	const char *arch = perf_env__raw_arch(env);
+	uint16_t e_machine = env->e_machine;
 
-	if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
-	    !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
-	    !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
-	    !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
+	if (env->arch && (e_machine == EM_NONE || e_machine == EM_MIPS || e_machine == EM_RISCV)) {
+		if (str_ends_with(env->arch, "64") || !strncmp(env->arch, "s390x", 5))
+			env->kernel_is_64_bit = 1;
+		else
+			env->kernel_is_64_bit = 0;
+		return;
+	}
+	if (e_machine == EM_NONE)
+		e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (e_machine == EM_X86_64 || e_machine == EM_AARCH64 ||
+	    e_machine == EM_PPC64 || e_machine == EM_SPARCV9 ||
+	    e_machine == EM_S390)
 		env->kernel_is_64_bit = 1;
 	else
 		env->kernel_is_64_bit = 0;
@@ -588,51 +599,187 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, int is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/*
+	 * Handle conflicting prefixes. If the is_64_bit is unknown (-1) then
+	 * assume 64-bit. We can't use perf_env__kernel_is_64_bit as that
+	 * depends on the arch string.
+	 */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return (is_64_bit != 0) || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags)
+{
+	uint16_t e_machine = EM_HOST;
+
+	if (env)
+		e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+	if (e_flags && e_machine == EM_HOST)
+		*e_flags = EF_HOST;
+
+	return e_machine;
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	uint16_t e_machine;
+	uint32_t local_e_flags = 0;
+
+	if (env && env->e_machine != EM_NONE) {
+		if (e_flags)
+			*e_flags = env->e_flags;
+
+		return env->e_machine;
+	}
+	e_machine = perf_env__e_machine_nocache(env, &local_e_flags);
+	if (env) {
+		env->e_machine = e_machine;
+		env->e_flags = local_e_flags;
+	}
+	if (e_flags)
+		*e_flags = local_e_flags;
+
+	return e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	uint16_t e_machine;
+	const char *arch;
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
+
+	if (env->arch)
+		return env->arch;
 
-	return normalize_arch(arch_name);
+	/*
+	 * Lazily compute/allocate arch. The e_machine may have been
+	 * read from a data file and so may not be EM_HOST.
+	 */
+	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	arch = e_machine_to_perf_arch(e_machine);
+	env->arch = strdup(arch);
+	/*
+	 * Avoid potential crashes on the arch string if memory allocation in
+	 * strdup fails and NULL were to be returned.
+	 */
+	return env->arch ?: arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..7151a9138e3f 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,8 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags);
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..8d5152bde25d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -379,21 +379,28 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session) {
+		/* Force the computation in the perf_env of the e_machine of the threads. */
+		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
+		arch = perf_env__arch(perf_session__env(evlist->session));
+	}
+
+	if (!arch) {
+		int ret = uname(&uts);
 
-	return do_write_string(ff, uts.machine);
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2684,10 +2691,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	uint16_t saved_e_machine = ff->ph->env.e_machine;
+
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	/*
+	 * Make the arch string canonical by computing the e_machine from it,
+	 * then turning the e_machine back into an arch string.
+	 */
+	ff->ph->env.e_machine = EM_NONE;
+	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
+		zfree(&ff->ph->env.arch);
+		perf_env__arch(&ff->ph->env);
+	}
+	ff->ph->env.e_machine = saved_e_machine;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..3e64db2d27c2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,19 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/*
+	 * Is the env caching an e_machine? If not we want to compute from the
+	 * more accurate threads.
+	 */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
@@ -3048,10 +3053,9 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 
 	/*
 	 * Couldn't determine from the perf_env or current set of
-	 * threads. Default to the host.
+	 * threads. Potentially use logic that uses the arch string otherwise
+	 * default to the host. Don't cache in the perf_env in case later
+	 * threads indicate a better ELF machine type.
 	 */
-	if (e_flags)
-		*e_flags = EF_HOST;
-
-	return EM_HOST;
+	return perf_env__e_machine_nocache(env, e_flags);
 }
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
                                             ` (14 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Some arch string comparisons weren't normalized. Avoid potential
issues with normalized names vs uname values by switching to using the
e_machine.

Signed-off-by: Ian Rogers <irogers@google.com>
---
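Note for readers: a minimal sketch of the mismatch described above, not
code from this patch. The helper names are hypothetical; only the
strings and EM_* constants are taken from the diff. A prefix match on
the raw uname value misses the normalized spelling "arm64", while the
ELF machine comparison does not depend on spelling at all.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Old-style check (hypothetical helper): only matches raw uname values. */
static bool always_run_by_arch(const char *arch)
{
	return !strncmp(arch, "s390", 4) || !strncmp(arch, "aarch64", 7);
}

/* New-style check (hypothetical helper): independent of the arch spelling. */
static bool always_run_by_e_machine(uint16_t e_machine)
{
	return e_machine == EM_S390 || e_machine == EM_AARCH64;
}
```

With these helpers, "aarch64" and "arm64" give different answers from
the string check even though both name EM_AARCH64.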
 tools/perf/tests/topology.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index f54502ebef4b..d4c5c330c679 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -11,6 +11,7 @@
 #include "pmus.h"
 #include "target.h"
 #include <linux/err.h>
+#include <elf.h>
 
 #define TEMPL "/tmp/perf-test-XXXXXX"
 #define DATA_SIZE	10
@@ -74,6 +75,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	struct aggr_cpu_id id;
 	struct perf_cpu cpu;
 	struct perf_env *env;
+	uint16_t e_machine;
 
 	session = perf_session__new(&data, NULL);
 	TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
@@ -101,7 +103,9 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	 *  condition is true (see do_core_id_test in header.c). So always
 	 *  run this test on those platforms.
 	 */
-	if (!env->cpu && strncmp(env->arch, "s390", 4) && strncmp(env->arch, "aarch64", 7))
+	e_machine = perf_env__e_machine(env, NULL);
+
+	if (!env->cpu && e_machine != EM_S390 && e_machine != EM_AARCH64)
 		return TEST_SKIP;
 
 	/*
@@ -110,7 +114,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	 * physical_package_id will be set to -1. Hence skip this
 	 * test if physical_package_id returns -1 for cpu from perf_cpu_map.
 	 */
-	if (!strncmp(env->arch, "ppc64le", 7)) {
+	if (e_machine == EM_PPC64) {
 		if (cpu__get_socket_id(perf_cpu_map__cpu(map, 0)) == -1)
 			return TEST_SKIP;
 	}
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 03/17] perf capstone: Determine architecture from e_machine
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
                                             ` (13 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Avoid the imprecise arch string and use the e_machine instead. Add
more e_machine to capstone translations, including MIPS and RISCV.
Remove unnecessary __maybe_unused annotations.

Signed-off-by: Ian Rogers <irogers@google.com>
---
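Note for readers: a rough sketch of the translation idea, written so it
builds without capstone. struct disasm_target and e_machine_to_target()
are hypothetical stand-ins for capstone's arch/mode pair and for the
helper in the diff; the point is only that one switch on the ELF
machine replaces the chain of arch string tests, and that is64 is
needed only where one ELF machine covers both widths.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for capstone's (cs_arch, cs_mode) pair. */
struct disasm_target {
	const char *arch;
	int bits;
	bool big_endian;
};

static bool e_machine_to_target(uint16_t e_machine, bool is64,
				struct disasm_target *out)
{
	switch (e_machine) {
	case EM_X86_64:
		*out = (struct disasm_target){ "x86", 64, false };
		return true;
	case EM_386:
		*out = (struct disasm_target){ "x86", 32, false };
		return true;
	case EM_MIPS:
		/* One ELF machine covers both widths, so is64 decides. */
		*out = (struct disasm_target){ "mips", is64 ? 64 : 32, true };
		return true;
	case EM_RISCV:
		*out = (struct disasm_target){ "riscv", is64 ? 64 : 32, false };
		return true;
	default:
		/* Unknown machine: report failure to the caller. */
		return false;
	}
}
```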
 tools/perf/util/capstone.c | 115 +++++++++++++++++++++++++------------
 1 file changed, 79 insertions(+), 36 deletions(-)

diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
index 25cf6e15ec27..e6226b751c36 100644
--- a/tools/perf/util/capstone.c
+++ b/tools/perf/util/capstone.c
@@ -16,6 +16,7 @@
 #include <fcntl.h>
 #include <inttypes.h>
 #include <string.h>
+#include <elf.h>
 
 #include <capstone/capstone.h>
 
@@ -137,37 +138,74 @@ static enum cs_err perf_cs_close(csh *handle)
 #endif
 }
 
-static int capstone_init(struct machine *machine, csh *cs_handle, bool is64,
-			 bool disassembler_style)
+static bool e_machine_to_capstone(uint16_t e_machine, bool is64,
+				  enum cs_arch *arch, enum cs_mode *mode)
+{
+	switch (e_machine) {
+	case EM_X86_64:
+		*arch = CS_ARCH_X86;
+		*mode = CS_MODE_64;
+		return true;
+	case EM_386:
+		*arch = CS_ARCH_X86;
+		*mode = CS_MODE_32;
+		return true;
+	case EM_AARCH64:
+		*arch = CS_ARCH_ARM64;
+		*mode = CS_MODE_ARM;
+		return true;
+	case EM_ARM:
+		*arch = CS_ARCH_ARM;
+		*mode = CS_MODE_ARM | CS_MODE_V8;
+		return true;
+	case EM_S390:
+		*arch = CS_ARCH_SYSZ;
+		*mode = CS_MODE_BIG_ENDIAN;
+		return true;
+	case EM_MIPS:
+		*arch = CS_ARCH_MIPS;
+		*mode = is64 ? CS_MODE_MIPS64 : CS_MODE_MIPS32;
+		*mode |= CS_MODE_BIG_ENDIAN;
+		return true;
+	case EM_PPC:
+		*arch = CS_ARCH_PPC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_32;
+		return true;
+	case EM_PPC64:
+		*arch = CS_ARCH_PPC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_64;
+		return true;
+	case EM_SPARC:
+		*arch = CS_ARCH_SPARC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_32;
+		return true;
+	case EM_SPARCV9:
+		*arch = CS_ARCH_SPARC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_V9 | CS_MODE_64;
+		return true;
+	case EM_RISCV:
+		*arch = CS_ARCH_RISCV;
+		*mode = is64 ? CS_MODE_RISCV64 : CS_MODE_RISCV32;
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int capstone_init(uint16_t e_machine, csh *cs_handle, bool is64, bool disassembler_style)
 {
 	enum cs_arch arch;
 	enum cs_mode mode;
 
-	if (machine__is(machine, "x86_64") && is64) {
-		arch = CS_ARCH_X86;
-		mode = CS_MODE_64;
-	} else if (machine__normalized_is(machine, "x86")) {
-		arch = CS_ARCH_X86;
-		mode = CS_MODE_32;
-	} else if (machine__normalized_is(machine, "arm64")) {
-		arch = CS_ARCH_ARM64;
-		mode = CS_MODE_ARM;
-	} else if (machine__normalized_is(machine, "arm")) {
-		arch = CS_ARCH_ARM;
-		mode = CS_MODE_ARM + CS_MODE_V8;
-	} else if (machine__normalized_is(machine, "s390")) {
-		arch = CS_ARCH_SYSZ;
-		mode = CS_MODE_BIG_ENDIAN;
-	} else {
+	if (!e_machine_to_capstone(e_machine, is64, &arch, &mode))
 		return -1;
-	}
 
 	if (perf_cs_open(arch, mode, cs_handle) != CS_ERR_OK) {
 		pr_warning_once("cs_open failed\n");
 		return -1;
 	}
 
-	if (machine__normalized_is(machine, "x86")) {
+	if (arch == CS_ARCH_X86) {
 		/*
 		 * In case of using capstone_init while symbol__disassemble
 		 * setting CS_OPT_SYNTAX_ATT depends if disassembler_style opts
@@ -212,28 +250,31 @@ static size_t print_insn_x86(struct thread *thread, u8 cpumode, struct cs_insn *
 }
 
 
-ssize_t capstone__fprintf_insn_asm(struct machine *machine __maybe_unused,
-				   struct thread *thread __maybe_unused,
-				   u8 cpumode __maybe_unused, bool is64bit __maybe_unused,
-				   const uint8_t *code __maybe_unused,
-				   size_t code_size __maybe_unused,
-				   uint64_t ip __maybe_unused, int *lenp __maybe_unused,
-				   int print_opts __maybe_unused, FILE *fp __maybe_unused)
+ssize_t capstone__fprintf_insn_asm(struct machine *machine,
+				   struct thread *thread,
+				   u8 cpumode,
+				   bool is64bit,
+				   const uint8_t *code,
+				   size_t code_size,
+				   uint64_t ip, int *lenp,
+				   int print_opts,
+				   FILE *fp)
 {
 	size_t printed;
 	struct cs_insn *insn;
 	csh cs_handle;
 	size_t count;
+	uint16_t e_machine = thread__e_machine(thread, machine, /*e_flags=*/NULL);
 	int ret;
 
 	/* TODO: Try to initiate capstone only once but need a proper place. */
-	ret = capstone_init(machine, &cs_handle, is64bit, true);
+	ret = capstone_init(e_machine, &cs_handle, is64bit, /*disassembler_style=*/true);
 	if (ret < 0)
 		return ret;
 
 	count = perf_cs_disasm(cs_handle, code, code_size, ip, 1, &insn);
 	if (count > 0) {
-		if (machine__normalized_is(machine, "x86"))
+		if (e_machine == EM_X86_64 || e_machine == EM_386)
 			printed = print_insn_x86(thread, cpumode, &insn[0], print_opts, fp);
 		else
 			printed = fprintf(fp, "%s %s", insn[0].mnemonic, insn[0].op_str);
@@ -322,9 +363,9 @@ static int find_file_offset(u64 start, u64 len, u64 pgoff, void *arg)
 	return 0;
 }
 
-int symbol__disassemble_capstone(const char *filename __maybe_unused,
-				 struct symbol *sym __maybe_unused,
-				 struct annotate_args *args __maybe_unused)
+int symbol__disassemble_capstone(const char *filename,
+				 struct symbol *sym,
+				 struct annotate_args *args)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct map *map = args->ms->map;
@@ -344,6 +385,7 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
 	char disasm_buf[512];
 	struct disasm_line *dl;
 	bool disassembler_style = false;
+	uint16_t e_machine;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -373,8 +415,8 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
 	    !strcmp(args->options->disassembler_style, "att"))
 		disassembler_style = true;
 
-	if (capstone_init(maps__machine(thread__maps(args->ms->thread)), &handle, is_64bit,
-			  disassembler_style) < 0)
+	e_machine = thread__e_machine(args->ms->thread, /*machine=*/NULL, /*e_flags=*/NULL);
+	if (capstone_init(e_machine, &handle, is_64bit, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
@@ -466,6 +508,7 @@ int symbol__disassemble_capstone_powerpc(const char *filename __maybe_unused,
 	struct disasm_line *dl;
 	u32 *line;
 	bool disassembler_style = false;
+	uint16_t e_machine;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -484,8 +527,8 @@ int symbol__disassemble_capstone_powerpc(const char *filename __maybe_unused,
 	    !strcmp(args->options->disassembler_style, "att"))
 		disassembler_style = true;
 
-	if (capstone_init(maps__machine(thread__maps(args->ms->thread)), &handle, is_64bit,
-			  disassembler_style) < 0)
+	e_machine = thread__e_machine(args->ms->thread, /*machine=*/NULL, /*e_flags=*/NULL);
+	if (capstone_init(e_machine, &handle, is_64bit, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (2 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
                                             ` (12 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Avoid string comparisons with perf_env arch, switch to using the more
precise ELF machine.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/print_insn.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index 02e6fbb8ca04..5e36344174d6 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -9,6 +9,7 @@
 #include <stdbool.h>
 #include "capstone.h"
 #include "debug.h"
+#include "env.h"
 #include "sample.h"
 #include "symbol.h"
 #include "machine.h"
@@ -17,6 +18,7 @@
 #include "dump-insn.h"
 #include "map.h"
 #include "dso.h"
+#include <elf.h>
 
 size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 {
@@ -33,13 +35,13 @@ size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 static bool is64bitip(struct machine *machine, struct addr_location *al)
 {
 	const struct dso *dso = al->map ? map__dso(al->map) : NULL;
+	uint16_t e_machine;
 
 	if (dso)
 		return dso__is_64_bit(dso);
 
-	return machine__is(machine, "x86_64") ||
-		machine__normalized_is(machine, "arm64") ||
-		machine__normalized_is(machine, "s390");
+	e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);
+	return e_machine == EM_X86_64 || e_machine == EM_AARCH64 || e_machine == EM_S390;
 }
 
 ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpumode,
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (3 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 06/17] perf sample-raw: " Ian Rogers
                                             ` (11 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

The arch string is derived from uname and may be normalized, which can
cause mismatches, so the ELF machine can be more precise. Reduce the
scope of machine__is(), as it is often better to take the e_machine
from a thread rather than from the machine. Switch from string to ELF
machine constant comparisons.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 25 ++++++++-----------------
 tools/perf/util/machine.h |  2 --
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e76f8c86e62a..6d32d3cb5cb7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1611,10 +1611,15 @@ static bool machine__uses_kcore(struct machine *machine)
 	return dsos__for_each_dso(&machine->dsos, machine__uses_kcore_cb, NULL) != 0 ? true : false;
 }
 
+static bool machine__is(struct machine *machine, uint16_t e_machine)
+{
+	return machine && perf_env__e_machine(machine->env, NULL) == e_machine;
+}
+
 static bool perf_event__is_extra_kernel_mmap(struct machine *machine,
 					     struct extra_kernel_map *xm)
 {
-	return machine__is(machine, "x86_64") &&
+	return machine__is(machine, EM_X86_64) &&
 	       is_entry_trampoline(xm->name);
 }
 
@@ -2770,7 +2775,7 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 static u64 get_leaf_frame_caller(struct perf_sample *sample,
 		struct thread *thread, int usr_idx)
 {
-	if (machine__normalized_is(maps__machine(thread__maps(thread)), "arm64"))
+	if (thread__e_machine(thread, /*machine=*/NULL, /*e_flags=*/NULL) == EM_AARCH64)
 		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
 	else
 		return 0;
@@ -3141,20 +3146,6 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
 	return 0;
 }
 
-/*
- * Compares the raw arch string. N.B. see instead perf_env__arch() or
- * machine__normalized_is() if a normalized arch is needed.
- */
-bool machine__is(struct machine *machine, const char *arch)
-{
-	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
-}
-
-bool machine__normalized_is(struct machine *machine, const char *arch)
-{
-	return machine && !strcmp(perf_env__arch(machine->env), arch);
-}
-
 int machine__nr_cpus_avail(struct machine *machine)
 {
 	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
@@ -3181,7 +3172,7 @@ int machine__get_kernel_start(struct machine *machine)
 		 * start of kernel text, but still above 2^63. So leave
 		 * kernel_start = 1ULL << 63 for x86_64.
 		 */
-		if (!err && !machine__is(machine, "x86_64"))
+		if (!err && !machine__is(machine, EM_X86_64))
 			machine->kernel_start = map__start(map);
 	}
 	return err;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 22a42c5825fa..003c970b3e4b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -227,8 +227,6 @@ static inline bool machine__is_host(struct machine *machine)
 }
 
 bool machine__is_lock_function(struct machine *machine, u64 addr);
-bool machine__is(struct machine *machine, const char *arch);
-bool machine__normalized_is(struct machine *machine, const char *arch);
 int machine__nr_cpus_avail(struct machine *machine);
 
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 06/17] perf sample-raw: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (4 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 07/17] perf sort: " Ian Rogers
                                             ` (10 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than the arch to determine S390 and x86 types.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/sample-raw.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/sample-raw.c b/tools/perf/util/sample-raw.c
index bcf442574d6e..b10056ac8057 100644
--- a/tools/perf/util/sample-raw.c
+++ b/tools/perf/util/sample-raw.c
@@ -1,6 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-
-#include <string.h>
+#include <elf.h>
 #include <linux/string.h>
 #include "evlist.h"
 #include "env.h"
@@ -14,14 +13,15 @@
  */
 void evlist__init_trace_event_sample_raw(struct evlist *evlist, struct perf_env *env)
 {
-	const char *arch_pf = perf_env__arch(env);
-	const char *cpuid = perf_env__cpuid(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (arch_pf && !strcmp("s390", arch_pf))
+	if (e_machine == EM_S390) {
 		evlist->trace_event_sample_raw = evlist__s390_sample_raw;
-	else if (arch_pf && !strcmp("x86", arch_pf) &&
-		 cpuid && strstarts(cpuid, "AuthenticAMD") &&
-		 evlist__has_amd_ibs(evlist)) {
-		evlist->trace_event_sample_raw = evlist__amd_sample_raw;
+	} else if (e_machine == EM_X86_64 || e_machine == EM_386) {
+		const char *cpuid = perf_env__cpuid(env);
+
+		if (cpuid && strstarts(cpuid, "AuthenticAMD") &&
+		    evlist__has_amd_ibs(evlist))
+			evlist->trace_event_sample_raw = evlist__amd_sample_raw;
 	}
 }
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 07/17] perf sort: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (5 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 06/17] perf sample-raw: " Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
                                             ` (9 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than the arch to determine x86 or PPC types.

Signed-off-by: Ian Rogers <irogers@google.com>
---
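Note for readers: the normalized arch strings "x86" and "powerpc" each
cover two ELF machines, which is why every string test in the diff
becomes a pair of constant comparisons. The helpers below are
hypothetical (the patch open-codes the comparisons) and only sketch
that grouping.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* The arch string "x86" covers both the 32-bit and 64-bit machines. */
static bool e_machine_is_x86(uint16_t e_machine)
{
	return e_machine == EM_X86_64 || e_machine == EM_386;
}

/* Likewise "powerpc" covers EM_PPC and EM_PPC64. */
static bool e_machine_is_powerpc(uint16_t e_machine)
{
	return e_machine == EM_PPC64 || e_machine == EM_PPC;
}
```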
 tools/perf/util/sort.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 0020089cb13c..06a641cf49e3 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <ctype.h>
+#include <elf.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <regex.h>
@@ -2673,9 +2674,10 @@ struct sort_dimension {
 
 static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 {
-	const char *arch = perf_env__arch(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
+	if (e_machine == EM_X86_64 || e_machine == EM_386 ||
+	    e_machine == EM_PPC64 || e_machine == EM_PPC) {
 		if (!strcmp(sort_key, "p_stage_cyc"))
 			return 1;
 		if (!strcmp(sort_key, "local_p_stage_cyc"))
@@ -2686,14 +2688,14 @@ static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 
 static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
 {
-	const char *arch = perf_env__arch(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (!strcmp("x86", arch)) {
+	if (e_machine == EM_X86_64 || e_machine == EM_386) {
 		if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
 			return "Local Retire Latency";
 		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
 			return "Retire Latency";
-	} else if (!strcmp("powerpc", arch)) {
+	} else if (e_machine == EM_PPC64 || e_machine == EM_PPC) {
 		if (!strcmp(se_header, "Local INSTR Latency"))
 			return "Finish Cyc";
 		else if (!strcmp(se_header, "INSTR Latency"))
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 08/17] perf symbol: Avoid use of machine__is
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (6 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 07/17] perf sort: " Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
                                             ` (8 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch to using the ELF machine from the dso or running machine rather
than the machine perf_env arch that may fall back on EM_HOST. This
also avoids potentially imprecise string comparisons.

Signed-off-by: Ian Rogers <irogers@google.com>
---
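Note for readers: a simplified sketch of the lookup order used by the
new helper in the diff. The structs and pick_e_machine() are
hypothetical stand-ins for perf's types, and the last step is reduced
to a plain fallback value, whereas the patch calls
perf_env__e_machine() at that point.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for perf's structures. */
struct env_sketch { uint16_t e_machine; };	/* EM_NONE: nothing cached */
struct dso_sketch { uint16_t e_machine; };	/* EM_NONE: not determined */

static uint16_t pick_e_machine(const struct env_sketch *env,
			       const struct dso_sketch *dso,
			       uint16_t fallback)
{
	/* 1. A value already cached in the environment wins. */
	if (env && env->e_machine != EM_NONE)
		return env->e_machine;
	/* 2. Otherwise use the DSO's machine if it yields one. */
	if (dso && dso->e_machine != EM_NONE)
		return dso->e_machine;
	/* 3. Last resort: the caller-supplied fallback. */
	return fallback;
}
```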
 tools/perf/util/symbol.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..8aaaab0ad4b7 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -851,6 +851,24 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 	return count;
 }
 
+static uint16_t machine_or_dso_e_machine(struct machine *machine, struct dso *dso)
+{
+	uint16_t e_machine = EM_NONE;
+
+	/* Check for a cached value first. */
+	if (machine && machine->env && machine->env->e_machine != EM_NONE)
+		return machine->env->e_machine;
+
+	/* DSO should be most accurate */
+	if (dso)
+		e_machine = dso__e_machine(dso, machine, /*e_flags=*/NULL);
+
+	if (e_machine != EM_NONE)
+		return e_machine;
+
+	return perf_env__e_machine(machine ? machine->env : NULL, /*e_flags=*/NULL);
+}
+
 /*
  * Split the symbols into maps, making sure there are no overlaps, i.e. the
  * kernel range is broken in several maps, named [kernel].N, as we don't have
@@ -866,14 +884,13 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 	struct rb_root_cached *root = dso__symbols(dso);
 	struct rb_node *next = rb_first_cached(root);
 	int kernel_range = 0;
-	bool x86_64;
+	uint16_t e_machine = EM_NONE;
 
 	if (!kmaps)
 		return -1;
 
 	machine = maps__machine(kmaps);
-
-	x86_64 = machine__is(machine, "x86_64");
+	e_machine = machine_or_dso_e_machine(machine, dso);
 
 	while (next) {
 		char *module;
@@ -925,7 +942,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			 */
 			pos->start = map__map_ip(curr_map, pos->start);
 			pos->end   = map__map_ip(curr_map, pos->end);
-		} else if (x86_64 && is_entry_trampoline(pos->name)) {
+		} else if (e_machine == EM_X86_64 && is_entry_trampoline(pos->name)) {
 			/*
 			 * These symbols are not needed anymore since the
 			 * trampoline maps refer to the text section and it's
@@ -1428,7 +1445,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 		free(new_node);
 	}
 
-	if (machine__is(machine, "x86_64")) {
+	if (machine_or_dso_e_machine(machine, dso) == EM_X86_64) {
 		u64 addr;
 
 		/*
@@ -1716,7 +1733,7 @@ int dso__load(struct dso *dso, struct map *map)
 			ret = dso__load_guest_kernel_sym(dso, map);
 
 		machine = maps__machine(map__kmaps(map));
-		if (machine__is(machine, "x86_64"))
+		if (machine_or_dso_e_machine(machine, dso) == EM_X86_64)
 			machine__map_x86_64_entry_trampolines(machine, dso);
 		goto out;
 	}
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (7 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
                                             ` (7 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/common.c | 55 +++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index 21836f70f231..e9b5b61feffe 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -1,12 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0
+#include "common.h"
+
 #include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#include "common.h"
 #include "../util/env.h"
 #include "../util/debug.h"
+#include <dwarf-regs.h>
 #include <linux/zalloc.h>
 
 static const char *const arc_triplets[] = {
@@ -145,7 +147,8 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 					  const char *name, char **path)
 {
 	int idx;
-	const char *arch = perf_env__arch(env), *cross_env;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	const char *cross_env;
 	const char *const *path_list;
 	char *buf = NULL;
 
@@ -153,7 +156,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 	 * We don't need to try to find objdump path for native system.
 	 * Just use default binutils path (e.g.: "objdump").
 	 */
-	if (!strcmp(perf_env__arch(NULL), arch))
+	if (e_machine == EM_HOST)
 		goto out;
 
 	cross_env = getenv("CROSS_COMPILE");
@@ -170,30 +173,42 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree(&buf);
 	}
 
-	if (!strcmp(arch, "arc"))
+	switch (e_machine) {
+	case EM_ARC:
 		path_list = arc_triplets;
-	else if (!strcmp(arch, "arm"))
+		break;
+	case EM_ARM:
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+		break;
+	case EM_AARCH64:
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+		break;
+	case EM_PPC:
+	case EM_PPC64:
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "riscv32"))
-		path_list = riscv32_triplets;
-	else if (!strcmp(arch, "riscv64"))
-		path_list = riscv64_triplets;
-	else if (!strcmp(arch, "sh"))
+		break;
+	case EM_RISCV:
+		path_list = perf_env__kernel_is_64_bit(env) ? riscv64_triplets : riscv32_triplets;
+		break;
+	case EM_SH:
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+		break;
+	case EM_S390:
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+		break;
+	case EM_SPARC:
+	case EM_SPARCV9:
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+		break;
+	case EM_X86_64:
+	case EM_386:
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+		break;
+	case EM_MIPS:
 		path_list = mips_triplets;
-	else {
-		ui__error("binutils for %s not supported.\n", arch);
+		break;
+	default:
+		ui__error("binutils for %s not supported.\n", perf_env__arch(env));
 		goto out_error;
 	}
 
@@ -202,7 +217,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		ui__error("Please install %s for %s.\n"
 			  "You can add it to PATH, set CROSS_COMPILE or "
 			  "override the default using --%s.\n",
-			  name, arch, name);
+			  name, perf_env__arch(env), name);
 		goto out_error;
 	}
 
@@ -237,5 +252,5 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
  */
 bool perf_env__single_address_space(struct perf_env *env)
 {
-	return strcmp(perf_env__arch(env), "sparc");
+	return perf_env__e_machine(env, /*e_flags=*/NULL) != EM_SPARC;
 }
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (8 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
                                             ` (6 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch from arch to e_machine in print_pmu_caps.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/header.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 8d5152bde25d..c6436269df4b 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2348,15 +2348,16 @@ static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
 static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
 {
 	struct perf_env *env = &ff->ph->env;
-	struct pmu_caps *pmu_caps;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
 	for (int i = 0; i < env->nr_pmus_with_caps; i++) {
-		pmu_caps = &env->pmu_caps[i];
+		struct pmu_caps *pmu_caps = &env->pmu_caps[i];
+
 		__print_pmu_caps(fp, pmu_caps->nr_caps, pmu_caps->caps,
 				 pmu_caps->pmu_name);
 	}
 
-	if (strcmp(perf_env__arch(env), "x86") == 0 &&
+	if ((e_machine == EM_X86_64 || e_machine == EM_386) &&
 	    perf_env__has_pmu_mapping(env, "ibs_op")) {
 		char *max_precise = perf_env__find_pmu_cap(env, "cpu", "max_precise");
 
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (9 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 12/17] perf lock-contention: " Ian Rogers
                                             ` (5 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching for AARCH64.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-c2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 72a7802775ee..09c8352a922c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3202,7 +3202,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	 * default display type.
 	 */
 	if (!display) {
-		if (!strcmp(perf_env__arch(env), "arm64"))
+		if (perf_env__e_machine(env, /*e_flags=*/NULL) == EM_AARCH64)
 			display = "peer";
 		else
 			display = "tot";
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 12/17] perf lock-contention: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (10 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno Ian Rogers
                                             ` (4 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching for powerpc.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/lock-contention.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/lock-contention.c b/tools/perf/util/lock-contention.c
index 92e7b7b572a2..119a7206f3cd 100644
--- a/tools/perf/util/lock-contention.c
+++ b/tools/perf/util/lock-contention.c
@@ -104,7 +104,8 @@ bool match_callstack_filter(struct machine *machine, u64 *callstack, int max_sta
 	struct map *kmap;
 	struct symbol *sym;
 	u64 ip;
-	const char *arch = perf_env__arch(machine->env);
+	uint16_t e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);
+	bool is_powerpc = e_machine == EM_PPC64 || e_machine == EM_PPC;
 
 	if (list_empty(&callstack_filters))
 		return true;
@@ -125,8 +126,7 @@ bool match_callstack_filter(struct machine *machine, u64 *callstack, int max_sta
 		 * incase first or second callstack index entry has 0
 		 * address for powerpc.
 		 */
-		if (!callstack || (!callstack[i] && (strcmp(arch, "powerpc") ||
-						(i != 1 && i != 2))))
+		if (!callstack || (!callstack[i] && (!is_powerpc || (i != 1 && i != 2))))
 			break;
 
 		ip = callstack[i];
-- 
2.54.0.545.g6539524ca2-goog
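
The rewritten loop-exit condition in match_callstack_filter can be
read as a small predicate. The following is a minimal sketch, not the
perf code: the function name callstack_walk_should_stop is
hypothetical, and it assumes <elf.h> provides EM_PPC and EM_PPC64.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the condition in the patch:
 *   !callstack || (!callstack[i] && (!is_powerpc || (i != 1 && i != 2)))
 * A zero entry ends the walk, except at index 1 or 2 on powerpc.
 */
static bool callstack_walk_should_stop(const uint64_t *callstack, int i,
				       uint16_t e_machine)
{
	bool is_powerpc = e_machine == EM_PPC64 || e_machine == EM_PPC;

	if (!callstack)
		return true;
	if (callstack[i])
		return false;	/* non-zero address: keep walking */
	return !is_powerpc || (i != 1 && i != 2);
}
```

Splitting the condition into early returns keeps the powerpc special
case on its own line, which is the only part that depends on e_machine.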


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (11 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 12/17] perf lock-contention: " Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch Ian Rogers
                                             ` (3 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

perf_env__arch_strerrno is only available with libtraceevent, so hide
the declaration when building without libtraceevent.

The previous approach mapped an architecture string to a pointer to a
function that takes an int errno value and returns a string. The new
approach takes an e_machine and an errno value and returns a string
directly.

As the only call sites are in builtin-trace.c, the e_machine is already
present and potentially more specific than the perf_env arch string,
which is a single global value.

The major complication in this approach is having the shell script
that generates the C code map a linux directory name to the matching
ELF machine constants.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-trace.c                  |  5 ++-
 tools/perf/trace/beauty/arch_errno_names.sh | 40 ++++++++++++++++++---
 tools/perf/util/env.c                       | 13 +++----
 tools/perf/util/env.h                       |  7 ++--
 4 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e58c49d047a2..d1f21b5e7c98 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3008,9 +3008,8 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
 	} else if (ret < 0) {
 errno_print: {
 		char bf[STRERR_BUFSIZE];
-		struct perf_env *env = evsel__env(evsel) ?: &trace->host_env;
 		const char *emsg = str_error_r(-ret, bf, sizeof(bf));
-		const char *e = perf_env__arch_strerrno(env, err);
+		const char *e = perf_env__arch_strerrno(e_machine, err);
 
 		fprintf(trace->output, "-1 %s (%s)", e, emsg);
 	}
@@ -4890,7 +4889,7 @@ static size_t syscall__dump_stats(struct trace *trace, int e_machine, FILE *fp,
 
 				for (e = 0; e < stats->max_errno; ++e) {
 					if (stats->errnos[e] != 0)
-						fprintf(fp, "\t\t\t\t%s: %d\n", perf_env__arch_strerrno(trace->host->env, e + 1), stats->errnos[e]);
+						fprintf(fp, "\t\t\t\t%s: %d\n", perf_env__arch_strerrno(e_machine, e + 1), stats->errnos[e]);
 				}
 			}
 			lines++;
diff --git a/tools/perf/trace/beauty/arch_errno_names.sh b/tools/perf/trace/beauty/arch_errno_names.sh
index b22890b8d272..89b742927168 100755
--- a/tools/perf/trace/beauty/arch_errno_names.sh
+++ b/tools/perf/trace/beauty/arch_errno_names.sh
@@ -52,21 +52,49 @@ process_arch()
 		|IFS=, create_errno_lookup_func "$arch"
 }
 
+arch_to_e_machine()
+{
+	case "$1" in
+	alpha)      printf '\tcase EM_ALPHA:\n' ;;
+	arc)        printf '\tcase EM_ARC:\n' ;;
+	arm)        printf '\tcase EM_ARM:\n' ;;
+	arm64)      printf '\tcase EM_AARCH64:\n' ;;
+	csky)       printf '\tcase EM_CSKY:\n' ;;
+	hexagon)    printf '\tcase EM_HEXAGON:\n' ;;
+	loongarch)  printf '\tcase EM_LOONGARCH:\n' ;;
+	microblaze) printf '\tcase EM_MICROBLAZE:\n' ;;
+	mips)       printf '\tcase EM_MIPS:\n' ;;
+	parisc)     printf '\tcase EM_PARISC:\n' ;;
+	powerpc)    printf '\tcase EM_PPC:\n\tcase EM_PPC64:\n' ;;
+	riscv)      printf '\tcase EM_RISCV:\n' ;;
+	s390)       printf '\tcase EM_S390:\n' ;;
+	sh)         printf '\tcase EM_SH:\n' ;;
+	sparc)      printf '\tcase EM_SPARC:\n\tcase EM_SPARCV9:\n' ;;
+	x86)        printf '\tcase EM_386:\n\tcase EM_X86_64:\n' ;;
+	xtensa)     printf '\tcase EM_XTENSA:\n' ;;
+	esac
+}
+
 create_arch_errno_table_func()
 {
 	archlist="$1"
 	default="$2"
 
-	printf 'static arch_syscalls__strerrno_t *\n'
-	printf 'arch_syscalls__strerrno_function(const char *arch)\n'
+	printf 'static const char *\n'
+	printf 'arch_syscalls__strerrno(uint16_t e_machine, int err)\n'
 	printf '{\n'
+	printf '\tswitch (e_machine) {\n'
 	for arch in $archlist; do
 		arch_str=$(arch_string "$arch")
-		printf '\tif (!strcmp(arch, "%s"))\n' "$arch_str"
-		printf '\t\treturn errno_to_name__%s;\n' "$arch_str"
+		ems=$(arch_to_e_machine "$arch_str")
+		if [ -n "$ems" ]; then
+			printf '%s\n' "$ems"
+			printf '\t\treturn errno_to_name__%s(err);\n' "$arch_str"
+		fi
 	done
 	arch_str=$(arch_string "$default")
-	printf '\treturn errno_to_name__%s;\n' "$arch_str"
+	printf '\tdefault:\n\t\treturn errno_to_name__%s(err);\n' "$arch_str"
+	printf '\t}\n'
 	printf '}\n'
 }
 
@@ -74,6 +102,8 @@ cat <<EoHEADER
 /* SPDX-License-Identifier: GPL-2.0 */
 
 #include <string.h>
+#include <stdint.h>
+#include <elf.h>
 
 EoHEADER
 
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4ff4caab3b32..97f4aa1131a1 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -786,17 +786,12 @@ const char *perf_env__arch(struct perf_env *env)
 #include "trace/beauty/arch_errno_names.c"
 #endif
 
-const char *perf_env__arch_strerrno(struct perf_env *env __maybe_unused, int err __maybe_unused)
-{
 #if defined(HAVE_LIBTRACEEVENT)
-	if (env->arch_strerrno == NULL)
-		env->arch_strerrno = arch_syscalls__strerrno_function(perf_env__arch(env));
-
-	return env->arch_strerrno ? env->arch_strerrno(err) : "no arch specific strerrno function";
-#else
-	return "!HAVE_LIBTRACEEVENT";
-#endif
+const char *perf_env__arch_strerrno(uint16_t e_machine, int err)
+{
+	return arch_syscalls__strerrno(e_machine, err);
 }
+#endif
 
 const char *perf_env__cpuid(struct perf_env *env)
 {
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 7151a9138e3f..68dead1b36a6 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -67,8 +67,6 @@ struct cpu_domain_map {
 	struct domain_info	**domains;
 };
 
-typedef const char *(arch_syscalls__strerrno_t)(int err);
-
 struct perf_env {
 	char			*hostname;
 	char			*os_release;
@@ -158,7 +156,6 @@ struct perf_env {
 		 */
 		bool	enabled;
 	} clock;
-	arch_syscalls__strerrno_t *arch_strerrno;
 };
 
 enum perf_compress_type {
@@ -190,7 +187,9 @@ void cpu_cache_level__free(struct cpu_cache_level *cache);
 uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags);
 uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
-const char *perf_env__arch_strerrno(struct perf_env *env, int err);
+#if defined(HAVE_LIBTRACEEVENT)
+const char *perf_env__arch_strerrno(uint16_t e_machine, int err);
+#endif
 const char *perf_env__cpuid(struct perf_env *env);
 const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (12 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
                                             ` (2 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

The switch to using e_machine has made the perf_env__raw_arch function
unused so remove it.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c | 18 ------------------
 tools/perf/util/env.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 97f4aa1131a1..5944acd28996 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -451,19 +451,6 @@ int perf_env__read_cpuid(struct perf_env *env)
 	return 0;
 }
 
-static int perf_env__read_arch(struct perf_env *env)
-{
-	struct utsname uts;
-
-	if (env->arch)
-		return 0;
-
-	if (!uname(&uts))
-		env->arch = strdup(uts.machine);
-
-	return env->arch ? 0 : -ENOMEM;
-}
-
 static int perf_env__read_nr_cpus_avail(struct perf_env *env)
 {
 	if (env->nr_cpus_avail == 0)
@@ -582,11 +569,6 @@ int perf_env__read_core_pmu_caps(struct perf_env *env)
 	return ret;
 }
 
-const char *perf_env__raw_arch(struct perf_env *env)
-{
-	return env && !perf_env__read_arch(env) ? env->arch : "unknown";
-}
-
 int perf_env__nr_cpus_avail(struct perf_env *env)
 {
 	return env && !perf_env__read_nr_cpus_avail(env) ? env->nr_cpus_avail : 0;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 68dead1b36a6..a95fd7eb3524 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -191,7 +191,6 @@ const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(uint16_t e_machine, int err);
 #endif
 const char *perf_env__cpuid(struct perf_env *env);
-const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
 
 void perf_env__init(struct perf_env *env);
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (13 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

In live mode the os_release isn't initialized. Add a lazy
initialization helper that assumes live mode when the os_release has
not been set.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 5944acd28996..1090aaa2985f 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -339,6 +339,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname or strdup fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release ?: perf_version_string;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a95fd7eb3524..989545a47798 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -172,6 +172,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c6436269df4b..4867a932cb88 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,13 +370,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 8aaaab0ad4b7..a70066d17729 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2243,7 +2243,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (14 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

A problem with putting bitfields into struct symbol is that bits sharing
a storage unit can be updated concurrently, in which case only one of
the read-modify-write updates takes effect and the others are lost.

To avoid this, introduce a global lock `symbol_bits_lock` in `symbol.c`
and helper functions to update the bits sharing a byte:
`symbol__set_ignore` and `symbol__set_annotate2`.

`inlined` is not given a setter as it is only initialized in
`new_inline_sym` when the symbol is under construction and not shared.

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-kwork.c |  2 +-
 tools/perf/builtin-sched.c |  2 +-
 tools/perf/util/annotate.c |  2 +-
 tools/perf/util/symbol.c   | 22 ++++++++++++++++++++++
 tools/perf/util/symbol.h   |  3 +++
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 9d3a4c779a41..7337ee956dc9 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -725,7 +725,7 @@ static void timehist_save_callchain(struct perf_kwork *kwork,
 		if (sym) {
 			if (!strcmp(sym->name, "__softirqentry_text_start") ||
 			    !strcmp(sym->name, "__do_softirq"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 555247568e7a..655e95f660c2 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2371,7 +2371,7 @@ static void save_task_callchain(struct perf_sched *sched,
 			if (!strcmp(sym->name, "schedule") ||
 			    !strcmp(sym->name, "__schedule") ||
 			    !strcmp(sym->name, "preempt_schedule"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..d550a0061159 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2224,7 +2224,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 
 	annotation__init_column_widths(notes, sym);
 	annotation__update_column_widths(notes);
-	sym->annotate2 = 1;
+	symbol__set_annotate2(sym, true);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a70066d17729..1238a0d6ce6e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -31,6 +31,7 @@
 #include "map.h"
 #include "symbol.h"
 #include "map_symbol.h"
+#include "mutex.h"
 #include "mem-events.h"
 #include "mem-info.h"
 #include "symsrc.h"
@@ -52,6 +53,8 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
 static bool symbol__is_idle(const char *name);
 
+static struct mutex symbol_bits_lock;
+
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
@@ -345,6 +348,20 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
+void symbol__set_ignore(struct symbol *sym, bool ignore)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->ignore = ignore;
+	mutex_unlock(&symbol_bits_lock);
+}
+
+void symbol__set_annotate2(struct symbol *sym, bool annotate2)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->annotate2 = annotate2;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -2415,6 +2432,8 @@ int symbol__init(struct perf_env *env)
 	if (symbol_conf.initialized)
 		return 0;
 
+	mutex_init(&symbol_bits_lock);
+
 	symbol_conf.priv_size = PERF_ALIGN(symbol_conf.priv_size, sizeof(u64));
 
 	symbol__elf_init();
@@ -2493,6 +2512,9 @@ void symbol__exit(void)
 {
 	if (!symbol_conf.initialized)
 		return;
+
+	mutex_destroy(&symbol_bits_lock);
+
 	strlist__delete(symbol_conf.bt_stop_list);
 	strlist__delete(symbol_conf.sym_list);
 	strlist__delete(symbol_conf.dso_list);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..5d98d7e84d57 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -77,6 +77,9 @@ struct symbol {
 void symbol__delete(struct symbol *sym);
 void symbols__delete(struct rb_root_cached *symbols);
 
+void symbol__set_ignore(struct symbol *sym, bool ignore);
+void symbol__set_annotate2(struct symbol *sym, bool annotate2);
+
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
  * @symbols: the rb_root of symbols
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (15 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  16 siblings, 0 replies; 60+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

To avoid concurrent update issues with other bitfields in `struct symbol`,
this change uses the global lock `symbol_bits_lock` (introduced in a
previous commit) for updates to the `idle` field. A static helper
`symbol__set_idle` taking a boolean is used to encapsulate the lock and
mapping to `enum symbol_idle_kind`.
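
For readers unfamiliar with the issue: storing to a bitfield is a
read-modify-write of the storage unit that holds it, so two threads
updating different bitfields in the same byte can lose one of the
updates. The sketch below shows the setter pattern under that
assumption. It uses a pthread mutex and hypothetical names (`toy_bits`,
`toy_bits_lock`, `toy_set_idle`, `toy_set_ignore`), not the perf mutex
wrappers or `struct symbol`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical layout: both fields live in the same byte. */
struct toy_bits {
	unsigned char idle:2;
	unsigned char ignore:1;
};

/* One lock shared by every setter that touches the shared byte. */
static pthread_mutex_t toy_bits_lock = PTHREAD_MUTEX_INITIALIZER;

static void toy_set_idle(struct toy_bits *b, bool idle)
{
	assert(b);
	pthread_mutex_lock(&toy_bits_lock);
	b->idle = idle ? 2 : 1;	/* 2 = idle, 1 = not idle, 0 = unknown */
	pthread_mutex_unlock(&toy_bits_lock);
}

static void toy_set_ignore(struct toy_bits *b, bool ignore)
{
	assert(b);
	pthread_mutex_lock(&toy_bits_lock);
	b->ignore = ignore;
	pthread_mutex_unlock(&toy_bits_lock);
}
```

A single global lock keeps the per-symbol footprint unchanged, at the
cost of serialising all such updates.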

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 108 +++++++++++++++++++++++------------
 tools/perf/util/symbol.h     |  14 +++--
 3 files changed, 81 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1238a0d6ce6e..6c642067c4ed 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -51,7 +51,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 static struct mutex symbol_bits_lock;
 
@@ -362,6 +361,13 @@ void symbol__set_annotate2(struct symbol *sym, bool annotate2)
 	mutex_unlock(&symbol_bits_lock);
 }
 
+static void symbol__set_idle(struct symbol *sym, bool idle)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->idle = idle ? SYMBOL_IDLE__IDLE : SYMBOL_IDLE__NOT_IDLE;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -375,8 +381,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -384,17 +389,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -411,7 +405,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -572,7 +566,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -738,43 +732,81 @@ int modules__parse(const char *filename, void *arg,
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+static int sym_name_cmp(const void *a, const void *b)
 {
-	const char * const idle_symbols[] = {
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
+{
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		symbol__set_idle(sym, /*idle=*/false);
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and prepends a '.' to the
+	 * name of every function's text symbol. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
+
+	symbol__set_idle(sym, /*idle=*/false);
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -803,7 +835,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 5d98d7e84d57..717d2f876d58 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -43,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -58,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -197,8 +203,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -281,5 +286,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.54.0.545.g6539524ca2-goog



end of thread, other threads:[~2026-05-02  7:00 UTC | newest]

Thread overview: 60+ messages
2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
2026-02-19 11:55 ` Jan Polensky
2026-02-23 21:46 ` Namhyung Kim
2026-02-23 23:14   ` Arnaldo Melo
2026-03-02 18:43   ` Arnaldo Carvalho de Melo
2026-03-02 19:44     ` Ian Rogers
2026-03-04 14:34       ` Arnaldo Carvalho de Melo
2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-03-24 17:14   ` Ian Rogers
2026-03-25  6:58     ` Namhyung Kim
2026-03-25 15:58       ` Ian Rogers
2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
2026-03-26  7:20     ` Honglei Wang
2026-03-26 15:11       ` Ian Rogers
2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-03-27  6:56             ` Honglei Wang
2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-04-06  5:05               ` Namhyung Kim
2026-04-06 15:36                 ` Ian Rogers
2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-04-06  5:10               ` Namhyung Kim
2026-04-06 16:11                 ` Ian Rogers
2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-04-06 17:09                     ` [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-04-06 17:09                     ` [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 06/17] perf sample-raw: " Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 07/17] perf sort: " Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 12/17] perf lock-contention: " Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
2026-03-31  7:22             ` Namhyung Kim
2026-03-31 17:58               ` Ian Rogers
2026-04-01  3:41                 ` Namhyung Kim
