Linux Perf Users
* [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
@ 2026-02-19 11:38 Thomas Richter
  2026-02-19 11:55 ` Jan Polensky
                   ` (2 more replies)
  0 siblings, 3 replies; 106+ messages in thread
From: Thomas Richter @ 2026-02-19 11:38 UTC (permalink / raw)
  To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung
  Cc: agordeev, gor, sumanthk, hca, japo, Thomas Richter

Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
removes the symbols psw_idle() and psw_idle_exit() from the Linux
kernel for s390. Remove them from the perf tool's list of idle
functions. They cannot be detected anymore.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
---
 tools/perf/util/symbol.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 814f960fa8f8..575951d98b1b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -752,8 +752,6 @@ static bool symbol__is_idle(const char *name)
 		"poll_idle",
 		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
 		NULL
 	};
 	int i;
-- 
2.53.0



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
@ 2026-02-19 11:55 ` Jan Polensky
  2026-02-23 21:46 ` Namhyung Kim
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 0 replies; 106+ messages in thread
From: Jan Polensky @ 2026-02-19 11:55 UTC (permalink / raw)
  To: Thomas Richter, linux-kernel, linux-s390, linux-perf-users, acme,
	namhyung
  Cc: agordeev, gor, sumanthk, hca

On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
>
> removes symbols psw_idle() and psw_idle_exit() from the linux
> kernel for s390. Remove them in perf tool's list of idle
> functions. They can not be detected anymore.
>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
  2026-02-19 11:55 ` Jan Polensky
@ 2026-02-23 21:46 ` Namhyung Kim
  2026-02-23 23:14   ` Arnaldo Melo
  2026-03-02 18:43   ` Arnaldo Carvalho de Melo
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 2 replies; 106+ messages in thread
From: Namhyung Kim @ 2026-02-23 21:46 UTC (permalink / raw)
  To: Thomas Richter
  Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
	sumanthk, hca, japo

On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> 
> removes symbols psw_idle() and psw_idle_exit() from the linux
> kernel for s390. Remove them in perf tool's list of idle
> functions. They can not be detected anymore.

But I think old kernels may still run somewhere.  It seems the above
commit was merged to v6.10.  Maybe we should wait some more time before
removing it in the tool.

Thanks,
Namhyung

> 
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Suggested-by: Heiko Carstens <hca@linux.ibm.com>
> ---
>  tools/perf/util/symbol.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 814f960fa8f8..575951d98b1b 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -752,8 +752,6 @@ static bool symbol__is_idle(const char *name)
>  		"poll_idle",
>  		"ppc64_runlatch_off",
>  		"pseries_dedicated_idle_sleep",
> -		"psw_idle",
> -		"psw_idle_exit",
>  		NULL
>  	};
>  	int i;
> -- 
> 2.53.0
> 


* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-23 21:46 ` Namhyung Kim
@ 2026-02-23 23:14   ` Arnaldo Melo
  2026-03-02 18:43   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 106+ messages in thread
From: Arnaldo Melo @ 2026-02-23 23:14 UTC (permalink / raw)
  To: Namhyung Kim, Thomas Richter
  Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
	sumanthk, hca, japo



On February 23, 2026 6:46:21 PM GMT-03:00, Namhyung Kim <namhyung@kernel.org> wrote:
>On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
>> Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
>> 
>> removes symbols psw_idle() and psw_idle_exit() from the linux
>> kernel for s390. Remove them in perf tool's list of idle
>> functions. They can not be detected anymore.
>
>But I think old kernels may still run somewhere.  It seems the above
>commit was merged to v6.10.  Maybe we should wait some more time before
>removing it in the tool.

Right, people keep asking if one can use a new version of perf on an old kernel and vice versa. 

So I think we should not apply this patch. 

There have been efforts in the past to have some per-sample info indicating the "context" of a sample (idle processing, hard/soft irq processing, etc.), but that hasn't come to fruition so far.

With that we could get rid of this flaky heuristic of looking at a symbol name.

- Arnaldo


>
>Thanks,
>Namhyung
>
>> 
>> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
>> Suggested-by: Heiko Carstens <hca@linux.ibm.com>
>> ---
>>  tools/perf/util/symbol.c | 2 --
>>  1 file changed, 2 deletions(-)
>> 
>> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
>> index 814f960fa8f8..575951d98b1b 100644
>> --- a/tools/perf/util/symbol.c
>> +++ b/tools/perf/util/symbol.c
>> @@ -752,8 +752,6 @@ static bool symbol__is_idle(const char *name)
>>  		"poll_idle",
>>  		"ppc64_runlatch_off",
>>  		"pseries_dedicated_idle_sleep",
>> -		"psw_idle",
>> -		"psw_idle_exit",
>>  		NULL
>>  	};
>>  	int i;
>> -- 
>> 2.53.0
>> 



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-02-23 21:46 ` Namhyung Kim
  2026-02-23 23:14   ` Arnaldo Melo
@ 2026-03-02 18:43   ` Arnaldo Carvalho de Melo
  2026-03-02 19:44     ` Ian Rogers
  1 sibling, 1 reply; 106+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-02 18:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Thomas Richter, linux-kernel, linux-s390, linux-perf-users,
	agordeev, gor, sumanthk, hca, japo

On Mon, Feb 23, 2026 at 01:46:21PM -0800, Namhyung Kim wrote:
> On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> > Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> > 
> > removes symbols psw_idle() and psw_idle_exit() from the linux
> > kernel for s390. Remove them in perf tool's list of idle
> > functions. They can not be detected anymore.
> 
> But I think old kernels may still run somewhere.  It seems the above
> commit was merged to v6.10.  Maybe we should wait some more time before
> removing it in the tool.

Agreed. A new perf tool, say one built from the tarballs made
available at:

https://www.kernel.org/pub/linux/kernel/tools/perf/v7.0.0/perf-7.0.0-rc1.tar.xz

(I will not make an rc2 available since there are no changes to the
tools/perf codebase in this rc)

should still ignore those functions when run on older kernels.

A suggestion for work in this area instead is to get those samples into
a special bucket, the "idle" one, and show it somewhere on the
screen.

Thanks,

- Arnaldo


* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-03-02 18:43   ` Arnaldo Carvalho de Melo
@ 2026-03-02 19:44     ` Ian Rogers
  2026-03-04 14:34       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-02 19:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Thomas Richter, linux-kernel, linux-s390,
	linux-perf-users, agordeev, gor, sumanthk, hca, japo

On Mon, Mar 2, 2026 at 10:43 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Mon, Feb 23, 2026 at 01:46:21PM -0800, Namhyung Kim wrote:
> > On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> > > Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> > >
> > > removes symbols psw_idle() and psw_idle_exit() from the linux
> > > kernel for s390. Remove them in perf tool's list of idle
> > > functions. They can not be detected anymore.
> >
> > But I think old kernels may still run somewhere.  It seems the above
> > commit was merged to v6.10.  Maybe we should wait some more time before
> > removing it in the tool.
>
> Agreed, using a new perf tool, say built from the tarballs made
> available at:
>
> https://www.kernel.org/pub/linux/kernel/tools/perf/v7.0.0/perf-7.0.0-rc1.tar.xz
>
> (I will not make a rc2 available since there are no changes to the
> tools/perf codebase in this rc).
>
> On older kernels should still ignore those functions.
>
> A suggestion for work in this area instead is to get those samples into
> a special bucket, the "idle" one, and show it at some place in the
> screen.

Would it also be sensible to pass the perf_env:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h?h=perf-tools-next#n74
into symbol__is_idle? The contents of the perf_env are shown by `perf
report --header`:
```
# ========
# captured on    : Mon Mar  2 11:34:47 2026
# header version : 1
# data offset    : 904
# data size      : 4268216
# feat offset    : 4269120
# hostname : google.com
# os release : 6.17.13-1rodete1-amd64
# perf version : 7.0.rc1.g982b63f6380b
# arch : x86_64
# nrcpus online : 28
# nrcpus avail : 28
# cpudesc : Intel(R) Core(TM) i7-14700
# cpuid : GenuineIntel,6,183,1
...
# e_machine : 62
#   e_flags : 0
...
```
The kernel version is in the release and the e_machine/arch captures
the CPU type.
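
As a rough sketch of how the release string could gate a symbol, the
leading "major.minor" can be parsed and compared against a cutoff.
The helper name release_is_before() is hypothetical and not an
existing perf function, and treating an unparseable release as "not
older" is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of perf: returns true when the kernel
 * release string (e.g. "6.17.13-1rodete1-amd64") is older than
 * major.minor. Unparseable strings are treated as "not older", which
 * is an assumption of this sketch.
 */
bool release_is_before(const char *release, int major, int minor)
{
	int rel_major = 0, rel_minor = 0;

	assert(release != NULL);

	/* sscanf() stops at the first character that does not fit "%d.%d". */
	if (sscanf(release, "%d.%d", &rel_major, &rel_minor) != 2)
		return false;

	return rel_major < major ||
	       (rel_major == major && rel_minor < minor);
}
```

With the header above, release_is_before("6.17.13-1rodete1-amd64", 6, 10)
is false, so a check keyed on "before v6.10" would not fire for that
capture.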

Thanks,
Ian

> Thanks,
>
> - Arnaldo
>


* [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
  2026-02-19 11:55 ` Jan Polensky
  2026-02-23 21:46 ` Namhyung Kim
@ 2026-03-02 23:43 ` Ian Rogers
  2026-03-24 17:14   ` Ian Rogers
  2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
  2 siblings, 2 replies; 106+ messages in thread
From: Ian Rogers @ 2026-03-02 23:43 UTC (permalink / raw)
  To: tmricht
  Cc: acme, agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, namhyung, sumanthk, Ian Rogers

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 106 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 85 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 710604c4f6f6..bc3c8e3b6ec0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -750,6 +750,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -829,7 +830,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 76912c62b6a0..6bb46384aa0c 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1725,7 +1725,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 8662001e1e25..6155f509ca70 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "debug.h"
 #include "event.h"
@@ -51,7 +53,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,88 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = env ? env->e_machine : EM_HOST;
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	if (e_machine == EM_PPC64 &&!strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (e_machine == EM_S390) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+		sscanf(release, "%d.%d", &major, &minor);
 
-	return strlist__has_entry(idle_symbols_list, name);
+		/* Before v6.10, s390 used psw_idle. */
+		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 3fb5d146d9b1..508dd9f336e9 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -24,6 +24,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -56,8 +63,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle. */
+	enum symbol_idle_kind idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -269,5 +275,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.473.g4a7958ca14-goog
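
The lazy caching in the patch above can be illustrated outside of perf
with a small stand-alone sketch. All names here (toy_symbol,
toy_symbol_is_idle, slow_lookups) are hypothetical and exist only for
illustration; the point is that a zero-initialised two-bit field can
distinguish "not computed yet" from the two computed answers, so the
name lookup runs at most once per symbol:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum idle_kind {
	IDLE_UNKNOWN = 0,	/* zero-initialised symbols start here */
	IDLE_NO = 1,
	IDLE_YES = 2,
};

struct toy_symbol {
	const char *name;
	unsigned int idle : 2;	/* cache, wide enough for three states */
};

/* Counts how often the slow path runs, for illustration only. */
int slow_lookups;

static bool slow_name_is_idle(const char *name)
{
	slow_lookups++;
	return strcmp(name, "default_idle") == 0 ||
	       strcmp(name, "poll_idle") == 0;
}

bool toy_symbol_is_idle(struct toy_symbol *sym)
{
	assert(sym != NULL && sym->name != NULL);

	/* Fast path: a previous call already decided. */
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_YES;

	/* Slow path: compute once and remember the answer. */
	sym->idle = slow_name_is_idle(sym->name) ? IDLE_YES : IDLE_NO;
	return sym->idle == IDLE_YES;
}
```

A plain boolean could not cache the negative answer, since "false"
and "not computed" would be the same value; that is the reason for the
third state.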



* Re: [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols
  2026-03-02 19:44     ` Ian Rogers
@ 2026-03-04 14:34       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 106+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-03-04 14:34 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Namhyung Kim, Thomas Richter, linux-kernel, linux-s390,
	linux-perf-users, agordeev, gor, sumanthk, hca, japo

On Mon, Mar 02, 2026 at 11:44:19AM -0800, Ian Rogers wrote:
> On Mon, Mar 2, 2026 at 10:43 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Mon, Feb 23, 2026 at 01:46:21PM -0800, Namhyung Kim wrote:
> > > On Thu, Feb 19, 2026 at 12:38:50PM +0100, Thomas Richter wrote:
> > > > Commit fa2ae4a377c0 ("s390/idle: Rewrite psw_idle() in C")
> > > >
> > > > removes symbols psw_idle() and psw_idle_exit() from the linux
> > > > kernel for s390. Remove them in perf tool's list of idle
> > > > functions. They can not be detected anymore.
> > >
> > > But I think old kernels may still run somewhere.  It seems the above
> > > commit was merged to v6.10.  Maybe we should wait some more time before
> > > removing it in the tool.
> >
> > Agreed, using a new perf tool, say built from the tarballs made
> > available at:
> >
> > https://www.kernel.org/pub/linux/kernel/tools/perf/v7.0.0/perf-7.0.0-rc1.tar.xz
> >
> > (I will not make a rc2 available since there are no changes to the
> > tools/perf codebase in this rc).
> >
> > On older kernels should still ignore those functions.
> >
> > A suggestion for work in this area instead is to get those samples into
> > a special bucket, the "idle" one, and show it at some place in the
> > screen.
> 
> Would it also be sensible to pass the perf_env:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.h?h=perf-tools-next#n74
> into symbol__is_idle? The contents of the perf_env are shown by `perf
> report --header`:
> ```
> # ========
> # captured on    : Mon Mar  2 11:34:47 2026
> # header version : 1
> # data offset    : 904
> # data size      : 4268216
> # feat offset    : 4269120
> # hostname : google.com
> # os release : 6.17.13-1rodete1-amd64
> # perf version : 7.0.rc1.g982b63f6380b
> # arch : x86_64
> # nrcpus online : 28
> # nrcpus avail : 28
> # cpudesc : Intel(R) Core(TM) i7-14700
> # cpuid : GenuineIntel,6,183,1
> ...
> # e_machine : 62
> #   e_flags : 0
> ...
> ```
> The kernel version is in the release and the e_machine/arch captures
> the CPU type.

Yeah, I think it is a good improvement. Do you mean that we should
have per-arch idle symbol lists?
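
Something like the following is what I have in mind, as a minimal
sketch only. The table layout and the name arch_symbol_is_idle() are
hypothetical, the symbol lists are abbreviated, and the EM_* constants
are assumed to come from the system <elf.h>:

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One NULL-terminated list of idle symbol names per architecture. */
struct arch_idle_syms {
	uint16_t e_machine;
	const char *const *names;
};

static const char *const x86_idle[] = {
	"intel_idle", "mwait_idle", "native_safe_halt", NULL
};
static const char *const s390_idle[] = {
	"psw_idle", "psw_idle_exit", NULL
};
static const char *const ppc64_idle[] = {
	"ppc64_runlatch_off", "pseries_dedicated_idle_sleep", NULL
};

static const struct arch_idle_syms arch_idle_table[] = {
	{ EM_386,    x86_idle },
	{ EM_X86_64, x86_idle },
	{ EM_S390,   s390_idle },
	{ EM_PPC64,  ppc64_idle },
};

/* Hypothetical lookup: only the list for the sample's e_machine is consulted. */
bool arch_symbol_is_idle(uint16_t e_machine, const char *name)
{
	size_t i, j;

	assert(name != NULL);

	for (i = 0; i < sizeof(arch_idle_table) / sizeof(arch_idle_table[0]); i++) {
		if (arch_idle_table[i].e_machine != e_machine)
			continue;
		for (j = 0; arch_idle_table[i].names[j]; j++) {
			if (!strcmp(arch_idle_table[i].names[j], name))
				return true;
		}
	}
	return false;
}
```

That way a symbol such as psw_idle would only be considered for s390
captures and would not match on x86_64.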

- Arnaldo


* Re: [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-03-24 17:14   ` Ian Rogers
  2026-03-25  6:58     ` Namhyung Kim
  2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-24 17:14 UTC (permalink / raw)
  To: tmricht, namhyung, acme
  Cc: agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, sumanthk

On Mon, Mar 2, 2026 at 3:43 PM Ian Rogers <irogers@google.com> wrote:
>
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
>
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
>
> Signed-off-by: Ian Rogers <irogers@google.com>

Ping.

Thanks,
Ian

> ---
>  tools/perf/builtin-top.c     |   6 +-
>  tools/perf/util/symbol-elf.c |   2 +-
>  tools/perf/util/symbol.c     | 106 ++++++++++++++++++++++-------------
>  tools/perf/util/symbol.h     |  15 +++--
>  4 files changed, 85 insertions(+), 44 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 710604c4f6f6..bc3c8e3b6ec0 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -750,6 +750,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  {
>         struct perf_top *top = container_of(tool, struct perf_top, tool);
>         struct addr_location al;
> +       struct dso *dso = NULL;
>
>         if (!machine && perf_guest) {
>                 static struct intlist *seen;
> @@ -829,7 +830,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>                 }
>         }
>
> -       if (al.sym == NULL || !al.sym->idle) {
> +       if (al.map)
> +               dso = map__dso(al.map);
> +
> +       if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
>                 struct hists *hists = evsel__hists(evsel);
>                 struct hist_entry_iter iter = {
>                         .evsel          = evsel,
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 76912c62b6a0..6bb46384aa0c 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1725,7 +1725,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
>
>                 arch__sym_update(f, &sym);
>
> -               __symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> +               __symbols__insert(dso__symbols(curr_dso), f);
>                 nr++;
>         }
>         dso__put(curr_dso);
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 8662001e1e25..6155f509ca70 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -25,6 +25,8 @@
>  #include "demangle-ocaml.h"
>  #include "demangle-rust-v0.h"
>  #include "dso.h"
> +#include "dwarf-regs.h"
> +#include "env.h"
>  #include "util.h" // lsdir()
>  #include "debug.h"
>  #include "event.h"
> @@ -51,7 +53,6 @@
>
>  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> -static bool symbol__is_idle(const char *name);
>
>  int vmlinux_path__nr_entries;
>  char **vmlinux_path;
> @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
>         }
>  }
>
> -void __symbols__insert(struct rb_root_cached *symbols,
> -                      struct symbol *sym, bool kernel)
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
>         struct rb_node **p = &symbols->rb_root.rb_node;
>         struct rb_node *parent = NULL;
> @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>         struct symbol *s;
>         bool leftmost = true;
>
> -       if (kernel) {
> -               const char *name = sym->name;
> -               /*
> -                * ppc64 uses function descriptors and appends a '.' to the
> -                * start of every instruction address. Remove it.
> -                */
> -               if (name[0] == '.')
> -                       name++;
> -               sym->idle = symbol__is_idle(name);
> -       }
> -
>         while (*p != NULL) {
>                 parent = *p;
>                 s = rb_entry(parent, struct symbol, rb_node);
> @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
>
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
> -       __symbols__insert(symbols, sym, false);
> +       __symbols__insert(symbols, sym);
>  }
>
>  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
>
>  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
>  {
> -       __symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> +       __symbols__insert(dso__symbols(dso), sym);
>
>         /* update the symbol cache if necessary */
>         if (dso__last_find_result_addr(dso) >= sym->start &&
> @@ -716,47 +705,88 @@ int modules__parse(const char *filename, void *arg,
>         return err;
>  }
>
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +       const char *name = a;
> +       const char *const *sym = b;
> +
> +       return strcmp(name, *sym);
> +}
> +
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__is_idle(const char *name)
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
>  {
> -       const char * const idle_symbols[] = {
> +       static const char * const idle_symbols[] = {
>                 "acpi_idle_do_entry",
>                 "acpi_processor_ffh_cstate_enter",
>                 "arch_cpu_idle",
>                 "cpu_idle",
>                 "cpu_startup_entry",
> -               "idle_cpu",
> -               "intel_idle",
> -               "intel_idle_ibrs",
>                 "default_idle",
> -               "native_safe_halt",
>                 "enter_idle",
>                 "exit_idle",
> -               "mwait_idle",
> -               "mwait_idle_with_hints",
> -               "mwait_idle_with_hints.constprop.0",
> +               "idle_cpu",
> +               "native_safe_halt",
>                 "poll_idle",
> -               "ppc64_runlatch_off",
>                 "pseries_dedicated_idle_sleep",
> -               "psw_idle",
> -               "psw_idle_exit",
> -               NULL
>         };
> -       int i;
> -       static struct strlist *idle_symbols_list;
> +       const char *name = sym->name;
> +       uint16_t e_machine = env ? env->e_machine : EM_HOST;
> +
> +       if (sym->idle)
> +               return sym->idle == SYMBOL_IDLE__IDLE;
> +
> +       if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +               sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +               return false;
> +       }
>
> -       if (idle_symbols_list)
> -               return strlist__has_entry(idle_symbols_list, name);
> +       /*
> +        * ppc64 uses function descriptors and appends a '.' to the
> +        * start of every instruction address. Remove it.
> +        */
> +       if (name[0] == '.')
> +               name++;
> +
> +
> +       if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +                   sizeof(idle_symbols[0]), sym_name_cmp)) {
> +               sym->idle = SYMBOL_IDLE__IDLE;
> +               return true;
> +       }
> +
> +       if (e_machine == EM_386 || e_machine == EM_X86_64) {
> +               if (strstarts(name, "mwait_idle") ||
> +                   strstarts(name, "intel_idle")) {
> +                       sym->idle = SYMBOL_IDLE__IDLE;
> +                       return true;
> +               }
> +       }
> +
> +       if (e_machine == EM_PPC64 &&!strcmp(name, "ppc64_runlatch_off")) {
> +               sym->idle = SYMBOL_IDLE__IDLE;
> +               return true;
> +       }
>
> -       idle_symbols_list = strlist__new(NULL, NULL);
> +       if (e_machine == EM_S390) {
> +               int major = 0, minor = 0;
> +               const char *release = env && env->os_release
> +                       ? env->os_release : perf_version_string;
>
> -       for (i = 0; idle_symbols[i]; i++)
> -               strlist__add(idle_symbols_list, idle_symbols[i]);
> +               sscanf(release, "%d.%d", &major, &minor);
>
> -       return strlist__has_entry(idle_symbols_list, name);
> +               /* Before v6.10, s390 used psw_idle. */
> +               if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> +                       sym->idle = SYMBOL_IDLE__IDLE;
> +                       return true;
> +               }
> +       }
> +
> +       sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +       return false;
>  }
>
>  static int map__process_kallsym_symbol(void *arg, const char *name,
> @@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
>          * We will pass the symbols to the filter later, in
>          * map__split_kallsyms, when we have split the maps per module
>          */
> -       __symbols__insert(root, sym, !strchr(name, '['));
> +       __symbols__insert(root, sym);
>
>         return 0;
>  }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index 3fb5d146d9b1..508dd9f336e9 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -24,6 +24,7 @@ struct dso;
>  struct map;
>  struct maps;
>  struct option;
> +struct perf_env;
>  struct build_id;
>
>  /*
> @@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>                              GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>
> +enum symbol_idle_kind {
> +       SYMBOL_IDLE__UNKNOWN = 0,
> +       SYMBOL_IDLE__NOT_IDLE = 1,
> +       SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -56,8 +63,8 @@ struct symbol {
>         u8              type:4;
>         /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>         u8              binding:4;
> -       /** Set true for kernel symbols of idle routines. */
> -       u8              idle:1;
> +       /** Cache for symbol__is_idle. */
> +       enum symbol_idle_kind idle:2;
>         /** Resolvable but tools ignore it (e.g. idle routines). */
>         u8              ignore:1;
>         /** Symbol for an inlined function. */
> @@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
>
>  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>
> -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> -                      bool kernel);
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
>  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> @@ -269,5 +275,6 @@ enum {
>  };
>
>  int symbol__validate_sym_arguments(void);
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
>
>  #endif /* __PERF_SYMBOL */
> --
> 2.53.0.473.g4a7958ca14-goog
>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-03-24 17:14   ` Ian Rogers
@ 2026-03-25  6:58     ` Namhyung Kim
  2026-03-25 15:58       ` Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Namhyung Kim @ 2026-03-25  6:58 UTC (permalink / raw)
  To: Ian Rogers
  Cc: tmricht, acme, agordeev, gor, hca, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Hi Ian,

Sorry for the delay.

On Tue, Mar 24, 2026 at 10:14:01AM -0700, Ian Rogers wrote:
> On Mon, Mar 2, 2026 at 3:43 PM Ian Rogers <irogers@google.com> wrote:
[SNIP]
> > -       if (idle_symbols_list)
> > -               return strlist__has_entry(idle_symbols_list, name);
> > +       /*
> > +        * ppc64 uses function descriptors and appends a '.' to the
> > +        * start of every instruction address. Remove it.
> > +        */
> > +       if (name[0] == '.')

Then e_machine == EM_PPC64 can be checked here.

> > +               name++;
> > +
> > +

Two blank lines.

> > +       if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> > +                   sizeof(idle_symbols[0]), sym_name_cmp)) {
> > +               sym->idle = SYMBOL_IDLE__IDLE;
> > +               return true;
> > +       }
> > +
> > +       if (e_machine == EM_386 || e_machine == EM_X86_64) {
> > +               if (strstarts(name, "mwait_idle") ||
> > +                   strstarts(name, "intel_idle")) {
> > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > +                       return true;
> > +               }
> > +       }
> > +
> > +       if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> > +               sym->idle = SYMBOL_IDLE__IDLE;
> > +               return true;
> > +       }
> >
> > -       idle_symbols_list = strlist__new(NULL, NULL);
> > +       if (e_machine == EM_S390) {
> > +               int major = 0, minor = 0;
> > +               const char *release = env && env->os_release
> > +                       ? env->os_release : perf_version_string;
> >
> > -       for (i = 0; idle_symbols[i]; i++)
> > -               strlist__add(idle_symbols_list, idle_symbols[i]);
> > +               sscanf(release, "%d.%d", &major, &minor);
> >
> > -       return strlist__has_entry(idle_symbols_list, name);
> > +               /* Before v6.10, s390 used psw_idle. */
> > +               if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > +                       return true;
> > +               }
> > +       }
> > +
> > +       sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +       return false;
> >  }
> >
> >  static int map__process_kallsym_symbol(void *arg, const char *name,
> > @@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
> >          * We will pass the symbols to the filter later, in
> >          * map__split_kallsyms, when we have split the maps per module
> >          */
> > -       __symbols__insert(root, sym, !strchr(name, '['));
> > +       __symbols__insert(root, sym);
> >
> >         return 0;
> >  }
> > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > index 3fb5d146d9b1..508dd9f336e9 100644
> > --- a/tools/perf/util/symbol.h
> > +++ b/tools/perf/util/symbol.h
> > @@ -24,6 +24,7 @@ struct dso;
> >  struct map;
> >  struct maps;
> >  struct option;
> > +struct perf_env;
> >  struct build_id;
> >
> >  /*
> > @@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
> >                              GElf_Shdr *shp, const char *name, size_t *idx);
> >  #endif
> >
> > +enum symbol_idle_kind {
> > +       SYMBOL_IDLE__UNKNOWN = 0,
> > +       SYMBOL_IDLE__NOT_IDLE = 1,
> > +       SYMBOL_IDLE__IDLE = 2,
> > +};
> > +
> >  /**
> >   * A symtab entry. When allocated this may be preceded by an annotation (see
> >   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> > @@ -56,8 +63,8 @@ struct symbol {
> >         u8              type:4;
> >         /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
> >         u8              binding:4;
> > -       /** Set true for kernel symbols of idle routines. */
> > -       u8              idle:1;
> > +       /** Cache for symbol__is_idle. */
> > +       enum symbol_idle_kind idle:2;

I'm curious if bitfields with different types (u8 and enum) can be
placed consecutively bitwise.  There can be a lot of symbols so it
could be a concern.

Thanks,
Namhyung


> >         /** Resolvable but tools ignore it (e.g. idle routines). */
> >         u8              ignore:1;
> >         /** Symbol for an inlined function. */
> > @@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
> >
> >  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> >
> > -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> > -                      bool kernel);
> > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
> >  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> > @@ -269,5 +275,6 @@ enum {
> >  };
> >
> >  int symbol__validate_sym_arguments(void);
> > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
> >
> >  #endif /* __PERF_SYMBOL */
> > --
> > 2.53.0.473.g4a7958ca14-goog
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v1] perf symbol: Lazily compute idle and use the perf_env
  2026-03-25  6:58     ` Namhyung Kim
@ 2026-03-25 15:58       ` Ian Rogers
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-03-25 15:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: tmricht, acme, agordeev, gor, hca, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

On Tue, Mar 24, 2026 at 11:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> Sorry for the delay.
>
> On Tue, Mar 24, 2026 at 10:14:01AM -0700, Ian Rogers wrote:
> > On Mon, Mar 2, 2026 at 3:43 PM Ian Rogers <irogers@google.com> wrote:
> [SNIP]
> > > -       if (idle_symbols_list)
> > > -               return strlist__has_entry(idle_symbols_list, name);
> > > +       /*
> > > +        * ppc64 uses function descriptors and appends a '.' to the
> > > +        * start of every instruction address. Remove it.
> > > +        */
> > > +       if (name[0] == '.')
>
> Then e_machine == EM_PPC64 can be checked here.

Agreed, but this is potentially load-bearing for more than just PPC, so
I'd rather leave it as it is.
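
For readers following along, the lookup under discussion can be sketched
roughly as below. This is only an illustration under assumptions: the
demo_* names, the shortened table and the single prefix check are
hypothetical and are not the code in the patch. It shows the leading '.'
being skipped for every architecture before a bsearch() over a sorted
table, with the result cached in a tri-state field.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Tri-state cache: zero means "not computed yet". */
enum demo_idle_kind { DEMO_UNKNOWN = 0, DEMO_NOT_IDLE = 1, DEMO_IDLE = 2 };

/* Hypothetical stand-in for perf's struct symbol. */
struct demo_sym {
	const char *name;
	enum demo_idle_kind idle;
};

/* Must stay sorted in strcmp() order, otherwise bsearch() misses entries. */
static const char *const demo_idle_names[] = {
	"arch_cpu_idle",
	"cpu_idle",
	"default_idle",
	"poll_idle",
};

/* bsearch() passes the key first and a pointer to an array element second. */
static int demo_name_cmp(const void *key, const void *elem)
{
	return strcmp(key, *(const char *const *)elem);
}

static bool demo_is_idle(struct demo_sym *sym)
{
	const char *name = sym->name;
	size_t n = sizeof(demo_idle_names) / sizeof(demo_idle_names[0]);

	/* Answer from the cache once the symbol has been classified. */
	if (sym->idle != DEMO_UNKNOWN)
		return sym->idle == DEMO_IDLE;

	/* Skip a leading '.' regardless of architecture. */
	if (name[0] == '.')
		name++;

	if (bsearch(name, demo_idle_names, n, sizeof(demo_idle_names[0]),
		    demo_name_cmp) ||
	    strncmp(name, "intel_idle", strlen("intel_idle")) == 0) {
		sym->idle = DEMO_IDLE;
		return true;
	}

	sym->idle = DEMO_NOT_IDLE;
	return false;
}
```

Calling demo_is_idle() on a symbol named ".cpu_idle" takes the table
path, one named "intel_idle_irq" takes the prefix path, and a second
call on either only reads the cached field.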

> > > +               name++;
> > > +
> > > +
>
> Two blank lines.

Will fix in v2.

> > > +       if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> > > +                   sizeof(idle_symbols[0]), sym_name_cmp)) {
> > > +               sym->idle = SYMBOL_IDLE__IDLE;
> > > +               return true;
> > > +       }
> > > +
> > > +       if (e_machine == EM_386 || e_machine == EM_X86_64) {
> > > +               if (strstarts(name, "mwait_idle") ||
> > > +                   strstarts(name, "intel_idle")) {
> > > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > > +                       return true;
> > > +               }
> > > +       }
> > > +
> > > +       if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> > > +               sym->idle = SYMBOL_IDLE__IDLE;
> > > +               return true;
> > > +       }
> > >
> > > -       idle_symbols_list = strlist__new(NULL, NULL);
> > > +       if (e_machine == EM_S390) {
> > > +               int major = 0, minor = 0;
> > > +               const char *release = env && env->os_release
> > > +                       ? env->os_release : perf_version_string;
> > >
> > > -       for (i = 0; idle_symbols[i]; i++)
> > > -               strlist__add(idle_symbols_list, idle_symbols[i]);
> > > +               sscanf(release, "%d.%d", &major, &minor);
> > >
> > > -       return strlist__has_entry(idle_symbols_list, name);
> > > +               /* Before v6.10, s390 used psw_idle. */
> > > +               if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> > > +                       sym->idle = SYMBOL_IDLE__IDLE;
> > > +                       return true;
> > > +               }
> > > +       }
> > > +
> > > +       sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > > +       return false;
> > >  }
> > >
> > >  static int map__process_kallsym_symbol(void *arg, const char *name,
> > > @@ -785,7 +815,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
> > >          * We will pass the symbols to the filter later, in
> > >          * map__split_kallsyms, when we have split the maps per module
> > >          */
> > > -       __symbols__insert(root, sym, !strchr(name, '['));
> > > +       __symbols__insert(root, sym);
> > >
> > >         return 0;
> > >  }
> > > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > > index 3fb5d146d9b1..508dd9f336e9 100644
> > > --- a/tools/perf/util/symbol.h
> > > +++ b/tools/perf/util/symbol.h
> > > @@ -24,6 +24,7 @@ struct dso;
> > >  struct map;
> > >  struct maps;
> > >  struct option;
> > > +struct perf_env;
> > >  struct build_id;
> > >
> > >  /*
> > > @@ -41,6 +42,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
> > >                              GElf_Shdr *shp, const char *name, size_t *idx);
> > >  #endif
> > >
> > > +enum symbol_idle_kind {
> > > +       SYMBOL_IDLE__UNKNOWN = 0,
> > > +       SYMBOL_IDLE__NOT_IDLE = 1,
> > > +       SYMBOL_IDLE__IDLE = 2,
> > > +};
> > > +
> > >  /**
> > >   * A symtab entry. When allocated this may be preceded by an annotation (see
> > >   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> > > @@ -56,8 +63,8 @@ struct symbol {
> > >         u8              type:4;
> > >         /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
> > >         u8              binding:4;
> > > -       /** Set true for kernel symbols of idle routines. */
> > > -       u8              idle:1;
> > > +       /** Cache for symbol__is_idle. */
> > > +       enum symbol_idle_kind idle:2;
>
> I'm curious if bitfields with different types (u8 and enum) can be
> placed consecutively bitwise.  There can be a lot of symbols so it
> could be a concern.

pahole says no size difference:

Before:
```
struct symbol {
       struct rb_node             rb_node __attribute__((__aligned__(8))); /*     0    24 */
       u64                        start;                /*    24     8 */
       u64                        end;                  /*    32     8 */
       u16                        namelen;              /*    40     2 */
       u8                         type:4;               /*    42: 0  1 */
       u8                         binding:4;            /*    42: 4  1 */
       u8                         idle:1;               /*    43: 0  1 */
       u8                         ignore:1;             /*    43: 1  1 */
       u8                         inlined:1;            /*    43: 2  1 */
       u8                         annotate2:1;          /*    43: 3  1 */
       u8                         ifunc_alias:1;        /*    43: 4  1 */

       /* XXX 3 bits hole, try to pack */

       u8                         arch_sym;             /*    44     1 */
       char                       name[];               /*    45     0 */

       /* size: 48, cachelines: 1, members: 13 */
       /* sum members: 43 */
       /* sum bitfield members: 13 bits, bit holes: 1, sum bit holes: 3 bits */
       /* padding: 3 */
       /* forced alignments: 1 */
       /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
```

After:
```
struct symbol {
       struct rb_node             rb_node __attribute__((__aligned__(8))); /*     0    24 */
       u64                        start;                /*    24     8 */
       u64                        end;                  /*    32     8 */
       u16                        namelen;              /*    40     2 */
       u8                         type:4;               /*    42: 0  1 */
       u8                         binding:4;            /*    42: 4  1 */

       /* Bitfield combined with previous fields */

       enum symbol_idle_kind      idle:2;               /*    40:24  4 */

       /* Bitfield combined with next fields */

       u8                         ignore:1;             /*    43: 2  1 */
       u8                         inlined:1;            /*    43: 3  1 */
       u8                         annotate2:1;          /*    43: 4  1 */
       u8                         ifunc_alias:1;        /*    43: 5  1 */

       /* XXX 2 bits hole, try to pack */

       u8                         arch_sym;             /*    44     1 */
       char                       name[];               /*    45     0 */

       /* size: 48, cachelines: 1, members: 13 */
       /* sum members: 43 */
       /* sum bitfield members: 14 bits, bit holes: 1, sum bit holes: 2 bits */
       /* padding: 3 */
       /* forced alignments: 1 */
       /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));
```
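
The same check can be reproduced without pahole using a reduced struct.
The sketch below is an assumption-laden illustration: the demo_* types
are hypothetical and only mirror the tail of struct symbol. Bitfield
layout and enum-typed bitfields are implementation-defined in C, so the
result holds for the compiler and ABI it is built with (GCC or Clang on
Linux is assumed here) and is not a portable guarantee.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_idle_kind { DEMO_UNKNOWN = 0, DEMO_NOT_IDLE = 1, DEMO_IDLE = 2 };

/* Layout with a one-bit u8 flag, as before the patch. */
struct demo_before {
	uint64_t start;
	uint64_t end;
	uint16_t namelen;
	uint8_t type:4;
	uint8_t binding:4;
	uint8_t idle:1;
	uint8_t ignore:1;
	uint8_t arch_sym;
};

/* Layout with a two-bit enum bitfield between u8 bitfields. */
struct demo_after {
	uint64_t start;
	uint64_t end;
	uint16_t namelen;
	uint8_t type:4;
	uint8_t binding:4;
	enum demo_idle_kind idle:2;
	uint8_t ignore:1;
	uint8_t arch_sym;
};

static size_t demo_size_before(void)
{
	return sizeof(struct demo_before);
}

static size_t demo_size_after(void)
{
	return sizeof(struct demo_after);
}

/* Confirms that the largest enumerator survives the two-bit field. */
static int demo_roundtrip(void)
{
	struct demo_after s = { 0 };

	s.idle = DEMO_IDLE;
	return s.idle == DEMO_IDLE;
}
```

Comparing demo_size_before() with demo_size_after() shows whether the
enum bitfield shares the storage unit of its u8 neighbours on a given
toolchain.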

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > >         /** Resolvable but tools ignore it (e.g. idle routines). */
> > >         u8              ignore:1;
> > >         /** Symbol for an inlined function. */
> > > @@ -184,8 +191,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
> > >
> > >  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> > >
> > > -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> > > -                      bool kernel);
> > > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> > >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> > >  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
> > >  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> > > @@ -269,5 +275,6 @@ enum {
> > >  };
> > >
> > >  int symbol__validate_sym_arguments(void);
> > > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
> > >
> > >  #endif /* __PERF_SYMBOL */
> > > --
> > > 2.53.0.473.g4a7958ca14-goog
> > >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2026-03-24 17:14   ` Ian Rogers
@ 2026-03-25 16:18   ` Ian Rogers
  2026-03-26  7:20     ` Honglei Wang
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-25 16:18 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, sumanthk, jameshongleiwang

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 84 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3cd4e5a03cc5..9fabf5146d89 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ce9195717f44..1a357af93a0a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = env ? env->e_machine : EM_HOST;
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
+
+		sscanf(release, "%d.%d", &major, &minor);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..f26f67bd7982 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle. */
+	enum symbol_idle_kind idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1018.g2bb0e51243-goog
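
As a reading aid for the s390 branch of the patch above, the release
comparison can be sketched in isolation. This is a minimal sketch under
assumptions: demo_release_before() is a hypothetical helper that does
not exist in perf, and it only mirrors the sscanf() parse and the
"major < 6 || (major == 6 && minor < 10)" test from the diff.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true when the "major.minor" prefix of release is older than
 * want_major.want_minor. Anything after the minor number is ignored.
 */
static bool demo_release_before(const char *release, int want_major,
				int want_minor)
{
	int major = 0, minor = 0;

	/* An unparseable string is treated like release 0.0. */
	if (sscanf(release, "%d.%d", &major, &minor) < 1)
		major = minor = 0;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}
```

With this helper, "6.9.0-rc1" and "5.15.0-generic" compare as older than
6.10 while "6.10.0" and "7.0" do not. A string that does not start with
a number also compares as older, which matches what the zero
initialisation of major and minor implies in the patch.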


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
@ 2026-03-26  7:20     ` Honglei Wang
  2026-03-26 15:11       ` Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Honglei Wang @ 2026-03-26  7:20 UTC (permalink / raw)
  To: Ian Rogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, sumanthk

Hi Ian,

On 3/26/26 12:18 AM, Ian Rogers wrote:
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
> 
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> 
> The change switches x86 matches to use strstarts which means
> intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/
> ---
>  tools/perf/builtin-top.c     |   6 +-
>  tools/perf/util/symbol-elf.c |   2 +-
>  tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
>  tools/perf/util/symbol.h     |  15 +++--
>  4 files changed, 84 insertions(+), 44 deletions(-)
> 
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 37950efb28ac..bdc1c761cd61 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  {
>  	struct perf_top *top = container_of(tool, struct perf_top, tool);
>  	struct addr_location al;
> +	struct dso *dso = NULL;
>  
>  	if (!machine && perf_guest) {
>  		static struct intlist *seen;
> @@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  		}
>  	}
>  
> -	if (al.sym == NULL || !al.sym->idle) {
> +	if (al.map)
> +		dso = map__dso(al.map);
> +
> +	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
>  		struct hists *hists = evsel__hists(evsel);
>  		struct hist_entry_iter iter = {
>  			.evsel		= evsel,
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 3cd4e5a03cc5..9fabf5146d89 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
>  
>  		arch__sym_update(f, &sym);
>  
> -		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> +		__symbols__insert(dso__symbols(curr_dso), f);
>  		nr++;
>  	}
>  	dso__put(curr_dso);
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index ce9195717f44..1a357af93a0a 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -25,6 +25,8 @@
>  #include "demangle-ocaml.h"
>  #include "demangle-rust-v0.h"
>  #include "dso.h"
> +#include "dwarf-regs.h"
> +#include "env.h"
>  #include "util.h" // lsdir()
>  #include "event.h"
>  #include "machine.h"
> @@ -50,7 +52,6 @@
>  
>  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> -static bool symbol__is_idle(const char *name);
>  
>  int vmlinux_path__nr_entries;
>  char **vmlinux_path;
> @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
>  	}
>  }
>  
> -void __symbols__insert(struct rb_root_cached *symbols,
> -		       struct symbol *sym, bool kernel)
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
>  	struct rb_node **p = &symbols->rb_root.rb_node;
>  	struct rb_node *parent = NULL;
> @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  	struct symbol *s;
>  	bool leftmost = true;
>  
> -	if (kernel) {
> -		const char *name = sym->name;
> -		/*
> -		 * ppc64 uses function descriptors and appends a '.' to the
> -		 * start of every instruction address. Remove it.
> -		 */
> -		if (name[0] == '.')
> -			name++;
> -		sym->idle = symbol__is_idle(name);
> -	}
> -
>  	while (*p != NULL) {
>  		parent = *p;
>  		s = rb_entry(parent, struct symbol, rb_node);
> @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
> -	__symbols__insert(symbols, sym, false);
> +	__symbols__insert(symbols, sym);
>  }
>  
>  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
>  
>  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
>  {
> -	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> +	__symbols__insert(dso__symbols(dso), sym);
>  
>  	/* update the symbol cache if necessary */
>  	if (dso__last_find_result_addr(dso) >= sym->start &&
> @@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
>  	return err;
>  }
>  
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +	const char *name = a;
> +	const char *const *sym = b;
> +
> +	return strcmp(name, *sym);
> +}
> +
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__is_idle(const char *name)
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
>  {
> -	const char * const idle_symbols[] = {
> +	static const char * const idle_symbols[] = {
>  		"acpi_idle_do_entry",
>  		"acpi_processor_ffh_cstate_enter",
>  		"arch_cpu_idle",
>  		"cpu_idle",
>  		"cpu_startup_entry",
> -		"idle_cpu",
> -		"intel_idle",
> -		"intel_idle_ibrs",
>  		"default_idle",
> -		"native_safe_halt",
>  		"enter_idle",
>  		"exit_idle",
> -		"mwait_idle",
> -		"mwait_idle_with_hints",
> -		"mwait_idle_with_hints.constprop.0",
> +		"idle_cpu",
> +		"native_safe_halt",
>  		"poll_idle",
> -		"ppc64_runlatch_off",
>  		"pseries_dedicated_idle_sleep",
> -		"psw_idle",
> -		"psw_idle_exit",
> -		NULL
>  	};
> -	int i;
> -	static struct strlist *idle_symbols_list;
> +	const char *name = sym->name;
> +	uint16_t e_machine = env ? env->e_machine : EM_HOST;
>  
> -	if (idle_symbols_list)
> -		return strlist__has_entry(idle_symbols_list, name);
> +	if (sym->idle)
> +		return sym->idle == SYMBOL_IDLE__IDLE;
>  
> -	idle_symbols_list = strlist__new(NULL, NULL);
> +	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +		sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +		return false;
> +	}
>  
> -	for (i = 0; idle_symbols[i]; i++)
> -		strlist__add(idle_symbols_list, idle_symbols[i]);
> +	/*
> +	 * ppc64 uses function descriptors and appends a '.' to the
> +	 * start of every instruction address. Remove it.
> +	 */
> +	if (name[0] == '.')
> +		name++;
>  
> -	return strlist__has_entry(idle_symbols_list, name);
> +	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +		    sizeof(idle_symbols[0]), sym_name_cmp)) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_386 || e_machine == EM_X86_64) {

As mentioned in another thread, intel_idle_irq still showed up on my
test machine. I did a bit of debugging and found that e_machine == 0,
so this branch is never taken. Digging further, it looks like
deliver_event()->perf_session__find_machine() returns a struct machine
whose env->e_machine is 0. I'm too busy today to dig further, but I
hope this clue helps.

Thanks,
Honglei

> +		if (strstarts(name, "mwait_idle") ||
> +		    strstarts(name, "intel_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_S390) {
> +		int major = 0, minor = 0;
> +		const char *release = env && env->os_release
> +			? env->os_release : perf_version_string;
> +
> +		sscanf(release, "%d.%d", &major, &minor);
> +
> +		/* Before v6.10, s390 used psw_idle. */
> +		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +	return false;
>  }
>  
>  static int map__process_kallsym_symbol(void *arg, const char *name,
> @@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
>  	 * We will pass the symbols to the filter later, in
>  	 * map__split_kallsyms, when we have split the maps per module
>  	 */
> -	__symbols__insert(root, sym, !strchr(name, '['));
> +	__symbols__insert(root, sym);
>  
>  	return 0;
>  }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index c67814d6d6d6..f26f67bd7982 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -25,6 +25,7 @@ struct dso;
>  struct map;
>  struct maps;
>  struct option;
> +struct perf_env;
>  struct build_id;
>  
>  /*
> @@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>  			     GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>  
> +enum symbol_idle_kind {
> +	SYMBOL_IDLE__UNKNOWN = 0,
> +	SYMBOL_IDLE__NOT_IDLE = 1,
> +	SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -57,8 +64,8 @@ struct symbol {
>  	u8		type:4;
>  	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>  	u8		binding:4;
> -	/** Set true for kernel symbols of idle routines. */
> -	u8		idle:1;
> +	/** Cache for symbol__is_idle. */
> +	enum symbol_idle_kind idle:2;
>  	/** Resolvable but tools ignore it (e.g. idle routines). */
>  	u8		ignore:1;
>  	/** Symbol for an inlined function. */
> @@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
>  
>  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>  
> -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> -		       bool kernel);
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
>  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> @@ -286,5 +292,6 @@ enum {
>  };
>  
>  int symbol__validate_sym_arguments(void);
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
>  
>  #endif /* __PERF_SYMBOL */



* Re: [PATCH v2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-26  7:20     ` Honglei Wang
@ 2026-03-26 15:11       ` Ian Rogers
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-26 15:11 UTC (permalink / raw)
  To: Honglei Wang
  Cc: acme, namhyung, tmricht, agordeev, gor, hca, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

On Thu, Mar 26, 2026 at 12:20 AM Honglei Wang <jameshongleiwang@126.com> wrote:
>
> Hi Ian,
>
> On 3/26/26 12:18 AM, Ian Rogers wrote:
> > Move the idle boolean to a helper symbol__is_idle function. In the
> > function lazily compute whether a symbol is an idle function taking
> > into consideration the kernel version and architecture of the
> > machine. As symbols__insert no longer needs to know if a symbol is for
> > the kernel, remove the argument.
> >
> > This change is inspired by mailing list discussion, particularly from
> > Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> > <hca@linux.ibm.com>:
> > https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> >
> > The change switches x86 matches to use strstarts which means
> > intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> > change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> > https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/
> > ---
> >  tools/perf/builtin-top.c     |   6 +-
> >  tools/perf/util/symbol-elf.c |   2 +-
> >  tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
> >  tools/perf/util/symbol.h     |  15 +++--
> >  4 files changed, 84 insertions(+), 44 deletions(-)
> >
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 37950efb28ac..bdc1c761cd61 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
> >  {
> >       struct perf_top *top = container_of(tool, struct perf_top, tool);
> >       struct addr_location al;
> > +     struct dso *dso = NULL;
> >
> >       if (!machine && perf_guest) {
> >               static struct intlist *seen;
> > @@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
> >               }
> >       }
> >
> > -     if (al.sym == NULL || !al.sym->idle) {
> > +     if (al.map)
> > +             dso = map__dso(al.map);
> > +
> > +     if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
> >               struct hists *hists = evsel__hists(evsel);
> >               struct hist_entry_iter iter = {
> >                       .evsel          = evsel,
> > diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> > index 3cd4e5a03cc5..9fabf5146d89 100644
> > --- a/tools/perf/util/symbol-elf.c
> > +++ b/tools/perf/util/symbol-elf.c
> > @@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
> >
> >               arch__sym_update(f, &sym);
> >
> > -             __symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> > +             __symbols__insert(dso__symbols(curr_dso), f);
> >               nr++;
> >       }
> >       dso__put(curr_dso);
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index ce9195717f44..1a357af93a0a 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -25,6 +25,8 @@
> >  #include "demangle-ocaml.h"
> >  #include "demangle-rust-v0.h"
> >  #include "dso.h"
> > +#include "dwarf-regs.h"
> > +#include "env.h"
> >  #include "util.h" // lsdir()
> >  #include "event.h"
> >  #include "machine.h"
> > @@ -50,7 +52,6 @@
> >
> >  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
> >  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> > -static bool symbol__is_idle(const char *name);
> >
> >  int vmlinux_path__nr_entries;
> >  char **vmlinux_path;
> > @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
> >       }
> >  }
> >
> > -void __symbols__insert(struct rb_root_cached *symbols,
> > -                    struct symbol *sym, bool kernel)
> > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
> >  {
> >       struct rb_node **p = &symbols->rb_root.rb_node;
> >       struct rb_node *parent = NULL;
> > @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
> >       struct symbol *s;
> >       bool leftmost = true;
> >
> > -     if (kernel) {
> > -             const char *name = sym->name;
> > -             /*
> > -              * ppc64 uses function descriptors and appends a '.' to the
> > -              * start of every instruction address. Remove it.
> > -              */
> > -             if (name[0] == '.')
> > -                     name++;
> > -             sym->idle = symbol__is_idle(name);
> > -     }
> > -
> >       while (*p != NULL) {
> >               parent = *p;
> >               s = rb_entry(parent, struct symbol, rb_node);
> > @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
> >
> >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
> >  {
> > -     __symbols__insert(symbols, sym, false);
> > +     __symbols__insert(symbols, sym);
> >  }
> >
> >  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> > @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
> >
> >  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
> >  {
> > -     __symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> > +     __symbols__insert(dso__symbols(dso), sym);
> >
> >       /* update the symbol cache if necessary */
> >       if (dso__last_find_result_addr(dso) >= sym->start &&
> > @@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
> >       return err;
> >  }
> >
> > +static int sym_name_cmp(const void *a, const void *b)
> > +{
> > +     const char *name = a;
> > +     const char *const *sym = b;
> > +
> > +     return strcmp(name, *sym);
> > +}
> > +
> >  /*
> >   * These are symbols in the kernel image, so make sure that
> >   * sym is from a kernel DSO.
> >   */
> > -static bool symbol__is_idle(const char *name)
> > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env)
> >  {
> > -     const char * const idle_symbols[] = {
> > +     static const char * const idle_symbols[] = {
> >               "acpi_idle_do_entry",
> >               "acpi_processor_ffh_cstate_enter",
> >               "arch_cpu_idle",
> >               "cpu_idle",
> >               "cpu_startup_entry",
> > -             "idle_cpu",
> > -             "intel_idle",
> > -             "intel_idle_ibrs",
> >               "default_idle",
> > -             "native_safe_halt",
> >               "enter_idle",
> >               "exit_idle",
> > -             "mwait_idle",
> > -             "mwait_idle_with_hints",
> > -             "mwait_idle_with_hints.constprop.0",
> > +             "idle_cpu",
> > +             "native_safe_halt",
> >               "poll_idle",
> > -             "ppc64_runlatch_off",
> >               "pseries_dedicated_idle_sleep",
> > -             "psw_idle",
> > -             "psw_idle_exit",
> > -             NULL
> >       };
> > -     int i;
> > -     static struct strlist *idle_symbols_list;
> > +     const char *name = sym->name;
> > +     uint16_t e_machine = env ? env->e_machine : EM_HOST;
> >
> > -     if (idle_symbols_list)
> > -             return strlist__has_entry(idle_symbols_list, name);
> > +     if (sym->idle)
> > +             return sym->idle == SYMBOL_IDLE__IDLE;
> >
> > -     idle_symbols_list = strlist__new(NULL, NULL);
> > +     if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> > +             sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +             return false;
> > +     }
> >
> > -     for (i = 0; idle_symbols[i]; i++)
> > -             strlist__add(idle_symbols_list, idle_symbols[i]);
> > +     /*
> > +      * ppc64 uses function descriptors and appends a '.' to the
> > +      * start of every instruction address. Remove it.
> > +      */
> > +     if (name[0] == '.')
> > +             name++;
> >
> > -     return strlist__has_entry(idle_symbols_list, name);
> > +     if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> > +                 sizeof(idle_symbols[0]), sym_name_cmp)) {
> > +             sym->idle = SYMBOL_IDLE__IDLE;
> > +             return true;
> > +     }
> > +
> > +     if (e_machine == EM_386 || e_machine == EM_X86_64) {
>
> As mentioned in another thread, intel_idle_irq still showed up on my
> test machine. I did a bit of debugging and found that e_machine == 0,
> so this branch is never taken. Digging further, it looks like
> deliver_event()->perf_session__find_machine() returns a struct machine
> whose env->e_machine is 0. I'm too busy today to dig further, but I
> hope this clue helps.

I can see this: the env's e_machine isn't being lazily initialized for
the host the way the arch is. I'll add a patch for this.

Thanks,
Ian

> Thanks,
> Honglei
>
> > +             if (strstarts(name, "mwait_idle") ||
> > +                 strstarts(name, "intel_idle")) {
> > +                     sym->idle = SYMBOL_IDLE__IDLE;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> > +             sym->idle = SYMBOL_IDLE__IDLE;
> > +             return true;
> > +     }
> > +
> > +     if (e_machine == EM_S390) {
> > +             int major = 0, minor = 0;
> > +             const char *release = env && env->os_release
> > +                     ? env->os_release : perf_version_string;
> > +
> > +             sscanf(release, "%d.%d", &major, &minor);
> > +
> > +             /* Before v6.10, s390 used psw_idle. */
> > +             if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> > +                     sym->idle = SYMBOL_IDLE__IDLE;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +     return false;
> >  }
> >
> >  static int map__process_kallsym_symbol(void *arg, const char *name,
> > @@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
> >        * We will pass the symbols to the filter later, in
> >        * map__split_kallsyms, when we have split the maps per module
> >        */
> > -     __symbols__insert(root, sym, !strchr(name, '['));
> > +     __symbols__insert(root, sym);
> >
> >       return 0;
> >  }
> > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > index c67814d6d6d6..f26f67bd7982 100644
> > --- a/tools/perf/util/symbol.h
> > +++ b/tools/perf/util/symbol.h
> > @@ -25,6 +25,7 @@ struct dso;
> >  struct map;
> >  struct maps;
> >  struct option;
> > +struct perf_env;
> >  struct build_id;
> >
> >  /*
> > @@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
> >                            GElf_Shdr *shp, const char *name, size_t *idx);
> >  #endif
> >
> > +enum symbol_idle_kind {
> > +     SYMBOL_IDLE__UNKNOWN = 0,
> > +     SYMBOL_IDLE__NOT_IDLE = 1,
> > +     SYMBOL_IDLE__IDLE = 2,
> > +};
> > +
> >  /**
> >   * A symtab entry. When allocated this may be preceded by an annotation (see
> >   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> > @@ -57,8 +64,8 @@ struct symbol {
> >       u8              type:4;
> >       /** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
> >       u8              binding:4;
> > -     /** Set true for kernel symbols of idle routines. */
> > -     u8              idle:1;
> > +     /** Cache for symbol__is_idle. */
> > +     enum symbol_idle_kind idle:2;
> >       /** Resolvable but tools ignore it (e.g. idle routines). */
> >       u8              ignore:1;
> >       /** Symbol for an inlined function. */
> > @@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
> >
> >  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> >
> > -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> > -                    bool kernel);
> > +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
> >  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
> >  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> > @@ -286,5 +292,6 @@ enum {
> >  };
> >
> >  int symbol__validate_sym_arguments(void);
> > +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, const struct perf_env *env);
> >
> >  #endif /* __PERF_SYMBOL */
>


* [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-03-26 15:11       ` Ian Rogers
@ 2026-03-26 17:45         ` Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                             ` (3 more replies)
  0 siblings, 4 replies; 106+ messages in thread
From: Ian Rogers @ 2026-03-26 17:45 UTC (permalink / raw)
  To: irogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, namhyung, sumanthk, tmricht

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
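
As a rough illustration of the lazy, cached computation described above,
here is a minimal standalone sketch. It is not the perf code: the
toy_symbol struct, the toy_* function names and the lookups counter are
hypothetical and exist only for this example. The tri-state value mirrors
the idea of an "unknown" state that a zero-initialized symbol starts in.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Tri-state cache: zero-initialized symbols start out as "unknown". */
enum idle_kind { IDLE_UNKNOWN = 0, IDLE_NOT_IDLE = 1, IDLE_IDLE = 2 };

/* Hypothetical, trimmed-down symbol; the real struct symbol has more fields. */
struct toy_symbol {
	const char *name;
	unsigned int idle : 2;	/* holds an enum idle_kind value */
	unsigned int lookups;	/* counts slow-path evaluations, illustration only */
};

/* Stand-in for the real name check; only one name is recognized here. */
static bool toy_name_is_idle(const char *name)
{
	return strcmp(name, "default_idle") == 0;
}

/* Compute the answer on first use and serve later calls from the cache. */
static bool toy_symbol_is_idle(struct toy_symbol *sym)
{
	assert(sym != NULL);

	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_IDLE;

	sym->lookups++;
	sym->idle = toy_name_is_idle(sym->name) ? IDLE_IDLE : IDLE_NOT_IDLE;
	return sym->idle == IDLE_IDLE;
}
```

Calling toy_symbol_is_idle() twice on the same symbol runs the name check
only once; symbols that are never sampled are never checked at all, which
is the point of moving the computation to the point of use.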

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (2):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/env.c        | 179 +++++++++++++++++++++++++++--------
 tools/perf/util/env.h        |   1 +
 tools/perf/util/session.c    |  14 +--
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 105 ++++++++++++--------
 tools/perf/util/symbol.h     |  15 ++-
 7 files changed, 235 insertions(+), 87 deletions(-)

-- 
2.53.0.1018.g2bb0e51243-goog



* [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-03-26 17:45           ` Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-03-26 17:45 UTC (permalink / raw)
  To: irogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, namhyung, sumanthk, tmricht

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for efficiency, with extra handling
for arch prefixes shared by more than one ELF machine. Switch
perf_env__arch to be derived from the e_machine for consistency. This
switches arch from being uname derived to matching that of the perf
binary (via EM_HOST). Update session to use the helper, which may mean
using EM_HOST when no threads are available. This also updates the
perf data file header that gets the e_machine/e_flags from the
session.
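
The prefix lookup described above can be sketched in isolation as
follows. This is only an illustration under stated assumptions: the
toy_machine enum, the shortened table and the toy_* names are
hypothetical stand-ins for the ELF EM_* constants and the full table in
the patch. The one hard requirement it demonstrates is that bsearch()
needs the table sorted in strcmp() order, so "s390" has to come before
"sa110" because '3' sorts before 'a'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the ELF EM_* machine constants. */
enum toy_machine {
	TOY_NONE = 0, TOY_ARM, TOY_AARCH64, TOY_S390, TOY_386, TOY_X86_64
};

struct prefix_map {
	const char *prefix;
	enum toy_machine machine;
};

/* Must stay sorted in strcmp() order, otherwise bsearch() can miss entries. */
static const struct prefix_map toy_prefixes[] = {
	{ "aarch64", TOY_AARCH64 },
	{ "arm",     TOY_ARM },		/* also the prefix of "arm64" */
	{ "i386",    TOY_386 },
	{ "i686",    TOY_386 },
	{ "s390",    TOY_S390 },
	{ "sa110",   TOY_ARM },
	{ "x86",     TOY_X86_64 },	/* also covers 32-bit "x86" */
};

/* Compare only the first strlen(prefix) characters of the key. */
static int toy_compare_prefix(const void *key, const void *element)
{
	const struct prefix_map *map = element;

	return strncmp(key, map->prefix, strlen(map->prefix));
}

static enum toy_machine toy_arch_to_machine(const char *arch, bool is_64_bit)
{
	const struct prefix_map *hit;

	assert(arch != NULL);
	hit = bsearch(arch, toy_prefixes,
		      sizeof(toy_prefixes) / sizeof(toy_prefixes[0]),
		      sizeof(toy_prefixes[0]), toy_compare_prefix);
	if (!hit)
		return TOY_NONE;

	/* Resolve prefixes that are shared by more than one machine. */
	switch (hit->machine) {
	case TOY_ARM:
		return !strcmp(arch, "arm64") ? TOY_AARCH64 : TOY_ARM;
	case TOY_X86_64:
		return is_64_bit || !strcmp(arch, "x86_64") ? TOY_X86_64 : TOY_386;
	default:
		return hit->machine;
	}
}
```

With this table, uname-style strings such as "s390x" or "armv7l" and
perf-style strings such as "arm64" or "x86" all resolve through the same
lookup; strings with no matching prefix yield TOY_NONE.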

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 179 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/session.c |  14 +--
 3 files changed, 151 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 93d475a80f14..304bd8245485 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,154 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"sa110", EM_ARM},
+	{"s390", EM_S390},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for when either the perf arch string differs from uname or more
+	 * than one ELF machine shares the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch)
+		env->arch = strdup(e_machine_to_perf_arch(perf_env__e_machine(env, /*e_flags=*/NULL)));
 
-	return normalize_arch(arch_name);
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a4501cbca375..91ff252712f4 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b465abfa36c..dcc9bef303aa 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1018.g2bb0e51243-goog



* [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-03-26 17:45           ` Ian Rogers
  2026-03-27  6:56             ` Honglei Wang
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
  3 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-26 17:45 UTC (permalink / raw)
  To: irogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, namhyung, sumanthk, tmricht

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
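
To illustrate the prefix matching described above, here is a minimal
standalone sketch. The names starts_with() and x86_name_is_idle() are
hypothetical stand-ins for this note (the patch itself uses the existing
strstarts() helper inside symbol__is_idle), so treat it as an
illustration of the idea rather than the code being merged:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for strstarts(): true if str begins with prefix. */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: classify x86 idle routines by prefix so that
 * variants such as intel_idle_irq or compiler-generated suffixes like
 * .constprop.0 do not each need their own table entry.
 */
static bool x86_name_is_idle(const char *name)
{
	return starts_with(name, "mwait_idle") ||
	       starts_with(name, "intel_idle");
}
```

With this, "intel_idle_irq" and "mwait_idle_with_hints.constprop.0" both
match, while an unrelated name such as "idle_cpu" does not and would
still need an exact-match entry.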

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 84 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3cd4e5a03cc5..9fabf5146d89 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ce9195717f44..92bc28934f36 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
+
+		sscanf(release, "%d.%d", &major, &minor);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..65422c1c8fdb 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle. */
+	enum symbol_idle_kind idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-03-27  4:50           ` Ian Rogers
  2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
  3 siblings, 2 replies; 106+ messages in thread
From: Ian Rogers @ 2026-03-27  4:50 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
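
As a rough sketch of the lazy scheme, a 2-bit field can hold
"unknown", "not idle" and "idle", with the classification done on first
query and cached afterwards. The struct, enum and function names below
are simplified and hypothetical; they are not perf's struct symbol or
symbol__is_idle, and the classifier is a placeholder:

```c
#include <stdbool.h>
#include <string.h>

/* Zero means "not computed yet", so zero-initialised symbols start unknown. */
enum idle_kind { IDLE_UNKNOWN = 0, IDLE_NO = 1, IDLE_YES = 2 };

/* Hypothetical, cut-down symbol: just a name and the 2-bit cache. */
struct sym {
	const char *name;
	unsigned char idle : 2;
};

/* Counts how often the (potentially expensive) classification runs. */
static int classify_calls;

/* Placeholder classifier; the real check depends on arch and kernel version. */
static bool classify(const char *name)
{
	classify_calls++;
	return strcmp(name, "default_idle") == 0;
}

/* Compute on first use, then answer from the cached value. */
static bool sym_is_idle(struct sym *s)
{
	if (s->idle == IDLE_UNKNOWN)
		s->idle = classify(s->name) ? IDLE_YES : IDLE_NO;

	return s->idle == IDLE_YES;
}
```

Symbols that are never queried never pay for the classification, which
is the point of moving the computation away from symbol insertion.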

v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
    https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (2):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/env.c        | 185 ++++++++++++++++++++++++++++-------
 tools/perf/util/env.h        |   1 +
 tools/perf/util/session.c    |  14 +--
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 104 +++++++++++++-------
 tools/perf/util/symbol.h     |  15 ++-
 7 files changed, 240 insertions(+), 87 deletions(-)

-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-03-27  4:50             ` Ian Rogers
  2026-04-06  5:05               ` Namhyung Kim
  2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-27  4:50 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back on
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being
uname derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
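
The prefix-based binary search can be sketched in isolation as below.
The table, the ARCH_* ids and arch_lookup() are hypothetical and much
smaller than the table in the patch; the sketch also assumes no prefix
in the table is itself a prefix of another entry, which is why the
patch needs the extra disambiguation step for cases like arm/arm64:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ids standing in for ELF EM_* values. */
enum { ARCH_NONE, ARCH_ARM, ARCH_MIPS, ARCH_S390, ARCH_X86 };

struct arch_entry {
	const char *prefix;
	int id;
};

/* Must stay sorted by prefix, otherwise bsearch() may miss entries. */
static const struct arch_entry table[] = {
	{"arm", ARCH_ARM},
	{"mips", ARCH_MIPS},
	{"s390", ARCH_S390},
	{"x86", ARCH_X86},
};

/* Compare only as many characters as the table prefix has. */
static int cmp_prefix(const void *key, const void *elem)
{
	const struct arch_entry *e = elem;

	return strncmp(key, e->prefix, strlen(e->prefix));
}

/* Map a uname-style or perf-style arch string to an id, or ARCH_NONE. */
static int arch_lookup(const char *arch)
{
	const struct arch_entry *e;

	e = bsearch(arch, table, sizeof(table) / sizeof(table[0]),
		    sizeof(table[0]), cmp_prefix);
	return e ? e->id : ARCH_NONE;
}
```

Here "s390x", "x86_64" and "armv7l" resolve through their prefixes,
while a string with no matching prefix yields ARCH_NONE.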

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/session.c |  14 +--
 3 files changed, 157 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 93d475a80f14..ae08178870d7 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch) {
+		/*
+		 * Lazily compute/allocate arch. The e_machine may have been
+		 * read from a data file and so may not be EM_HOST.
+		 */
+		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	return normalize_arch(arch_name);
+		env->arch = strdup(e_machine_to_perf_arch(e_machine));
+	}
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a4501cbca375..91ff252712f4 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b465abfa36c..dcc9bef303aa 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-03-27  4:50             ` Ian Rogers
  2026-04-06  5:10               ` Namhyung Kim
  1 sibling, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-27  4:50 UTC (permalink / raw)
  To: acme, namhyung, tmricht
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
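
The kernel version dependence boils down to parsing "major.minor" from
the release string. A minimal sketch follows; release_before() is a
hypothetical helper written for this note, not a function in perf. It
mirrors the v4 behaviour of treating an unparsable release as "old":

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: true if release (e.g. "6.9.12-foo") is older than
 * want_major.want_minor. An unparsable string is treated as older, so
 * callers keep the legacy behaviour when the version is unknown.
 */
static bool release_before(const char *release, int want_major, int want_minor)
{
	int major = 0, minor = 0;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return true;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}
```

For the s390 case this would be called as release_before(release, 6, 10)
and only consulted once the symbol name is already known to start with
psw_idle.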

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 104 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 83 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3cd4e5a03cc5..9fabf5146d89 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ce9195717f44..9ff709edeb88 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,86 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = env && env->os_release
+			? env->os_release : perf_version_string;
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
+		    major < 6 || (major == 6 && minor < 10)) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +813,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..2f5f90f547aa 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1018.g2bb0e51243-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                             ` (2 preceding siblings ...)
  2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-03-27  6:00           ` Ian Rogers
  2026-03-31  7:22             ` Namhyung Kim
  3 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-27  6:00 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Writing the test output files to the current working directory can
fail in various contexts, such as continuous testing. Other tests write
to a mktemp-ed file, so make the "perf script task-analyzer tests"
follow this convention too. Currently this isn't possible for the
perf.data file due to a lack of perf script support, so add a variable
for when this support is available.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/shell/test_task_analyzer.sh | 38 +++++++++++---------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/tools/perf/tests/shell/test_task_analyzer.sh b/tools/perf/tests/shell/test_task_analyzer.sh
index e194fcf61df3..b1a6a7e017e4 100755
--- a/tools/perf/tests/shell/test_task_analyzer.sh
+++ b/tools/perf/tests/shell/test_task_analyzer.sh
@@ -3,6 +3,11 @@
 # SPDX-License-Identifier: GPL-2.0
 
 tmpdir=$(mktemp -d /tmp/perf-script-task-analyzer-XXXXX)
+# TODO: perf script report only supports input from the CWD perf.data file, make
+# it support input from any file.
+perfdata="perf.data"
+csv="$tmpdir/csv"
+csvsummary="$tmpdir/csvsummary"
 err=0
 
 # set PERF_EXEC_PATH to find scripts in the source directory
@@ -15,11 +20,10 @@ fi
 export ASAN_OPTIONS=detect_leaks=0
 
 cleanup() {
-  rm -f perf.data
-  rm -f perf.data.old
-  rm -f csv
-  rm -f csvsummary
+  rm -f "${perfdata}"
+  rm -f "${perfdata}".old
   rm -rf "$tmpdir"
+
   trap - exit term int
 }
 
@@ -61,7 +65,7 @@ skip_no_probe_record_support() {
 
 prepare_perf_data() {
 	# 1s should be sufficient to catch at least some switches
-	perf record -e sched:sched_switch -a -- sleep 1 > /dev/null 2>&1
+	perf record -e sched:sched_switch -a -o "${perfdata}" -- sleep 1 > /dev/null 2>&1
 	# check if perf data file got created in above step.
 	if [ ! -e "perf.data" ]; then
 		printf "FAIL: perf record failed to create \"perf.data\" \n"
@@ -130,28 +134,28 @@ test_extended_times_summary_ns() {
 }
 
 test_csv() {
-	perf script report task-analyzer --csv csv > /dev/null
-	check_exec_0 "perf script report task-analyzer --csv csv"
-	find_str_or_fail "Comm;" csv "${FUNCNAME[0]}"
+	perf script report task-analyzer --csv "${csv}" > /dev/null
+	check_exec_0 "perf script report task-analyzer --csv ${csv}"
+	find_str_or_fail "Comm;" "${csv}" "${FUNCNAME[0]}"
 }
 
 test_csv_extended_times() {
-	perf script report task-analyzer --csv csv --extended-times > /dev/null
-	check_exec_0 "perf script report task-analyzer --csv csv --extended-times"
-	find_str_or_fail "Out-Out;" csv "${FUNCNAME[0]}"
+	perf script report task-analyzer --csv "${csv}" --extended-times > /dev/null
+	check_exec_0 "perf script report task-analyzer --csv ${csv} --extended-times"
+	find_str_or_fail "Out-Out;" "${csv}" "${FUNCNAME[0]}"
 }
 
 test_csvsummary() {
-	perf script report task-analyzer --csv-summary csvsummary > /dev/null
-	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary"
-	find_str_or_fail "Comm;" csvsummary "${FUNCNAME[0]}"
+	perf script report task-analyzer --csv-summary "${csvsummary}" > /dev/null
+	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary}"
+	find_str_or_fail "Comm;" "${csvsummary}" "${FUNCNAME[0]}"
 }
 
 test_csvsummary_extended() {
-	perf script report task-analyzer --csv-summary csvsummary --summary-extended \
+	perf script report task-analyzer --csv-summary "${csvsummary}" --summary-extended \
 	>/dev/null
-	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary --summary-extended"
-	find_str_or_fail "Out-Out;" csvsummary "${FUNCNAME[0]}"
+	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary} --summary-extended"
+	find_str_or_fail "Out-Out;" "${csvsummary}" "${FUNCNAME[0]}"
 }
 
 skip_no_probe_record_support
-- 
2.53.0.1018.g2bb0e51243-goog



* Re: [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-03-27  6:56             ` Honglei Wang
  0 siblings, 0 replies; 106+ messages in thread
From: Honglei Wang @ 2026-03-27  6:56 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, agordeev, gor, hca, japo, linux-kernel, linux-perf-users,
	linux-s390, namhyung, sumanthk, tmricht

Hi Ian,

FYI. It works on my icx machine with 'perf top'.

Thanks,
Honglei

On 3/27/26 1:45 AM, Ian Rogers wrote:
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
> 
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> 
> The change switches x86 matches to use strstarts which means
> intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
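
The lazy computation described above can be sketched in isolation as a
tri-state cache: zero means "not yet classified", so a zero-initialized
symbol needs no explicit setup. This is a minimal sketch under stated
assumptions, not the patch's code: the demo_* names are hypothetical,
strncmp() stands in for perf's strstarts(), and only the "intel_idle"
prefix is checked.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not perf's real types. */
enum demo_idle_kind {
	DEMO_IDLE_UNKNOWN = 0,	/* zero-initialized symbols start here */
	DEMO_IDLE_NOT_IDLE = 1,
	DEMO_IDLE_IDLE = 2,
};

struct demo_symbol {
	unsigned int idle:2;	/* caches a demo_idle_kind value */
	const char *name;
};

/* Counts how often the name matching actually runs. */
static int demo_nr_computed;

static bool demo_symbol_is_idle(struct demo_symbol *sym)
{
	static const char prefix[] = "intel_idle";

	/* Fast path: a previous call already classified this symbol. */
	if (sym->idle != DEMO_IDLE_UNKNOWN)
		return sym->idle == DEMO_IDLE_IDLE;

	demo_nr_computed++;
	/* Prefix match, so "intel_idle_irq" is covered as well. */
	if (strncmp(sym->name, prefix, sizeof(prefix) - 1) == 0)
		sym->idle = DEMO_IDLE_IDLE;
	else
		sym->idle = DEMO_IDLE_NOT_IDLE;

	return sym->idle == DEMO_IDLE_IDLE;
}
```

Calling demo_symbol_is_idle() twice on the same symbol runs the string
comparison only once; the second call is answered from the 2-bit field.
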
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-top.c     |   6 +-
>  tools/perf/util/symbol-elf.c |   2 +-
>  tools/perf/util/symbol.c     | 105 ++++++++++++++++++++++-------------
>  tools/perf/util/symbol.h     |  15 +++--
>  4 files changed, 84 insertions(+), 44 deletions(-)
> 
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 37950efb28ac..bdc1c761cd61 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  {
>  	struct perf_top *top = container_of(tool, struct perf_top, tool);
>  	struct addr_location al;
> +	struct dso *dso = NULL;
>  
>  	if (!machine && perf_guest) {
>  		static struct intlist *seen;
> @@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
>  		}
>  	}
>  
> -	if (al.sym == NULL || !al.sym->idle) {
> +	if (al.map)
> +		dso = map__dso(al.map);
> +
> +	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
>  		struct hists *hists = evsel__hists(evsel);
>  		struct hist_entry_iter iter = {
>  			.evsel		= evsel,
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 3cd4e5a03cc5..9fabf5146d89 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1723,7 +1723,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
>  
>  		arch__sym_update(f, &sym);
>  
> -		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
> +		__symbols__insert(dso__symbols(curr_dso), f);
>  		nr++;
>  	}
>  	dso__put(curr_dso);
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index ce9195717f44..92bc28934f36 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -25,6 +25,8 @@
>  #include "demangle-ocaml.h"
>  #include "demangle-rust-v0.h"
>  #include "dso.h"
> +#include "dwarf-regs.h"
> +#include "env.h"
>  #include "util.h" // lsdir()
>  #include "event.h"
>  #include "machine.h"
> @@ -50,7 +52,6 @@
>  
>  static int dso__load_kernel_sym(struct dso *dso, struct map *map);
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
> -static bool symbol__is_idle(const char *name);
>  
>  int vmlinux_path__nr_entries;
>  char **vmlinux_path;
> @@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
>  	}
>  }
>  
> -void __symbols__insert(struct rb_root_cached *symbols,
> -		       struct symbol *sym, bool kernel)
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
>  	struct rb_node **p = &symbols->rb_root.rb_node;
>  	struct rb_node *parent = NULL;
> @@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  	struct symbol *s;
>  	bool leftmost = true;
>  
> -	if (kernel) {
> -		const char *name = sym->name;
> -		/*
> -		 * ppc64 uses function descriptors and appends a '.' to the
> -		 * start of every instruction address. Remove it.
> -		 */
> -		if (name[0] == '.')
> -			name++;
> -		sym->idle = symbol__is_idle(name);
> -	}
> -
>  	while (*p != NULL) {
>  		parent = *p;
>  		s = rb_entry(parent, struct symbol, rb_node);
> @@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
>  {
> -	__symbols__insert(symbols, sym, false);
> +	__symbols__insert(symbols, sym);
>  }
>  
>  static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> @@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
>  
>  void dso__insert_symbol(struct dso *dso, struct symbol *sym)
>  {
> -	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
> +	__symbols__insert(dso__symbols(dso), sym);
>  
>  	/* update the symbol cache if necessary */
>  	if (dso__last_find_result_addr(dso) >= sym->start &&
> @@ -716,47 +705,87 @@ int modules__parse(const char *filename, void *arg,
>  	return err;
>  }
>  
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +	const char *name = a;
> +	const char *const *sym = b;
> +
> +	return strcmp(name, *sym);
> +}
> +
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__is_idle(const char *name)
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
>  {
> -	const char * const idle_symbols[] = {
> +	static const char * const idle_symbols[] = {
>  		"acpi_idle_do_entry",
>  		"acpi_processor_ffh_cstate_enter",
>  		"arch_cpu_idle",
>  		"cpu_idle",
>  		"cpu_startup_entry",
> -		"idle_cpu",
> -		"intel_idle",
> -		"intel_idle_ibrs",
>  		"default_idle",
> -		"native_safe_halt",
>  		"enter_idle",
>  		"exit_idle",
> -		"mwait_idle",
> -		"mwait_idle_with_hints",
> -		"mwait_idle_with_hints.constprop.0",
> +		"idle_cpu",
> +		"native_safe_halt",
>  		"poll_idle",
> -		"ppc64_runlatch_off",
>  		"pseries_dedicated_idle_sleep",
> -		"psw_idle",
> -		"psw_idle_exit",
> -		NULL
>  	};
> -	int i;
> -	static struct strlist *idle_symbols_list;
> +	const char *name = sym->name;
> +	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
>  
> -	if (idle_symbols_list)
> -		return strlist__has_entry(idle_symbols_list, name);
> +	if (sym->idle)
> +		return sym->idle == SYMBOL_IDLE__IDLE;
>  
> -	idle_symbols_list = strlist__new(NULL, NULL);
> +	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +		sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +		return false;
> +	}
>  
> -	for (i = 0; idle_symbols[i]; i++)
> -		strlist__add(idle_symbols_list, idle_symbols[i]);
> +	/*
> +	 * ppc64 uses function descriptors and appends a '.' to the
> +	 * start of every instruction address. Remove it.
> +	 */
> +	if (name[0] == '.')
> +		name++;
>  
> -	return strlist__has_entry(idle_symbols_list, name);
> +	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +		    sizeof(idle_symbols[0]), sym_name_cmp)) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_386 || e_machine == EM_X86_64) {
> +		if (strstarts(name, "mwait_idle") ||
> +		    strstarts(name, "intel_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}
> +
> +	if (e_machine == EM_S390) {
> +		int major = 0, minor = 0;
> +		const char *release = env && env->os_release
> +			? env->os_release : perf_version_string;
> +
> +		sscanf(release, "%d.%d", &major, &minor);
> +
> +		/* Before v6.10, s390 used psw_idle. */
> +		if ((major < 6 || (major == 6 && minor < 10)) && strstarts(name, "psw_idle")) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +	return false;
>  }
>  
>  static int map__process_kallsym_symbol(void *arg, const char *name,
> @@ -785,7 +814,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
>  	 * We will pass the symbols to the filter later, in
>  	 * map__split_kallsyms, when we have split the maps per module
>  	 */
> -	__symbols__insert(root, sym, !strchr(name, '['));
> +	__symbols__insert(root, sym);
>  
>  	return 0;
>  }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index c67814d6d6d6..65422c1c8fdb 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -25,6 +25,7 @@ struct dso;
>  struct map;
>  struct maps;
>  struct option;
> +struct perf_env;
>  struct build_id;
>  
>  /*
> @@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>  			     GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>  
> +enum symbol_idle_kind {
> +	SYMBOL_IDLE__UNKNOWN = 0,
> +	SYMBOL_IDLE__NOT_IDLE = 1,
> +	SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -57,8 +64,8 @@ struct symbol {
>  	u8		type:4;
>  	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>  	u8		binding:4;
> -	/** Set true for kernel symbols of idle routines. */
> -	u8		idle:1;
> +	/** Cache for symbol__is_idle. */
> +	enum symbol_idle_kind idle:2;
>  	/** Resolvable but tools ignore it (e.g. idle routines). */
>  	u8		ignore:1;
>  	/** Symbol for an inlined function. */
> @@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
>  
>  char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>  
> -void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
> -		       bool kernel);
> +void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
>  void symbols__fixup_duplicate(struct rb_root_cached *symbols);
>  void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
> @@ -286,5 +292,6 @@ enum {
>  };
>  
>  int symbol__validate_sym_arguments(void);
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
>  
>  #endif /* __PERF_SYMBOL */
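
The s390 branch of symbol__is_idle() quoted above gates the psw_idle
match on the kernel release being older than v6.10. A minimal sketch of
that comparison follows, assuming a hypothetical helper name
(release_is_before) that does not exist in perf. Unlike the patch, this
sketch checks the sscanf() return value and treats an unparsable
release as "not before"; that is a choice made here, not the patch's
behavior.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: true when a release string such as
 * "6.9.3-arch1" is older than want_major.want_minor.
 */
static bool release_is_before(const char *release, int want_major,
			      int want_minor)
{
	int major = 0, minor = 0;

	/* Both numbers must parse, otherwise make no claim. */
	if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}
```

Comparing major and minor as integers matters here: a plain string
comparison would order "6.9" after "6.10".
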



* Re: [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
@ 2026-03-31  7:22             ` Namhyung Kim
  2026-03-31 17:58               ` Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Namhyung Kim @ 2026-03-31  7:22 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

I'm curious why this patch is in the idle symbol thread.


On Thu, Mar 26, 2026 at 11:00:33PM -0700, Ian Rogers wrote:
> Writing the test output files to the current working directory can
> fail in various contexts, such as continual testing. Other tests write
> to a mktemp-ed file; make the "perf script task-analyzer tests" follow
> this convention too. Currently this isn't possible for the perf.data
> file due to a lack of perf script support, so add a variable for when
> this support is available.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/tests/shell/test_task_analyzer.sh | 38 +++++++++++---------
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/test_task_analyzer.sh b/tools/perf/tests/shell/test_task_analyzer.sh
> index e194fcf61df3..b1a6a7e017e4 100755
> --- a/tools/perf/tests/shell/test_task_analyzer.sh
> +++ b/tools/perf/tests/shell/test_task_analyzer.sh
> @@ -3,6 +3,11 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
>  tmpdir=$(mktemp -d /tmp/perf-script-task-analyzer-XXXXX)
> +# TODO: perf script report only supports input from the CWD perf.data file, make
> +# it support input from any file.
> +perfdata="perf.data"
> +csv="$tmpdir/csv"
> +csvsummary="$tmpdir/csvsummary"
>  err=0
>  
>  # set PERF_EXEC_PATH to find scripts in the source directory
> @@ -15,11 +20,10 @@ fi
>  export ASAN_OPTIONS=detect_leaks=0
>  
>  cleanup() {
> -  rm -f perf.data
> -  rm -f perf.data.old
> -  rm -f csv
> -  rm -f csvsummary
> +  rm -f "${perfdata}"
> +  rm -f "${perfdata}".old
>    rm -rf "$tmpdir"
> +
>    trap - exit term int
>  }
>  
> @@ -61,7 +65,7 @@ skip_no_probe_record_support() {
>  
>  prepare_perf_data() {
>  	# 1s should be sufficient to catch at least some switches
> -	perf record -e sched:sched_switch -a -- sleep 1 > /dev/null 2>&1
> +	perf record -e sched:sched_switch -a -o "${perfdata}" -- sleep 1 > /dev/null 2>&1
>  	# check if perf data file got created in above step.
>  	if [ ! -e "perf.data" ]; then
>  		printf "FAIL: perf record failed to create \"perf.data\" \n"

Please update this part too.

Thanks,
Namhyung


> @@ -130,28 +134,28 @@ test_extended_times_summary_ns() {
>  }
>  
>  test_csv() {
> -	perf script report task-analyzer --csv csv > /dev/null
> -	check_exec_0 "perf script report task-analyzer --csv csv"
> -	find_str_or_fail "Comm;" csv "${FUNCNAME[0]}"
> +	perf script report task-analyzer --csv "${csv}" > /dev/null
> +	check_exec_0 "perf script report task-analyzer --csv ${csv}"
> +	find_str_or_fail "Comm;" "${csv}" "${FUNCNAME[0]}"
>  }
>  
>  test_csv_extended_times() {
> -	perf script report task-analyzer --csv csv --extended-times > /dev/null
> -	check_exec_0 "perf script report task-analyzer --csv csv --extended-times"
> -	find_str_or_fail "Out-Out;" csv "${FUNCNAME[0]}"
> +	perf script report task-analyzer --csv "${csv}" --extended-times > /dev/null
> +	check_exec_0 "perf script report task-analyzer --csv ${csv} --extended-times"
> +	find_str_or_fail "Out-Out;" "${csv}" "${FUNCNAME[0]}"
>  }
>  
>  test_csvsummary() {
> -	perf script report task-analyzer --csv-summary csvsummary > /dev/null
> -	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary"
> -	find_str_or_fail "Comm;" csvsummary "${FUNCNAME[0]}"
> +	perf script report task-analyzer --csv-summary "${csvsummary}" > /dev/null
> +	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary}"
> +	find_str_or_fail "Comm;" "${csvsummary}" "${FUNCNAME[0]}"
>  }
>  
>  test_csvsummary_extended() {
> -	perf script report task-analyzer --csv-summary csvsummary --summary-extended \
> +	perf script report task-analyzer --csv-summary "${csvsummary}" --summary-extended \
>  	>/dev/null
> -	check_exec_0 "perf script report task-analyzer --csv-summary csvsummary --summary-extended"
> -	find_str_or_fail "Out-Out;" csvsummary "${FUNCNAME[0]}"
> +	check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary} --summary-extended"
> +	find_str_or_fail "Out-Out;" "${csvsummary}" "${FUNCNAME[0]}"
>  }
>  
>  skip_no_probe_record_support
> -- 
> 2.53.0.1018.g2bb0e51243-goog
> 


* Re: [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-31  7:22             ` Namhyung Kim
@ 2026-03-31 17:58               ` Ian Rogers
  2026-04-01  3:41                 ` Namhyung Kim
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-03-31 17:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

On Tue, Mar 31, 2026 at 12:22 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> I'm curious why this patch is in the idle symbol thread.

I'll separate it, I was gathering fixes. Same branch has the BPF
counters test fix in it:
https://lore.kernel.org/lkml/20260325171653.1091337-1-irogers@google.com/

> On Thu, Mar 26, 2026 at 11:00:33PM -0700, Ian Rogers wrote:
> > Writing the test output files to the current working directory can
> > fail in various contexts, such as continual testing. Other tests write
> > to a mktemp-ed file; make the "perf script task-analyzer tests" follow
> > this convention too. Currently this isn't possible for the perf.data
> > file due to a lack of perf script support, so add a variable for when
> > this support is available.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/tests/shell/test_task_analyzer.sh | 38 +++++++++++---------
> >  1 file changed, 21 insertions(+), 17 deletions(-)
> >
> > diff --git a/tools/perf/tests/shell/test_task_analyzer.sh b/tools/perf/tests/shell/test_task_analyzer.sh
> > index e194fcf61df3..b1a6a7e017e4 100755
> > --- a/tools/perf/tests/shell/test_task_analyzer.sh
> > +++ b/tools/perf/tests/shell/test_task_analyzer.sh
> > @@ -3,6 +3,11 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >
> >  tmpdir=$(mktemp -d /tmp/perf-script-task-analyzer-XXXXX)
> > +# TODO: perf script report only supports input from the CWD perf.data file, make
> > +# it support input from any file.
> > +perfdata="perf.data"
> > +csv="$tmpdir/csv"
> > +csvsummary="$tmpdir/csvsummary"
> >  err=0
> >
> >  # set PERF_EXEC_PATH to find scripts in the source directory
> > @@ -15,11 +20,10 @@ fi
> >  export ASAN_OPTIONS=detect_leaks=0
> >
> >  cleanup() {
> > -  rm -f perf.data
> > -  rm -f perf.data.old
> > -  rm -f csv
> > -  rm -f csvsummary
> > +  rm -f "${perfdata}"
> > +  rm -f "${perfdata}".old
> >    rm -rf "$tmpdir"
> > +
> >    trap - exit term int
> >  }
> >
> > @@ -61,7 +65,7 @@ skip_no_probe_record_support() {
> >
> >  prepare_perf_data() {
> >       # 1s should be sufficient to catch at least some switches
> > -     perf record -e sched:sched_switch -a -- sleep 1 > /dev/null 2>&1
> > +     perf record -e sched:sched_switch -a -o "${perfdata}" -- sleep 1 > /dev/null 2>&1
> >       # check if perf data file got created in above step.
> >       if [ ! -e "perf.data" ]; then
> >               printf "FAIL: perf record failed to create \"perf.data\" \n"
>
> Please update this part too.

Done.

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > @@ -130,28 +134,28 @@ test_extended_times_summary_ns() {
> >  }
> >
> >  test_csv() {
> > -     perf script report task-analyzer --csv csv > /dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv csv"
> > -     find_str_or_fail "Comm;" csv "${FUNCNAME[0]}"
> > +     perf script report task-analyzer --csv "${csv}" > /dev/null
> > +     check_exec_0 "perf script report task-analyzer --csv ${csv}"
> > +     find_str_or_fail "Comm;" "${csv}" "${FUNCNAME[0]}"
> >  }
> >
> >  test_csv_extended_times() {
> > -     perf script report task-analyzer --csv csv --extended-times > /dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv csv --extended-times"
> > -     find_str_or_fail "Out-Out;" csv "${FUNCNAME[0]}"
> > +     perf script report task-analyzer --csv "${csv}" --extended-times > /dev/null
> > +     check_exec_0 "perf script report task-analyzer --csv ${csv} --extended-times"
> > +     find_str_or_fail "Out-Out;" "${csv}" "${FUNCNAME[0]}"
> >  }
> >
> >  test_csvsummary() {
> > -     perf script report task-analyzer --csv-summary csvsummary > /dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv-summary csvsummary"
> > -     find_str_or_fail "Comm;" csvsummary "${FUNCNAME[0]}"
> > +     perf script report task-analyzer --csv-summary "${csvsummary}" > /dev/null
> > +     check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary}"
> > +     find_str_or_fail "Comm;" "${csvsummary}" "${FUNCNAME[0]}"
> >  }
> >
> >  test_csvsummary_extended() {
> > -     perf script report task-analyzer --csv-summary csvsummary --summary-extended \
> > +     perf script report task-analyzer --csv-summary "${csvsummary}" --summary-extended \
> >       >/dev/null
> > -     check_exec_0 "perf script report task-analyzer --csv-summary csvsummary --summary-extended"
> > -     find_str_or_fail "Out-Out;" csvsummary "${FUNCNAME[0]}"
> > +     check_exec_0 "perf script report task-analyzer --csv-summary ${csvsummary} --summary-extended"
> > +     find_str_or_fail "Out-Out;" "${csvsummary}" "${FUNCNAME[0]}"
> >  }
> >
> >  skip_no_probe_record_support
> > --
> > 2.53.0.1018.g2bb0e51243-goog
> >


* Re: [PATCH v2] perf tests task-analyzer: Write test files to tmpdir
  2026-03-31 17:58               ` Ian Rogers
@ 2026-04-01  3:41                 ` Namhyung Kim
  0 siblings, 0 replies; 106+ messages in thread
From: Namhyung Kim @ 2026-04-01  3:41 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

On Tue, Mar 31, 2026 at 10:58:55AM -0700, Ian Rogers wrote:
> On Tue, Mar 31, 2026 at 12:22 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > I'm curious why this patch is in the idle symbol thread.
> 
> I'll separate it, I was gathering fixes. Same branch has the BPF
> counters test fix in it:
> https://lore.kernel.org/lkml/20260325171653.1091337-1-irogers@google.com/

Ok, I'll test and process it.

Thanks,
Namhyung


* Re: [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-06  5:05               ` Namhyung Kim
  2026-04-06 15:36                 ` Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Namhyung Kim @ 2026-04-06  5:05 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Thu, Mar 26, 2026 at 09:50:24PM -0700, Ian Rogers wrote:
> Add a helper that lazily computes the e_machine and falls back to
> EM_HOST. Use the perf_env's arch to compute the e_machine if
> available. Use a binary search for some efficiency in this, but handle
> somewhat complex duplicate rules. Switch perf_env__arch to be derived
> from the e_machine for consistency. This switches arch from being uname
> derived to matching that of the perf binary (via EM_HOST). Update
> session to use the helper, which may mean using EM_HOST when no
> threads are available. This also updates the perf data file header
> that gets the e_machine/e_flags from the session.
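
The prefix-based binary search with a follow-up step for shared
prefixes can be sketched on its own as below. This is a reduced
illustration, not the patch's table: the entries, the integer ids and
the lookup_arch() name are hypothetical, and only the "arm" versus
"arm64" conflict is resolved.

```c
#include <stdlib.h>
#include <string.h>

struct prefix_entry {
	const char *prefix;
	int id;		/* hypothetical id, standing in for an EM_* value */
};

/* Must stay sorted by prefix, because bsearch() relies on the order. */
static const struct prefix_entry prefix_table[] = {
	{"aarch64", 1},
	{"arm", 2},	/* shared prefix: "arm64" is resolved after the search */
	{"ppc", 3},
	{"s390", 4},
	{"x86", 5},
};

/* Compare only as many bytes as the table entry's prefix has. */
static int cmp_prefix(const void *key, const void *elem)
{
	const char *arch = key;
	const struct prefix_entry *entry = elem;

	return strncmp(arch, entry->prefix, strlen(entry->prefix));
}

/* Returns the id for an arch string, or 0 when no prefix matches. */
static int lookup_arch(const char *arch)
{
	const struct prefix_entry *entry;

	entry = bsearch(arch, prefix_table,
			sizeof(prefix_table) / sizeof(prefix_table[0]),
			sizeof(prefix_table[0]), cmp_prefix);
	if (!entry)
		return 0;

	/* "arm64" matches the "arm" prefix but denotes the 64-bit id. */
	if (entry->id == 2 && !strcmp(arch, "arm64"))
		return 1;

	return entry->id;
}
```

With this table, uname-style strings such as "armv7l", "s390x" and
"x86_64" resolve through their prefix, while "arm64" needs the extra
comparison after the search.
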
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
>  tools/perf/util/env.h     |   1 +
>  tools/perf/util/session.c |  14 +--
>  3 files changed, 157 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 93d475a80f14..ae08178870d7 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -1,10 +1,12 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include "cpumap.h"
> +#include "dwarf-regs.h"
>  #include "debug.h"
>  #include "env.h"
>  #include "util/header.h"
>  #include "util/rwsem.h"
>  #include <linux/compiler.h>
> +#include <linux/kernel.h>
>  #include <linux/ctype.h>
>  #include <linux/rbtree.h>
>  #include <linux/string.h>
> @@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
>  	zfree(&cache->size);
>  }
>  
> +struct arch_to_e_machine {
> +	const char *prefix;
> +	uint16_t e_machine;
> +};
> +
>  /*
> - * Return architecture name in a normalized form.
> - * The conversion logic comes from the Makefile.
> + * A mapping from an arch prefix string to an ELF machine that can be used in a
> + * bsearch. Some arch prefixes are shared and need additional processing as
> + * marked next to the architecture. The prefixes handle both perf's architecture
> + * naming and those from uname.
>   */
> -static const char *normalize_arch(char *arch)
> -{
> -	if (!strcmp(arch, "x86_64"))
> -		return "x86";
> -	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
> -		return "x86";
> -	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
> -		return "sparc";
> -	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
> -		return "arm64";
> -	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
> -		return "arm";
> -	if (!strncmp(arch, "s390", 4))
> -		return "s390";
> -	if (!strncmp(arch, "parisc", 6))
> -		return "parisc";
> -	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
> -		return "powerpc";
> -	if (!strncmp(arch, "mips", 4))
> -		return "mips";
> -	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
> -		return "sh";
> -	if (!strncmp(arch, "loongarch", 9))
> -		return "loongarch";
> -
> -	return arch;
> +static const struct arch_to_e_machine prefix_to_e_machine[] = {
> +	{"aarch64", EM_AARCH64},
> +	{"alpha", EM_ALPHA},
> +	{"arc", EM_ARC},
> +	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
> +	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
> +	{"bfin", EM_BLACKFIN},
> +	{"blackfin", EM_BLACKFIN},
> +	{"cris", EM_CRIS},
> +	{"csky", EM_CSKY},
> +	{"hppa", EM_PARISC},
> +	{"i386", EM_386},
> +	{"i486", EM_386},
> +	{"i586", EM_386},
> +	{"i686", EM_386},
> +	{"loongarch", EM_LOONGARCH},
> +	{"m32r", EM_M32R},
> +	{"m68k", EM_68K},
> +	{"microblaze", EM_MICROBLAZE},
> +	{"mips", EM_MIPS},
> +	{"msp430", EM_MSP430},
> +	{"parisc", EM_PARISC},
> +	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
> +	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
> +	{"riscv", EM_RISCV},
> +	{"s390", EM_S390},
> +	{"sa110", EM_ARM},
> +	{"sh", EM_SH},
> +	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
> +	{"sun4u", EM_SPARC},
> +	{"x86", EM_X86_64}, /* Check also for EM_386. */
> +	{"xtensa", EM_XTENSA},
> +};
> +
> +static int compare_prefix(const void *key, const void *element)
> +{
> +	const char *search_key = key;
> +	const struct arch_to_e_machine *map_element = element;
> +	size_t prefix_len = strlen(map_element->prefix);
> +
> +	return strncmp(search_key, map_element->prefix, prefix_len);
> +}
> +
> +static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
> +{
> +	/* Binary search for a matching prefix. */
> +	const struct arch_to_e_machine *result;
> +
> +	if (!perf_arch)
> +		return EM_HOST;
> +
> +	result = bsearch(perf_arch,
> +			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
> +			 sizeof(prefix_to_e_machine[0]),
> +			 compare_prefix);
> +
> +	if (!result) {
> +		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
> +		return EM_NONE;
> +	}
> +
> +	/* Handle conflicting prefixes. */
> +	switch (result->e_machine) {
> +	case EM_ARM:
> +		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
> +	case EM_AVR:
> +		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
> +	case EM_PPC:
> +		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;

I'm curious what's the name `uname -m` returns for PPC64.  Is
"powerpc64" possible?


> +	case EM_SPARC:
> +		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
> +	case EM_X86_64:
> +		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
> +	default:
> +		return result->e_machine;
> +	}
> +}
> +
> +static const char *e_machine_to_perf_arch(uint16_t e_machine)
> +{
> +	/*
> +	 * Table for if either the perf arch string differs from uname or there
> +	 * are >1 ELF machine with the prefix.
> +	 */
> +	static const struct arch_to_e_machine extras[] = {
> +		{"arm64", EM_AARCH64},
> +		{"avr32", EM_AVR32},
> +		{"powerpc", EM_PPC},
> +		{"powerpc", EM_PPC64},

Here it returns powerpc for both.


> +		{"sparc", EM_SPARCV9},
> +		{"x86", EM_386},
> +		{"x86", EM_X86_64},
> +		{"none", EM_NONE},
> +	};
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
> +		if (extras[i].e_machine == e_machine)
> +			return extras[i].prefix;
> +	}
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
> +		if (prefix_to_e_machine[i].e_machine == e_machine)
> +			return prefix_to_e_machine[i].prefix;
> +
> +	}
> +	return "unknown";
> +}
> +
> +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
> +{
> +	if (!env) {
> +		if (e_flags)
> +			*e_flags = EF_HOST;
> +
> +		return EM_HOST;
> +	}
> +	if (env->e_machine == EM_NONE) {
> +		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
> +
> +		if (env->e_machine == EM_HOST)
> +			env->e_flags = EF_HOST;
> +	}
> +	if (e_flags)
> +		*e_flags = env->e_flags;
> +
> +	return env->e_machine;
>  }
>  
>  const char *perf_env__arch(struct perf_env *env)
>  {
> -	char *arch_name;
> +	if (!env)
> +		return e_machine_to_perf_arch(EM_HOST);
>  
> -	if (!env || !env->arch) { /* Assume local operation */
> -		static struct utsname uts = { .machine[0] = '\0', };
> -		if (uts.machine[0] == '\0' && uname(&uts) < 0)
> -			return NULL;
> -		arch_name = uts.machine;
> -	} else
> -		arch_name = env->arch;
> +	if (!env->arch) {
> +		/*
> +		 * Lazily compute/allocate arch. The e_machine may have been
> +		 * read from a data file and so may not be EM_HOST.
> +		 */
> +		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
>  
> -	return normalize_arch(arch_name);
> +		env->arch = strdup(e_machine_to_perf_arch(e_machine));
> +	}
> +	return env->arch;
>  }
>  
>  #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index a4501cbca375..91ff252712f4 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
>  
>  void cpu_cache_level__free(struct cpu_cache_level *cache);
>  
> +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
>  const char *perf_env__arch(struct perf_env *env);
>  const char *perf_env__arch_strerrno(struct perf_env *env, int err);
>  const char *perf_env__cpuid(struct perf_env *env);
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4b465abfa36c..dcc9bef303aa 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
>  		return EM_HOST;
>  	}
>  
> +	/* Is the env caching an e_machine? */
>  	env = perf_session__env(session);
> -	if (env && env->e_machine != EM_NONE) {
> -		if (e_flags)
> -			*e_flags = env->e_flags;
> -
> -		return env->e_machine;
> -	}
> +	if (env && env->e_machine != EM_NONE)
> +		return perf_env__e_machine(env, e_flags);
>  
> +	/*
> +	 * Compute from threads, note this is more accurate than
> +	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
> +	 * mixed 32-bit and 64-bit threads.
> +	 */
>  	machines__for_each_thread(&session->machines,
>  				  perf_session__e_machine_cb,
>  				  &args);
> -- 
> 2.53.0.1018.g2bb0e51243-goog
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-04-06  5:10               ` Namhyung Kim
  2026-04-06 16:11                 ` Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Namhyung Kim @ 2026-04-06  5:10 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Thu, Mar 26, 2026 at 09:50:25PM -0700, Ian Rogers wrote:
> Move the idle boolean to a helper symbol__is_idle function. In the
> function lazily compute whether a symbol is an idle function taking
> into consideration the kernel version and architecture of the
> machine. As symbols__insert no longer needs to know if a symbol is for
> the kernel, remove the argument.
> 
> This change is inspired by mailing list discussion, particularly from
> Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> <hca@linux.ibm.com>:
> https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> 
> The change switches x86 matches to use strstarts which means
> intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
[SNIP]
> +	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
> +		int major = 0, minor = 0;
> +		const char *release = env && env->os_release
> +			? env->os_release : perf_version_string;

I think Sashiko's review is right.  You need to check the kernel version
instead of perf.

Thanks,
Namhyung

> +
> +		/* Before v6.10, s390 used psw_idle. */
> +		if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
> +		    major < 6 || (major == 6 && minor < 10)) {
> +			sym->idle = SYMBOL_IDLE__IDLE;
> +			return true;
> +		}
> +	}
> +
> +	sym->idle = SYMBOL_IDLE__NOT_IDLE;
> +	return false;
>  }

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-06  5:05               ` Namhyung Kim
@ 2026-04-06 15:36                 ` Ian Rogers
  0 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-06 15:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Sun, Apr 5, 2026 at 10:05 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Mar 26, 2026 at 09:50:24PM -0700, Ian Rogers wrote:
> > Add a helper that lazily computes the e_machine and falls back of
> > EM_HOST. Use the perf_env's arch to compute the e_machine if
> > available. Use a binary search for some efficiency in this, but handle
> > somewhat complex duplicate rules. Switch perf_env__arch to be derived
> > the e_machine for consistency. This switches arch from being uname
> > derived to matching that of the perf binary (via EM_HOST). Update
> > session to use the helper, which may mean using EM_HOST when no
> > threads are available. This also updates the perf data file header
> > that gets the e_machine/e_flags from the session.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
> >  tools/perf/util/env.h     |   1 +
> >  tools/perf/util/session.c |  14 +--
> >  3 files changed, 157 insertions(+), 43 deletions(-)
> >
> > diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> > index 93d475a80f14..ae08178870d7 100644
> > --- a/tools/perf/util/env.c
> > +++ b/tools/perf/util/env.c
> > @@ -1,10 +1,12 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include "cpumap.h"
> > +#include "dwarf-regs.h"
> >  #include "debug.h"
> >  #include "env.h"
> >  #include "util/header.h"
> >  #include "util/rwsem.h"
> >  #include <linux/compiler.h>
> > +#include <linux/kernel.h>
> >  #include <linux/ctype.h>
> >  #include <linux/rbtree.h>
> >  #include <linux/string.h>
> > @@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
> >       zfree(&cache->size);
> >  }
> >
> > +struct arch_to_e_machine {
> > +     const char *prefix;
> > +     uint16_t e_machine;
> > +};
> > +
> >  /*
> > - * Return architecture name in a normalized form.
> > - * The conversion logic comes from the Makefile.
> > + * A mapping from an arch prefix string to an ELF machine that can be used in a
> > + * bsearch. Some arch prefixes are shared an need additional processing as
> > + * marked next to the architecture. The prefixes handle both perf's architecture
> > + * naming and those from uname.
> >   */
> > -static const char *normalize_arch(char *arch)
> > -{
> > -     if (!strcmp(arch, "x86_64"))
> > -             return "x86";
> > -     if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
> > -             return "x86";
> > -     if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
> > -             return "sparc";
> > -     if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
> > -             return "arm64";
> > -     if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
> > -             return "arm";
> > -     if (!strncmp(arch, "s390", 4))
> > -             return "s390";
> > -     if (!strncmp(arch, "parisc", 6))
> > -             return "parisc";
> > -     if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
> > -             return "powerpc";
> > -     if (!strncmp(arch, "mips", 4))
> > -             return "mips";
> > -     if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
> > -             return "sh";
> > -     if (!strncmp(arch, "loongarch", 9))
> > -             return "loongarch";
> > -
> > -     return arch;
> > +static const struct arch_to_e_machine prefix_to_e_machine[] = {
> > +     {"aarch64", EM_AARCH64},
> > +     {"alpha", EM_ALPHA},
> > +     {"arc", EM_ARC},
> > +     {"arm", EM_ARM}, /* Check also for EM_AARCH64. */
> > +     {"avr", EM_AVR},  /* Check also for EM_AVR32. */
> > +     {"bfin", EM_BLACKFIN},
> > +     {"blackfin", EM_BLACKFIN},
> > +     {"cris", EM_CRIS},
> > +     {"csky", EM_CSKY},
> > +     {"hppa", EM_PARISC},
> > +     {"i386", EM_386},
> > +     {"i486", EM_386},
> > +     {"i586", EM_386},
> > +     {"i686", EM_386},
> > +     {"loongarch", EM_LOONGARCH},
> > +     {"m32r", EM_M32R},
> > +     {"m68k", EM_68K},
> > +     {"microblaze", EM_MICROBLAZE},
> > +     {"mips", EM_MIPS},
> > +     {"msp430", EM_MSP430},
> > +     {"parisc", EM_PARISC},
> > +     {"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
> > +     {"ppc", EM_PPC}, /* Check also for EM_PPC64. */
> > +     {"riscv", EM_RISCV},
> > +     {"s390", EM_S390},
> > +     {"sa110", EM_ARM},
> > +     {"sh", EM_SH},
> > +     {"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
> > +     {"sun4u", EM_SPARC},
> > +     {"x86", EM_X86_64}, /* Check also for EM_386. */
> > +     {"xtensa", EM_XTENSA},
> > +};
> > +
> > +static int compare_prefix(const void *key, const void *element)
> > +{
> > +     const char *search_key = key;
> > +     const struct arch_to_e_machine *map_element = element;
> > +     size_t prefix_len = strlen(map_element->prefix);
> > +
> > +     return strncmp(search_key, map_element->prefix, prefix_len);
> > +}
> > +
> > +static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
> > +{
> > +     /* Binary search for a matching prefix. */
> > +     const struct arch_to_e_machine *result;
> > +
> > +     if (!perf_arch)
> > +             return EM_HOST;
> > +
> > +     result = bsearch(perf_arch,
> > +                      prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
> > +                      sizeof(prefix_to_e_machine[0]),
> > +                      compare_prefix);
> > +
> > +     if (!result) {
> > +             pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
> > +             return EM_NONE;
> > +     }
> > +
> > +     /* Handle conflicting prefixes. */
> > +     switch (result->e_machine) {
> > +     case EM_ARM:
> > +             return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
> > +     case EM_AVR:
> > +             return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
> > +     case EM_PPC:
> > +             return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
>
> I'm curious what's the name `uname -m` returns for PPC64.  Is
> "powerpc64" possible?

It is.

> > +     case EM_SPARC:
> > +             return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
> > +     case EM_X86_64:
> > +             return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
> > +     default:
> > +             return result->e_machine;
> > +     }
> > +}
> > +
> > +static const char *e_machine_to_perf_arch(uint16_t e_machine)
> > +{
> > +     /*
> > +      * Table for if either the perf arch string differs from uname or there
> > +      * are >1 ELF machine with the prefix.
> > +      */
> > +     static const struct arch_to_e_machine extras[] = {
> > +             {"arm64", EM_AARCH64},
> > +             {"avr32", EM_AVR32},
> > +             {"powerpc", EM_PPC},
> > +             {"powerpc", EM_PPC64},
>
> Here it returns powerpc for both.

Yep. This is 100% intentional as the existing code does the same:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/env.c?h=perf-tools-next#n611
```
static const char *normalize_arch(char *arch)
...
if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
return "powerpc";
```
The strncmp is limited to just the prefix of the uname string,
ignoring the 64. So the arch "powerpc" can be 32-bit or 64-bit, just
as "x86" can be 32-bit or 64-bit. To determine which case applies, the
code should really check `struct perf_env`'s `kernel_is_64_bit`. I
think this is generally much more painful than just using the
e_machine - especially since you need to strcmp the name. For the
e_machine, the problem is that on x86 we have 32-bit, x32 and x86_64.
There is then also an ABI question regarding the use of SIMD registers
and the newer APX registers. If there are no samples and no DSOs in
play, making a choice of e_machine to set up variables with is
somewhat arbitrary. I think EM_HOST, the e_machine of the current perf
binary, is a good choice.
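
As a rough illustration of the prefix point above (a sketch only, not
code from the patch): the helper names below are hypothetical, and the
string results merely stand in for the ELF machine constants.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the powerpc branch of normalize_arch(). */
static const char *sketch_normalize_powerpc(const char *arch)
{
	/* Only the prefix is compared, so a trailing "64" or "64le" is ignored. */
	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
		return "powerpc";
	return arch;
}

/*
 * Hypothetical: the normalized name no longer carries the bitness, so it
 * has to come from elsewhere, e.g. a kernel_is_64_bit style flag.
 */
static const char *sketch_powerpc_machine(const char *arch, bool kernel_is_64_bit)
{
	if (strcmp(sketch_normalize_powerpc(arch), "powerpc"))
		return "other";
	return kernel_is_64_bit ? "EM_PPC64" : "EM_PPC";
}
```

With this sketch "ppc", "ppc64le" and "powerpc64" all normalize to
"powerpc", and only the extra flag separates the 32-bit and 64-bit cases.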

Thanks,
Ian

> > +             {"sparc", EM_SPARCV9},
> > +             {"x86", EM_386},
> > +             {"x86", EM_X86_64},
> > +             {"none", EM_NONE},
> > +     };
> > +
> > +     for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
> > +             if (extras[i].e_machine == e_machine)
> > +                     return extras[i].prefix;
> > +     }
> > +
> > +     for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
> > +             if (prefix_to_e_machine[i].e_machine == e_machine)
> > +                     return prefix_to_e_machine[i].prefix;
> > +
> > +     }
> > +     return "unknown";
> > +}
> > +
> > +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
> > +{
> > +     if (!env) {
> > +             if (e_flags)
> > +                     *e_flags = EF_HOST;
> > +
> > +             return EM_HOST;
> > +     }
> > +     if (env->e_machine == EM_NONE) {
> > +             env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
> > +
> > +             if (env->e_machine == EM_HOST)
> > +                     env->e_flags = EF_HOST;
> > +     }
> > +     if (e_flags)
> > +             *e_flags = env->e_flags;
> > +
> > +     return env->e_machine;
> >  }
> >
> >  const char *perf_env__arch(struct perf_env *env)
> >  {
> > -     char *arch_name;
> > +     if (!env)
> > +             return e_machine_to_perf_arch(EM_HOST);
> >
> > -     if (!env || !env->arch) { /* Assume local operation */
> > -             static struct utsname uts = { .machine[0] = '\0', };
> > -             if (uts.machine[0] == '\0' && uname(&uts) < 0)
> > -                     return NULL;
> > -             arch_name = uts.machine;
> > -     } else
> > -             arch_name = env->arch;
> > +     if (!env->arch) {
> > +             /*
> > +              * Lazily compute/allocate arch. The e_machine may have been
> > +              * read from a data file and so may not be EM_HOST.
> > +              */
> > +             uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
> >
> > -     return normalize_arch(arch_name);
> > +             env->arch = strdup(e_machine_to_perf_arch(e_machine));
> > +     }
> > +     return env->arch;
> >  }
> >
> >  #if defined(HAVE_LIBTRACEEVENT)
> > diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> > index a4501cbca375..91ff252712f4 100644
> > --- a/tools/perf/util/env.h
> > +++ b/tools/perf/util/env.h
> > @@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
> >
> >  void cpu_cache_level__free(struct cpu_cache_level *cache);
> >
> > +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
> >  const char *perf_env__arch(struct perf_env *env);
> >  const char *perf_env__arch_strerrno(struct perf_env *env, int err);
> >  const char *perf_env__cpuid(struct perf_env *env);
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 4b465abfa36c..dcc9bef303aa 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -2996,14 +2996,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
> >               return EM_HOST;
> >       }
> >
> > +     /* Is the env caching an e_machine? */
> >       env = perf_session__env(session);
> > -     if (env && env->e_machine != EM_NONE) {
> > -             if (e_flags)
> > -                     *e_flags = env->e_flags;
> > -
> > -             return env->e_machine;
> > -     }
> > +     if (env && env->e_machine != EM_NONE)
> > +             return perf_env__e_machine(env, e_flags);
> >
> > +     /*
> > +      * Compute from threads, note this is more accurate than
> > +      * perf_env__e_machine that falls back on EM_HOST and doesn't consider
> > +      * mixed 32-bit and 64-bit threads.
> > +      */
> >       machines__for_each_thread(&session->machines,
> >                                 perf_session__e_machine_cb,
> >                                 &args);
> > --
> > 2.53.0.1018.g2bb0e51243-goog
> >

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env
  2026-04-06  5:10               ` Namhyung Kim
@ 2026-04-06 16:11                 ` Ian Rogers
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  0 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-04-06 16:11 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Sun, Apr 5, 2026 at 10:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Mar 26, 2026 at 09:50:25PM -0700, Ian Rogers wrote:
> > Move the idle boolean to a helper symbol__is_idle function. In the
> > function lazily compute whether a symbol is an idle function taking
> > into consideration the kernel version and architecture of the
> > machine. As symbols__insert no longer needs to know if a symbol is for
> > the kernel, remove the argument.
> >
> > This change is inspired by mailing list discussion, particularly from
> > Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
> > <hca@linux.ibm.com>:
> > https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
> >
> > The change switches x86 matches to use strstarts which means
> > intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
> > change suggested by Honglei Wang <jameshongleiwang@126.com> in:
> > https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> [SNIP]
> > +     if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
> > +             int major = 0, minor = 0;
> > +             const char *release = env && env->os_release
> > +                     ? env->os_release : perf_version_string;
>
> I think Sashiko's review is right.  You need to check the kernel version
> instead of perf.

Doing this can create more problems and complexity than it solves. If
we state that `os_release` can be NULL at this point, we recompute it
using `uname`:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/header.c?h=perf-tools-next#n378
then do we cache the value in env? What happens if a data/pipe file
assigns to the env later? Ad-hoc users of env->os_release
recomputing it shouldn't happen; instead, in 'live' mode, we should
assign os_release using uname either when the perf_env is created or
lazily with a helper function. I dislike that with a helper we could
potentially have multiple notions of os_release.

I'll add a patch to refactor the use of os_release, but can we be
mindful that this is clear feature creep with little benefit? We will
still fall back on `perf_version_string` if uname fails and for all
practical purposes, `perf_version_string` will differ little from
uname in this case. I'm only going to add the patch because checking
other uses of os_release suggests the change is benign.

Thanks,
Ian

> Thanks,
> Namhyung
>
> > +
> > +             /* Before v6.10, s390 used psw_idle. */
> > +             if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
> > +                 major < 6 || (major == 6 && minor < 10)) {
> > +                     sym->idle = SYMBOL_IDLE__IDLE;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     sym->idle = SYMBOL_IDLE__NOT_IDLE;
> > +     return false;
> >  }

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-06 16:11                 ` Ian Rogers
@ 2026-04-06 17:09                   ` Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                       ` (3 more replies)
  0 siblings, 4 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. The current only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
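
A minimal sketch of the lazy, cached idle check, assuming a tri-state
field on the symbol. The names (`sym_sketch`, `IDLE_*`,
`sym_sketch__is_idle`) are hypothetical and only the s390 psw_idle rule
from the series is modelled:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tri-state; zero means "not computed yet". */
enum { IDLE_UNKNOWN = 0, IDLE_NO, IDLE_YES };

struct sym_sketch {
	const char *name;
	unsigned char idle;
};

/* True if release ("major.minor...") is older than want_major.want_minor. */
static bool release_before(const char *release, int want_major, int want_minor)
{
	int major = 0, minor = 0;

	/* An unparseable release is treated as old, as in the series. */
	if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
		return true;

	return major < want_major || (major == want_major && minor < want_minor);
}

/* Compute the idle state on first use and cache it in the symbol. */
static bool sym_sketch__is_idle(struct sym_sketch *sym, bool is_s390,
				const char *release)
{
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_YES;

	if (is_s390 && !strncmp(sym->name, "psw_idle", 8) &&
	    release_before(release, 6, 10)) {
		sym->idle = IDLE_YES;
		return true;
	}

	sym->idle = IDLE_NO;
	return false;
}
```

The first call pays for the string comparison and version parse; any
later call on the same symbol only reads the cached byte.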

v5: Add perf_env os_release helper (Namhyung/Sashiko)

v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is change to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
    https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
    https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (3):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c          |   6 +-
 tools/perf/util/data-convert-bt.c |   2 +-
 tools/perf/util/env.c             | 206 ++++++++++++++++++++++++------
 tools/perf/util/env.h             |   2 +
 tools/perf/util/session.c         |  14 +-
 tools/perf/util/symbol-elf.c      |   2 +-
 tools/perf/util/symbol.c          | 107 ++++++++++------
 tools/perf/util/symbol.h          |  15 ++-
 8 files changed, 264 insertions(+), 90 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-04-06 17:09                     ` Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper that lazily computes the e_machine and falls back on
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/session.c |  14 +--
 3 files changed, 157 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..339d62ca37bb 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch) {
+		/*
+		 * Lazily compute/allocate arch. The e_machine may have been
+		 * read from a data file and so may not be EM_HOST.
+		 */
+		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	return normalize_arch(arch_name);
+		env->arch = strdup(e_machine_to_perf_arch(e_machine));
+	}
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a4501cbca375..91ff252712f4 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -186,6 +186,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 3a911c70cd0e..070dd78772f2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3009,14 +3009,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-06 17:09                     ` Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  3 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

In live mode the os_release isn't initialized. Add a lazy
initialization helper that treats an uninitialized os_release as a
sign of live mode and fills it in from uname.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/symbol.c          |  4 ++--
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index bece77cbc493..bc5805183100 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 339d62ca37bb..34b737950f73 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 91ff252712f4..bf30a02dccf7 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -174,6 +174,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index b4b30675688d..ea7d2f2dbcb7 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2208,7 +2208,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2225,7 +2225,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-04-06 17:09                     ` [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-04-06 17:09                     ` Ian Rogers
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  3 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-06 17:09 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk, tmricht

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
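
A minimal sketch of the two checks described above, prefix matching and
a kernel release gate, assuming a "major.minor" release string. The
demo_* names are hypothetical. perf has its own strstarts() helper, and
the real check lives in symbol__is_idle().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Local prefix test; perf has its own strstarts() helper. */
static bool demo_strstarts(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * True when the release string names a kernel older than major.minor.
 * An unparsable release is treated as old, so the symbol stays idle.
 */
static bool demo_release_before(const char *release, int major, int minor)
{
	int maj = 0, min = 0;

	if (sscanf(release, "%d.%d", &maj, &min) != 2)
		return true;
	return maj < major || (maj == major && min < minor);
}
```

With these, demo_strstarts("intel_idle_irq", "intel_idle") holds, and a
release of "6.9.0-rc1" is reported as older than 6.10 while "6.10.2" is
not.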

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 103 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37950efb28ac..bdc1c761cd61 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ea7d2f2dbcb7..8c23802b39ad 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -357,8 +358,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -366,17 +366,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -393,7 +382,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -554,7 +543,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -716,47 +705,85 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (sscanf(release, "%d.%d", &major, &minor) != 2 ||
+		    major < 6 || (major == 6 && minor < 10)) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -785,7 +812,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c67814d6d6d6..2f5f90f547aa 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -25,6 +25,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -42,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -57,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -202,8 +209,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -286,5 +292,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                       ` (2 preceding siblings ...)
  2026-04-06 17:09                     ` [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-04-09 23:06                     ` Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                         ` (2 more replies)
  3 siblings, 3 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. The only current user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
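
The lazy computation can be pictured as a tri-state cache stored in the
symbol. The sketch below is a reduced illustration under assumptions:
demo_symbol, demo_idle_kind and the single-name slow path are
hypothetical, and the real logic in patch 3 also consults the dso and
the perf_env.

```c
#include <stdbool.h>
#include <string.h>

enum demo_idle_kind {
	DEMO_IDLE__UNKNOWN = 0,	/* not computed yet */
	DEMO_IDLE__NOT_IDLE = 1,
	DEMO_IDLE__IDLE = 2,
};

/* Hypothetical, reduced symbol: a name plus a 2-bit cache. */
struct demo_symbol {
	const char *name;
	unsigned char idle:2;
};

static int demo_lookups;	/* counts the slow-path evaluations */

static bool demo_symbol__is_idle(struct demo_symbol *sym)
{
	if (sym->idle != DEMO_IDLE__UNKNOWN)
		return sym->idle == DEMO_IDLE__IDLE;

	/* Slow path, run at most once per symbol. */
	demo_lookups++;
	if (!strcmp(sym->name, "default_idle")) {
		sym->idle = DEMO_IDLE__IDLE;
		return true;
	}
	sym->idle = DEMO_IDLE__NOT_IDLE;
	return false;
}
```

Keeping zero as the "unknown" state means a zero-initialized symbol
needs no extra setup, and symbols that are never queried cost nothing.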

v6: Ensure arch is canonical by going to e_machine and back (Sashiko)

v5: Add perf_env os_release helper (Namhyung/Sashiko)
    https://lore.kernel.org/lkml/20260406170905.2614260-1-irogers@google.com/

v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is change to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
    https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/

v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
    https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/

v2: Some minor white space clean up:
    https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (3):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Lazily compute idle and use the perf_env

 tools/perf/builtin-top.c          |   6 +-
 tools/perf/util/data-convert-bt.c |   2 +-
 tools/perf/util/env.c             | 206 ++++++++++++++++++++++++------
 tools/perf/util/env.h             |   2 +
 tools/perf/util/header.c          |  60 ++++++---
 tools/perf/util/session.c         |  14 +-
 tools/perf/util/symbol-elf.c      |   2 +-
 tools/perf/util/symbol.c          | 107 ++++++++++------
 tools/perf/util/symbol.h          |  15 ++-
 9 files changed, 309 insertions(+), 105 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-04-09 23:06                       ` Ian Rogers
  2026-04-09 23:37                         ` sashiko-bot
                                           ` (2 more replies)
  2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 3 replies; 106+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being
uname derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
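
To illustrate the prefix binary search and the handling of a shared
prefix, here is a small standalone sketch. The enum demo_machine, the
four-entry table and the demo_* functions are hypothetical and much
smaller than the table in this patch. The comparator only looks at as
many bytes as the table prefix has, so "armv7l" matches "arm".

```c
#include <stdlib.h>
#include <string.h>

enum demo_machine { DEMO_NONE, DEMO_AARCH64, DEMO_ARM, DEMO_MIPS, DEMO_S390 };

struct demo_prefix {
	const char *prefix;
	enum demo_machine machine;
};

/* Must stay sorted by prefix for bsearch. */
static const struct demo_prefix demo_prefixes[] = {
	{"aarch64", DEMO_AARCH64},
	{"arm", DEMO_ARM},	/* also the prefix of "arm64" */
	{"mips", DEMO_MIPS},
	{"s390", DEMO_S390},
};

static int demo_compare_prefix(const void *key, const void *element)
{
	const struct demo_prefix *e = element;

	/* Compare only as many bytes as the table prefix has. */
	return strncmp(key, e->prefix, strlen(e->prefix));
}

static enum demo_machine demo_arch_to_machine(const char *arch)
{
	const struct demo_prefix *hit;

	hit = bsearch(arch, demo_prefixes,
		      sizeof(demo_prefixes) / sizeof(demo_prefixes[0]),
		      sizeof(demo_prefixes[0]), demo_compare_prefix);
	if (!hit)
		return DEMO_NONE;

	/* "arm64" matches the "arm" prefix but is a different machine. */
	if (hit->machine == DEMO_ARM && !strcmp(arch, "arm64"))
		return DEMO_AARCH64;
	return hit->machine;
}
```

The post-lookup fixup is what the commit message calls the duplicate
rules: one prefix can stand for more than one machine, so the match is
refined with an exact string comparison afterwards.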

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 185 ++++++++++++++++++++++++++++++--------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/header.c  |  44 ++++++---
 tools/perf/util/session.c |  14 +--
 4 files changed, 191 insertions(+), 53 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..339d62ca37bb 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, bool is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/* Handle conflicting prefixes. */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return is_64_bit || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return is_64_bit || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return is_64_bit || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env->arch) {
+		/*
+		 * Lazily compute/allocate arch. The e_machine may have been
+		 * read from a data file and so may not be EM_HOST.
+		 */
+		uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	return normalize_arch(arch_name);
+		env->arch = strdup(e_machine_to_perf_arch(e_machine));
+	}
+	return env->arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..d36a0fb2cd04 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c6efddb70aee..9bb4a271b4f8 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,21 +370,25 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		arch = perf_env__arch(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.machine);
+	if (!arch) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2675,10 +2679,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	uint16_t saved_e_machine = ff->ph->env.e_machine;
+
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	/*
+	 * Make the arch string canonical by computing the e_machine from it,
+	 * then turning the e_machine back into an arch string.
+	 */
+	ff->ph->env.e_machine = EM_NONE;
+	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
+		zfree(&ff->ph->env.arch);
+		perf_env__arch(&ff->ph->env);
+	}
+	ff->ph->env.e_machine = saved_e_machine;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..726568b88803 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/* Is the env caching an e_machine? */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-09 23:06                       ` Ian Rogers
  2026-04-09 23:50                         ` sashiko-bot
  2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
  2 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

In live mode the os_release isn't initialized. Add a lazy
initialization helper that treats an uninitialized os_release as a
sign of live mode and fills it in from uname.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 339d62ca37bb..34b737950f73 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d36a0fb2cd04..56020f4381cd 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -175,6 +175,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 9bb4a271b4f8..89115134f1d2 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -361,13 +361,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..fd332db56157 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2209,7 +2209,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env
  2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-04-09 23:06                       ` Ian Rogers
  2026-04-10  0:11                         ` sashiko-bot
  2 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-04-09 23:06 UTC (permalink / raw)
  To: namhyung
  Cc: irogers, acme, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk, tmricht

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   6 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 103 ++++++++++++++++++++++-------------
 tools/perf/util/symbol.h     |  15 +++--
 4 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f6eb543de537..95fa3a03e62d 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -751,6 +751,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct dso *dso = NULL;
 
 	if (!machine && perf_guest) {
 		static struct intlist *seen;
@@ -830,7 +831,10 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.map)
+		dso = map__dso(al.map);
+
+	if (al.sym == NULL || !symbol__is_idle(al.sym, dso, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fd332db56157..482fd47bead2 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -25,6 +25,8 @@
 #include "demangle-ocaml.h"
 #include "demangle-rust-v0.h"
 #include "dso.h"
+#include "dwarf-regs.h"
+#include "env.h"
 #include "util.h" // lsdir()
 #include "event.h"
 #include "machine.h"
@@ -50,7 +52,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -358,8 +359,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -367,17 +367,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -394,7 +383,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -555,7 +544,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -717,47 +706,85 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		sym->idle = SYMBOL_IDLE__NOT_IDLE;
+		return false;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	/*
+	 * ppc64 uses function descriptors and prepends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		sym->idle = SYMBOL_IDLE__IDLE;
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			sym->idle = SYMBOL_IDLE__IDLE;
+			return true;
+		}
+	}
+
+	sym->idle = SYMBOL_IDLE__NOT_IDLE;
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -786,7 +813,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..7e0036f80185 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -26,6 +26,7 @@ struct dso;
 struct map;
 struct maps;
 struct option;
+struct perf_env;
 struct build_id;
 
 /*
@@ -43,6 +44,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -58,8 +65,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -194,8 +201,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -278,5 +284,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-04-09 23:37                         ` sashiko-bot
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-04-09 23:37 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Sashiko AI review found 4 potential issue(s):
- [High] perf_session__e_machine() bypasses lazy evaluation of e_machine from env->arch when env->e_machine is EM_NONE, breaking cross-architecture analysis of legacy perf.data files.
- [High] Uninitialized env->kernel_is_64_bit coerces to true, causing 32-bit architectures to be parsed as 64-bit.
- [High] write_arch() silently caches EM_HOST, permanently bypassing accurate thread-based architecture detection during perf record.
- [Medium] Unhandled OOM in process_arch() leaves env->arch as NULL, leading to a subsequent NULL pointer dereference.
--

commit a2ed4a04364ee664047eec1e47e4e540e458eff9
Author: Ian Rogers <irogers@google.com>

perf env: Add perf_env__e_machine helper and use in perf_env__arch

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules.
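
As a rough illustration of the prefix-based binary search the commit message
describes, here is a minimal sketch. It is not the code from the patch: the
table is a small subset, and the names sketch_table, prefix_cmp and
sketch_arch_to_e_machine are hypothetical. The SKETCH_EM_* values are local
stand-ins for the constants normally taken from <elf.h>. The "duplicate
rules" (for example "arm64" sharing the "arm" prefix) are deliberately not
handled here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the ELF machine constants from <elf.h>. */
enum {
	SKETCH_EM_NONE = 0,
	SKETCH_EM_386 = 3,
	SKETCH_EM_S390 = 22,
	SKETCH_EM_ARM = 40,
	SKETCH_EM_X86_64 = 62,
	SKETCH_EM_AARCH64 = 183,
};

struct sketch_prefix {
	const char *prefix;
	uint16_t e_machine;
};

/*
 * Must stay sorted by prefix for bsearch, and no entry may be a prefix of
 * another entry, otherwise the comparison below is not a consistent order.
 */
static const struct sketch_prefix sketch_table[] = {
	{"aarch64", SKETCH_EM_AARCH64},
	{"arm", SKETCH_EM_ARM}, /* Would wrongly match "arm64"; not handled. */
	{"i386", SKETCH_EM_386},
	{"s390", SKETCH_EM_S390},
	{"x86_64", SKETCH_EM_X86_64},
};

/* Compare only the first strlen(prefix) characters of the arch string. */
static int prefix_cmp(const void *key, const void *elem)
{
	const char *arch = key;
	const struct sketch_prefix *entry = elem;

	return strncmp(arch, entry->prefix, strlen(entry->prefix));
}

static uint16_t sketch_arch_to_e_machine(const char *arch)
{
	const struct sketch_prefix *entry;

	if (!arch)
		return SKETCH_EM_NONE;

	entry = bsearch(arch, sketch_table,
			sizeof(sketch_table) / sizeof(sketch_table[0]),
			sizeof(sketch_table[0]), prefix_cmp);
	return entry ? entry->e_machine : SKETCH_EM_NONE;
}
```

With this table, a uname-style string such as "armv7l" or "s390x" resolves
through its prefix, while an unlisted architecture yields SKETCH_EM_NONE.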

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 1e54e2c863605..339d62ca37bb8 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
[ ... ]
> @@ -588,51 +590,160 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
[ ... ]
> +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
> +{
> +	if (!env) {
> +		if (e_flags)
> +			*e_flags = EF_HOST;
> +
> +		return EM_HOST;
> +	}
> +	if (env->e_machine == EM_NONE) {
> +		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);

Is env->kernel_is_64_bit guaranteed to be a boolean here?

If env->kernel_is_64_bit is uninitialized, it defaults to -1 in
perf_env__init(). In C, -1 evaluates as true, which would cause 32-bit
architectures to be parsed as 64-bit (e.g., returning EM_X86_64 instead
of EM_386).

Should this use the lazy evaluator perf_env__kernel_is_64_bit(env) instead?
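
To make the concern concrete, here is a small sketch of how a tri-state int
with -1 meaning "unknown" behaves when it is used as a boolean. The function
names and the host fallback are hypothetical and only illustrate the pattern;
they are not perf's implementation.

```c
#include <stdbool.h>

/*
 * Naive use: any non-zero int converts to true, so the "unknown" value -1
 * is treated as "64-bit".
 */
static bool naive_is_64_bit(int tristate)
{
	return tristate;
}

/*
 * Resolve the unknown state explicitly before using the value as a bool.
 * Falling back to a caller-supplied host value is an assumption made for
 * this sketch.
 */
static bool resolved_is_64_bit(int tristate, bool host_is_64_bit)
{
	if (tristate < 0)
		return host_is_64_bit;

	return tristate != 0;
}
```

The point of the second helper is only that the -1 state is handled before
any implicit conversion to bool takes place.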

[ ... ]
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index c6efddb70aee2..9bb4a271b4f8a 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -370,21 +370,25 @@ static int write_osrelease(struct feat_fd *ff,
[ ... ]
> -static int write_arch(struct feat_fd *ff,
> -		      struct evlist *evlist __maybe_unused)
> +static int write_arch(struct feat_fd *ff, struct evlist *evlist)
>  {
>  	struct utsname uts;
> -	int ret;
> +	const char *arch = NULL;
>  
> -	ret = uname(&uts);
> -	if (ret < 0)
> -		return -1;
> +	if (evlist->session)
> +		arch = perf_env__arch(perf_session__env(evlist->session));

When write_arch() is called during perf record, it invokes perf_env__arch().
Since env->arch is NULL and env->e_machine is EM_NONE, this falls back to
perf_arch_to_e_machine(NULL, ...), locking env->e_machine to EM_HOST.

Later, when write_e_machine() calls perf_session__e_machine(), the check for
env->e_machine != EM_NONE will evaluate to true.

Will this permanently bypass the accurate thread-based architecture detection
(machines__for_each_thread) designed to handle mixed 32/64-bit threads?

[ ... ]
> @@ -2675,10 +2679,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
[ ... ]
> +static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
> +{
> +	uint16_t saved_e_machine = ff->ph->env.e_machine;
> +
> +	free(ff->ph->env.arch);
> +	ff->ph->env.arch = do_read_string(ff);
> +	if (!ff->ph->env.arch)
> +		return -ENOMEM;
> +	/*
> +	 * Make the arch string canonical by computing the e_machine from it,
> +	 * then turning the e_machine back into an arch string.
> +	 */
> +	ff->ph->env.e_machine = EM_NONE;
> +	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
> +		zfree(&ff->ph->env.arch);
> +		perf_env__arch(&ff->ph->env);

If the strdup() inside perf_env__arch() fails due to memory constraints,
it returns NULL and env->arch remains NULL.

Since process_arch() unconditionally returns 0 here, could this lead to a
NULL pointer dereference later if strncmp() is called on the architecture
string?

> +	}
> +	ff->ph->env.e_machine = saved_e_machine;
> +	return 0;
> +}
> +
>  static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
>  {
>  	int ret;
[ ... ]
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index fe0de2a0277f0..726568b888039 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -3023,14 +3023,16 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
>  		return EM_HOST;
>  	}
>  
> +	/* Is the env caching an e_machine? */
>  	env = perf_session__env(session);
> -	if (env && env->e_machine != EM_NONE) {
> -		if (e_flags)
> -			*e_flags = env->e_flags;
> -
> -		return env->e_machine;
> -	}
> +	if (env && env->e_machine != EM_NONE)
> +		return perf_env__e_machine(env, e_flags);

For legacy perf.data files without HEADER_E_MACHINE, process_arch() restores
env->e_machine to EM_NONE.

Since the condition here explicitly requires env->e_machine != EM_NONE, won't
this completely bypass the new lazy evaluator and fall back to EM_HOST,
potentially breaking cross-architecture analysis for legacy files?

>  
> +	/*
> +	 * Compute from threads, note this is more accurate than
> +	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
> +	 * mixed 32-bit and 64-bit threads.
> +	 */
>  	machines__for_each_thread(&session->machines,
>  				  perf_session__e_machine_cb,
>  				  &args);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260409230620.4176210-1-irogers@google.com?part=1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release
  2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-04-09 23:50                         ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-04-09 23:50 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Sashiko AI review found 2 potential issue(s):
- [Medium] Offline traces missing the OS_RELEASE header are polluted with the host's kernel version.
- [Medium] Unhandled `strdup()` memory allocation failure leads to NULL pointer dereference in `snprintf()`.
--

commit 291bf2cc1c91c985c162c953f24ca34e304792c1
Author: Ian Rogers <irogers@google.com>

perf env: Add helper to lazily compute the os_release

This commit adds a lazy initialization helper for the os_release string in
the perf environment, falling back to uname if it is not initialized.

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
[ ... ]
> @@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
>  	return env->kernel_is_64_bit;
>  }
>  
> +const char *perf_env__os_release(struct perf_env *env)
> +{
> +	struct utsname uts;
> +	int ret;
> +
> +	if (!env)
> +		return perf_version_string;
> +
> +	if (env->os_release)
> +		return env->os_release;
> +
> +	/*
> +	 * The os_release is being accessed but wasn't initialized from a data
> +	 * file, assume this is 'live' mode and use the release from uname. If
> +	 * uname fails then use the current perf tool version.
> +	 */
> +	ret = uname(&uts);
> +	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
> +	return env->os_release;
> +}

If we are processing an offline perf.data file with a tool like perf report
or perf data convert, and the file is missing the HEADER_OSRELEASE feature,
env->os_release will legitimately be NULL.

Does this unconditional fallback to uname risk polluting the offline
trace's metadata with the local analysis host's kernel version?

This could potentially cause tools to incorrectly search for the target
vmlinux using the host's kernel version, or embed the host's release into
exported traces as the target's release.

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
[ ... ]
> @@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
>  		return 0;
>  
>  	if (env) {
> -		kernel_version = env->os_release;
> +		kernel_version = perf_env__os_release(env);

If the strdup call in perf_env__os_release fails, it will return NULL.

Since kernel_version is assigned the result directly, could this result in
passing a NULL pointer to snprintf slightly further down in this function:

	snprintf(bf, sizeof(bf), vmlinux_paths_upd[i], kernel_version);

Passing NULL to a %s format specifier can lead to undefined behavior and
potential segmentation faults on certain C libraries like musl.
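
One possible way to avoid handing NULL to callers is for the lazy getter to
return a static fallback string when the copy cannot be allocated. The sketch
below assumes hypothetical names (struct sketch_env, sketch_os_release) and
takes the live release as a parameter instead of calling uname; it uses
malloc plus memcpy in place of strdup only to stay within ISO C.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_env {
	char *os_release; /* Lazily filled cache, NULL until first use. */
};

/* Copy a string; returns NULL if the allocation fails. */
static char *sketch_copy(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Never returns NULL: if there is no env, no live release, or the copy
 * could not be allocated, a static placeholder is returned instead and
 * nothing is cached.
 */
static const char *sketch_os_release(struct sketch_env *env,
				     const char *live_release)
{
	static const char unknown[] = "unknown";

	if (!env)
		return unknown;

	if (!env->os_release && live_release)
		env->os_release = sketch_copy(live_release);

	return env->os_release ? env->os_release : unknown;
}
```

A caller that formats the result with "%s" then always receives a valid
string, at the cost of having to treat the placeholder as "no release".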

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260409230620.4176210-1-irogers@google.com?part=2

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env
  2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
@ 2026-04-10  0:11                         ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-04-10  0:11 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Sashiko AI review found 1 potential issue(s):
- [High] Data race and undefined behavior due to unsynchronized bitfield updates in `struct symbol`.
--

commit 2ee008011b6949869303359cb5ce16b0c47431af
Author: Ian Rogers <irogers@google.com>
Subject: perf symbol: Lazily compute idle and use the perf_env

This commit defers computing whether a symbol is an idle function to a lazy
evaluation using symbol__is_idle. It modifies struct symbol to cache this
lazily computed value using an idle:2 bitfield, checking architectural and
kernel versions using the perf_env.

Link: https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
Link: https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
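
The kernel version part of that check can be sketched in isolation as below.
It follows the shape of the sscanf-based comparison in the patch ("%d.%d"
with a major/minor comparison), but the helper name sketch_release_before and
its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if the release string (e.g. "6.9.12-generic") parses as
 * major.minor and is strictly older than want_major.want_minor. Strings
 * that do not parse are treated as "not older".
 */
static bool sketch_release_before(const char *release,
				  int want_major, int want_minor)
{
	int major = 0, minor = 0;

	if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}
```

For the s390 case in the patch the threshold would be 6.10, so "6.9.12" is
considered older and "6.10.0" is not.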

> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index bd6eb90c86683..7e0036f801858 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
[ ... ]
> @@ -58,8 +65,8 @@ struct symbol {
>  	u8		type:4;
>  	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>  	u8		binding:4;
> -	/** Set true for kernel symbols of idle routines. */
> -	u8		idle:1;
> +	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
> +	u8		idle:2;
>  	/** Resolvable but tools ignore it (e.g. idle routines). */
>  	u8		ignore:1;
>  	/** Symbol for an inlined function. */

Could this lazy update of the idle bitfield cause a data race?

Because idle is a bitfield sharing a byte with ignore, inlined, annotate2,
and ifunc_alias, writing to it performs a non-atomic read-modify-write on
the entire byte.

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index fd332db561575..482fd47bead24 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
[ ... ]
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
>  {
[ ... ]
> +	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
> +		    sizeof(idle_symbols[0]), sym_name_cmp)) {
> +		sym->idle = SYMBOL_IDLE__IDLE;
> +		return true;
> +	}

In perf top, sample processing runs continuously in a background thread,
calling symbol__is_idle() and writing to sym->idle:

Thread 1 (Sample processing):
    sym->idle = SYMBOL_IDLE__IDLE;

Concurrently, the TUI thread processes user interactions and can set flags
like annotate2:

Thread 2 (TUI / User Interaction):
    sym->annotate2 = 1;

If these threads access the same symbol simultaneously, could one thread's
write overwrite the other's modifications, silently corrupting the symbol
state and causing features like UI annotation to break?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260409230620.4176210-1-irogers@google.com?part=3

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-04-09 23:37                         ` sashiko-bot
@ 2026-05-01 18:20                         ` Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                             ` (3 more replies)
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2 siblings, 4 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is
EM_NONE. Derive the value from the arch string if available. Similarly
derive the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.
  
Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
perf top. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.
  
To avoid concurrent update issues with bitfields sharing a byte in
struct symbol due to the lazy computation, introduce a global lock for
updates to these fields and use setter functions. The reads remain
lockless.
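
A minimal sketch of the setter-plus-global-lock pattern described above,
assuming pthreads. The struct layout, lock and function names are
hypothetical and are not taken from the series; the sketch only shows writes
to bitfields that share a byte being serialized by one lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_idle_kind {
	SKETCH_IDLE__UNKNOWN = 0,
	SKETCH_IDLE__NOT_IDLE = 1,
	SKETCH_IDLE__IDLE = 2,
};

/* These bitfields share one byte, so a write to one rewrites the others. */
struct sketch_symbol {
	uint8_t idle:2;
	uint8_t ignore:1;
	uint8_t annotate2:1;
};

/* One global lock for all writes to the shared byte of any symbol. */
static pthread_mutex_t sketch_flags_lock = PTHREAD_MUTEX_INITIALIZER;

static void sketch_symbol__set_idle(struct sketch_symbol *sym, uint8_t kind)
{
	pthread_mutex_lock(&sketch_flags_lock);
	sym->idle = kind;
	pthread_mutex_unlock(&sketch_flags_lock);
}

static void sketch_symbol__set_annotate2(struct sketch_symbol *sym, bool on)
{
	pthread_mutex_lock(&sketch_flags_lock);
	sym->annotate2 = on;
	pthread_mutex_unlock(&sketch_flags_lock);
}

/* Reads stay lockless in this sketch, as in the description above. */
static bool sketch_symbol__is_idle(const struct sketch_symbol *sym)
{
	return sym->idle == SKETCH_IDLE__IDLE;
}
```

Serializing the writers keeps two read-modify-write sequences on the same
byte from overwriting each other; whether lockless reads are acceptable
depends on the guarantees the callers need.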
  
v7:
 - Address better handling of strdup failures with arch in the header/env.
 - Address concurrent update issues in struct symbol bitfields by
   introducing a global lock for writes.
  
v6: Ensure arch is canonical by going to e_machine and back (Sashiko)
https://lore.kernel.org/linux-perf-users/20260409230620.4176210-1-irogers@google.com/

v5: Add perf_env os_release helper (Namhyung/Sashiko)
https://lore.kernel.org/lkml/20260406170905.2614260-1-irogers@google.com/
  
v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is change to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/
  
v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/
  
v2: Some minor white space clean up:
https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/
  
v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (4):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Add setters for bitfields sharing a byte to avoid
    concurrent update issues
  perf symbol: Lazily compute idle and use a global lock for updates

 tools/perf/builtin-kwork.c        |   2 +-
 tools/perf/builtin-sched.c        |   2 +-
 tools/perf/util/annotate.c        |   2 +-
 tools/perf/util/data-convert-bt.c |   2 +-
 tools/perf/util/env.c             | 218 +++++++++++++++++++++++++-----
 tools/perf/util/env.h             |   2 +
 tools/perf/util/header.c          |  63 +++++++--
 tools/perf/util/session.c         |  25 ++--
 tools/perf/util/symbol-elf.c      |   2 +-
 tools/perf/util/symbol.c          | 134 ++++++++++++------
 tools/perf/util/symbol.h          |  17 ++-
 11 files changed, 357 insertions(+), 112 deletions(-)

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 18:56                             ` sashiko-bot
  2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 197 +++++++++++++++++++++++++++++++-------
 tools/perf/util/env.h     |   1 +
 tools/perf/util/header.c  |  47 +++++++--
 tools/perf/util/session.c |  25 ++---
 4 files changed, 212 insertions(+), 58 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..1671769d4441 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -588,51 +590,172 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, int is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/*
+	 * Handle conflicting prefixes. If the is_64_bit is unknown (-1) then
+	 * assume 64-bit. We can't use perf_env__kernel_is_64_bit as that
+	 * depends on the arch string.
+	 */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return (is_64_bit != 0) || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	if (!env) {
+		if (e_flags)
+			*e_flags = EF_HOST;
+
+		return EM_HOST;
+	}
+	if (env->e_machine == EM_NONE) {
+		env->e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+		if (env->e_machine == EM_HOST)
+			env->e_flags = EF_HOST;
+	}
+	if (e_flags)
+		*e_flags = env->e_flags;
+
+	return env->e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	uint16_t e_machine;
+	const char *arch;
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
+
+	if (env->arch)
+		return env->arch;
 
-	return normalize_arch(arch_name);
+	/*
+	 * Lazily compute/allocate arch. The e_machine may have been
+	 * read from a data file and so may not be EM_HOST.
+	 */
+	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	arch = e_machine_to_perf_arch(e_machine);
+	env->arch = strdup(arch);
+	/*
+	 * Avoid potential crashes on the arch string if memory allocation in
+	 * strdup fails and NULL were to be returned.
+	 */
+	return env->arch ?: arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..d36a0fb2cd04 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..8d5152bde25d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -379,21 +379,28 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session) {
+		/* Force the computation in the perf_env of the e_machine of the threads. */
+		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
+		arch = perf_env__arch(perf_session__env(evlist->session));
+	}
+
+	if (!arch) {
+		int ret = uname(&uts);
 
-	return do_write_string(ff, uts.machine);
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2684,10 +2691,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	uint16_t saved_e_machine = ff->ph->env.e_machine;
+
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	/*
+	 * Make the arch string canonical by computing the e_machine from it,
+	 * then turning the e_machine back into an arch string.
+	 */
+	ff->ph->env.e_machine = EM_NONE;
+	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
+		zfree(&ff->ph->env.arch);
+		perf_env__arch(&ff->ph->env);
+	}
+	ff->ph->env.e_machine = saved_e_machine;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..bc7add02a2de 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,19 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/*
+	 * Is the env caching an e_machine? If not we want to compute from the
+	 * more accurate threads.
+	 */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
@@ -3048,10 +3053,8 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 
 	/*
 	 * Couldn't determine from the perf_env or current set of
-	 * threads. Default to the host.
+	 * threads. Potentially use logic that uses the arch string otherwise
+	 * default to the host.
 	 */
-	if (e_flags)
-		*e_flags = EF_HOST;
-
-	return EM_HOST;
+	return perf_env__e_machine(env, e_flags);
 }
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 19:20                             ` sashiko-bot
  2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  3 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

In live mode the os_release isn't initialized. Add a lazy
initialization helper that assumes live mode when the os_release
hasn't been initialized.
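
The lazy pattern described above can be sketched outside of perf as
follows. This is an illustration under assumptions, not the patch
itself: `struct env_sketch`, `env_sketch__os_release()`, `dup_string()`
and the "unknown" fallback are hypothetical names (the patch falls back
to perf_version_string instead).

```c
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for the relevant part of struct perf_env. */
struct env_sketch {
	char *os_release; /* NULL until read from a file or computed lazily. */
};

/* Assumption: a fixed fallback string; the patch uses perf_version_string. */
static const char fallback_release[] = "unknown";

/* Small strdup() replacement so the sketch only needs ISO C plus uname(). */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

const char *env_sketch__os_release(struct env_sketch *env)
{
	struct utsname uts;

	if (!env)
		return fallback_release;

	/* Already set, e.g. from a data file header: never overwrite it. */
	if (env->os_release)
		return env->os_release;

	/* Not set: assume live mode and ask the running kernel. */
	env->os_release = dup_string(uname(&uts) < 0 ? fallback_release
						      : uts.release);

	/* If the allocation failed, still hand back a usable string. */
	return env->os_release ? env->os_release : fallback_release;
}
```

The second call returns the cached pointer, so callers never see the
string change underneath them once it has been computed.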

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1671769d4441..c3e464c6de2f 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname or strdup fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release ?: perf_version_string;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d36a0fb2cd04..56020f4381cd 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -175,6 +175,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 8d5152bde25d..cfafed3cc69f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,13 +370,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..fd332db56157 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2209,7 +2209,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 19:42                             ` sashiko-bot
  2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  3 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

A problem with putting bitfields into struct symbol is that other bits in
the symbol could be updated concurrently, and only one of the updates to
the underlying storage unit would take effect, leading to lost updates.

To avoid this, introduce a global lock `symbol_bits_lock` in `symbol.c`
and helper functions to update the bits sharing a byte:
`symbol__set_ignore` and `symbol__set_annotate2`.

`inlined` is not given a setter as it is only initialized in
`new_inline_sym` when the symbol is under construction and not shared.
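
A minimal sketch of the idea, using plain pthreads rather than perf's
`struct mutex` wrapper. `struct sym_sketch`, `bits_lock` and the two
setters are hypothetical names for illustration; the point is only that
every write to a bitfield in the shared storage unit goes through the
same lock, so a read-modify-write of one flag cannot discard a
concurrent write to its neighbour.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cut-down symbol: the flags share one storage unit. */
struct sym_sketch {
	unsigned int idle:2;
	unsigned int ignore:1;
	unsigned int annotate2:1;
};

/* One global lock for all symbols; assumes flag updates are rare. */
static pthread_mutex_t bits_lock = PTHREAD_MUTEX_INITIALIZER;

void sym_sketch__set_ignore(struct sym_sketch *sym, bool ignore)
{
	/* The compiler emits load/modify/store of the whole unit here. */
	pthread_mutex_lock(&bits_lock);
	sym->ignore = ignore;
	pthread_mutex_unlock(&bits_lock);
}

void sym_sketch__set_annotate2(struct sym_sketch *sym, bool annotate2)
{
	pthread_mutex_lock(&bits_lock);
	sym->annotate2 = annotate2;
	pthread_mutex_unlock(&bits_lock);
}
```

A single global lock keeps struct symbol small at the cost of
serializing all flag updates, which seems a reasonable trade if such
updates are infrequent compared with reads.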

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-kwork.c |  2 +-
 tools/perf/builtin-sched.c |  2 +-
 tools/perf/util/annotate.c |  2 +-
 tools/perf/util/symbol.c   | 22 ++++++++++++++++++++++
 tools/perf/util/symbol.h   |  3 +++
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 9d3a4c779a41..7337ee956dc9 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -725,7 +725,7 @@ static void timehist_save_callchain(struct perf_kwork *kwork,
 		if (sym) {
 			if (!strcmp(sym->name, "__softirqentry_text_start") ||
 			    !strcmp(sym->name, "__do_softirq"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 555247568e7a..655e95f660c2 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2371,7 +2371,7 @@ static void save_task_callchain(struct perf_sched *sched,
 			if (!strcmp(sym->name, "schedule") ||
 			    !strcmp(sym->name, "__schedule") ||
 			    !strcmp(sym->name, "preempt_schedule"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..d550a0061159 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2224,7 +2224,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 
 	annotation__init_column_widths(notes, sym);
 	annotation__update_column_widths(notes);
-	sym->annotate2 = 1;
+	symbol__set_annotate2(sym, true);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fd332db56157..e6a1f23634ec 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -31,6 +31,7 @@
 #include "map.h"
 #include "symbol.h"
 #include "map_symbol.h"
+#include "mutex.h"
 #include "mem-events.h"
 #include "mem-info.h"
 #include "symsrc.h"
@@ -52,6 +53,8 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
 static bool symbol__is_idle(const char *name);
 
+static struct mutex symbol_bits_lock;
+
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
@@ -345,6 +348,20 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
+void symbol__set_ignore(struct symbol *sym, bool ignore)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->ignore = ignore;
+	mutex_unlock(&symbol_bits_lock);
+}
+
+void symbol__set_annotate2(struct symbol *sym, bool annotate2)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->annotate2 = annotate2;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -2398,6 +2415,8 @@ int symbol__init(struct perf_env *env)
 	if (symbol_conf.initialized)
 		return 0;
 
+	mutex_init(&symbol_bits_lock);
+
 	symbol_conf.priv_size = PERF_ALIGN(symbol_conf.priv_size, sizeof(u64));
 
 	symbol__elf_init();
@@ -2476,6 +2495,9 @@ void symbol__exit(void)
 {
 	if (!symbol_conf.initialized)
 		return;
+
+	mutex_destroy(&symbol_bits_lock);
+
 	strlist__delete(symbol_conf.bt_stop_list);
 	strlist__delete(symbol_conf.sym_list);
 	strlist__delete(symbol_conf.dso_list);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..5d98d7e84d57 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -77,6 +77,9 @@ struct symbol {
 void symbol__delete(struct symbol *sym);
 void symbols__delete(struct rb_root_cached *symbols);
 
+void symbol__set_ignore(struct symbol *sym, bool ignore);
+void symbol__set_annotate2(struct symbol *sym, bool annotate2);
+
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
  * @symbols: the rb_root of symbols
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (2 preceding siblings ...)
  2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-01 18:20                           ` Ian Rogers
  2026-05-01 20:13                             ` sashiko-bot
  3 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-01 18:20 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

To avoid concurrent update issues with other bitfields in `struct symbol`,
this change uses the global lock `symbol_bits_lock` (introduced in a
previous commit) for updates to the `idle` field. A static helper
`symbol__set_idle` taking a boolean is used to encapsulate the lock and
mapping to `enum symbol_idle_kind`.
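
The lazy, cached computation can be sketched as below. This is a
simplified illustration, not the patch: `struct sym_sketch`,
`enum idle_kind`, `release_is_before()` and `sym_sketch__is_idle()` are
hypothetical names, only the s390 psw_idle rule is modelled, and the
locking around the cache update is left out to keep the example short.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Tri-state cache: 0 means "not computed yet". */
enum idle_kind { IDLE_UNKNOWN = 0, IDLE_NO = 1, IDLE_YES = 2 };

/* Hypothetical cut-down symbol with a 2-bit cache field. */
struct sym_sketch {
	const char *name;
	unsigned int idle:2;
};

/* True if release starts with a "major.minor" older than the given one. */
static bool release_is_before(const char *release, int want_major,
			      int want_minor)
{
	int major = 0, minor = 0;

	if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;

	return major < want_major ||
	       (major == want_major && minor < want_minor);
}

bool sym_sketch__is_idle(struct sym_sketch *sym, const char *release)
{
	bool idle;

	/* Fast path: answer was computed on an earlier call. */
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_YES;

	/* Only kernels before v6.10 are treated as having psw_idle. */
	idle = !strncmp(sym->name, "psw_idle", strlen("psw_idle")) &&
	       release_is_before(release, 6, 10);

	sym->idle = idle ? IDLE_YES : IDLE_NO;
	return idle;
}
```

Because zero means "unknown", a freshly zeroed symbol needs no extra
initialization, and a cached "not idle" answer is distinguishable from
"never asked".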

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 108 +++++++++++++++++++++++------------
 tools/perf/util/symbol.h     |  14 +++--
 3 files changed, 81 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index e6a1f23634ec..8ec4b2836b44 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -51,7 +51,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 static struct mutex symbol_bits_lock;
 
@@ -362,6 +361,13 @@ void symbol__set_annotate2(struct symbol *sym, bool annotate2)
 	mutex_unlock(&symbol_bits_lock);
 }
 
+static void symbol__set_idle(struct symbol *sym, bool idle)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->idle = idle ? SYMBOL_IDLE__IDLE : SYMBOL_IDLE__NOT_IDLE;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -375,8 +381,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -384,17 +389,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -411,7 +405,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -572,7 +566,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -738,43 +732,81 @@ int modules__parse(const char *filename, void *arg,
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+static int sym_name_cmp(const void *a, const void *b)
 {
-	const char * const idle_symbols[] = {
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
+{
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		symbol__set_idle(sym, /*idle=*/false);
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
+
+	symbol__set_idle(sym, /*idle=*/false);
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -803,7 +835,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 5d98d7e84d57..717d2f876d58 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -43,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -58,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -197,8 +203,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -281,5 +286,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-01 18:56                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-01 18:56 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] Premature caching of fallback e_machine defeats dynamic thread-based architecture detection.
- [High] Canonicalization of `env->arch` breaks `perf_env__kernel_is_64_bit()`, falsely treating 64-bit traces as 32-bit.
- [High] Older 32-bit perf.data files are falsely promoted to 64-bit due to uninitialized `kernel_is_64_bit` state.
- [High] Inconsistent caching causes `perf_env__arch()` to unpredictably return unnormalized or canonical strings depending on initialization order.
- [High] Changing the format of the `HEADER_ARCH` string written to `perf.data` files breaks backward compatibility and existing tests.
- [Medium] OOM during string allocation in `process_arch()` causes a silent NULL pointer dereference in downstream consumers.
--

commit 3dd924151546a227a61b859787ac74725a289795
Author: Ian Rogers <irogers@google.com>

perf env: Add perf_env__e_machine helper and use in perf_env__arch

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
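
For readers following the review, the prefix-based binary search described
above can be sketched in isolation. This is a minimal standalone
illustration, not the perf code: the table contents, the integer ids and
the demo_* names are all hypothetical. In this sketch no table entry is a
prefix of another entry, which keeps the prefix comparison consistent with
the sort order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: an arch prefix and an arbitrary id. */
struct demo_prefix_entry {
	const char *prefix;
	int id;
};

/* Must stay sorted by prefix for bsearch() to work. */
static const struct demo_prefix_entry demo_table[] = {
	{"aarch64", 1},
	{"arm", 2},
	{"ppc", 3},
	{"s390", 4},
	{"x86", 5},
};

/* Compare only the first strlen(prefix) characters of the key. */
static int demo_compare_prefix(const void *key, const void *element)
{
	const struct demo_prefix_entry *entry = element;

	return strncmp(key, entry->prefix, strlen(entry->prefix));
}

/* Returns the id of the matching prefix, or -1 if none matches. */
static int demo_lookup(const char *arch)
{
	const struct demo_prefix_entry *hit;

	hit = bsearch(arch, demo_table,
		      sizeof(demo_table) / sizeof(demo_table[0]),
		      sizeof(demo_table[0]), demo_compare_prefix);
	return hit ? hit->id : -1;
}
```

A key such as "arm64" lands on the "arm" entry here, which is why the patch
needs a second disambiguation step after the search.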

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 1e54e2c863605..1671769d44411 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c

[ ... ]

> @@ -627,6 +629,40 @@ static uint16_t perf_arch_to_e_machine(const char *perf_arch, int is_64_bit)

[ ... ]

> +	switch (result->e_machine) {
> +	case EM_ARM:
> +		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
> +	case EM_AVR:
> +		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
> +	case EM_PPC:
> +		return (is_64_bit != 0) || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
> +	case EM_SPARC:
> +		return (is_64_bit != 0) || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;

Does this logic introduce a regression that inadvertently promotes older 32-bit
perf.data files to 64-bit?

When process_arch() reads an older data file, it calls perf_env__e_machine()
which invokes this function. At that point, env->kernel_is_64_bit is still
uninitialized, holding a value of -1.

Since -1 != 0 evaluates to true, would this unconditionally return EM_PPC64
and EM_SPARCV9 for older 32-bit ppc and sparc traces?
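
A minimal standalone sketch of the concern, assuming -1 is the "unknown"
marker as described above. The enum and the demo_* functions are
hypothetical stand-ins, not the real EM_* constants or perf helpers:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two ELF machine values involved. */
enum demo_machine { DEMO_PPC, DEMO_PPC64 };

/* Mirrors the quoted check: unknown (-1) is treated like 64-bit. */
static enum demo_machine demo_ppc_loose(const char *arch, int is_64_bit)
{
	return (is_64_bit != 0) || !strncmp(arch, "ppc64", 5) ? DEMO_PPC64 : DEMO_PPC;
}

/* One possible alternative: only an explicit positive value counts. */
static enum demo_machine demo_ppc_strict(const char *arch, int is_64_bit)
{
	return (is_64_bit > 0) || !strncmp(arch, "ppc64", 5) ? DEMO_PPC64 : DEMO_PPC;
}
```

With is_64_bit == -1 and arch "ppc", the loose variant reports the 64-bit
machine while the strict variant falls through to the arch string.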

[ ... ]

> @@ -685,15 +688,20 @@ const char *perf_env__arch(struct perf_env *env)

[ ... ]

> +	/*
> +	 * Lazily compute/allocate arch. The e_machine may have been
> +	 * read from a data file and so may not be EM_HOST.
> +	 */
> +	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
> +	arch = e_machine_to_perf_arch(e_machine);
> +	env->arch = strdup(arch);

Could this lead to a regression with inconsistent behavior depending on
initialization order?

If perf_env__read_arch() runs first, it caches the raw uname string
(like x86_64). If perf_env__arch() runs first, it calculates and caches the
canonical string (like x86).

Will this unpredictability break strict string comparisons against env->arch
elsewhere in the codebase?

> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index f30e48eb3fc32..8d5152bde25d5 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c

[ ... ]

> @@ -379,21 +379,28 @@ static int write_arch(struct feat_fd *ff, struct evlist *evlist)
>  {
>  	struct utsname uts;
> -	int ret;
> +	const char *arch = NULL;
>  
> -	ret = uname(&uts);
> -	if (ret < 0)
> -		return -1;
> +	if (evlist->session) {
> +		/* Force the computation in the perf_env of the e_machine of the threads. */
> +		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
> +		arch = perf_env__arch(perf_session__env(evlist->session));
> +	}

Does writing the canonicalized string here cause a regression by breaking
backward compatibility with existing tests?

By calling perf_env__arch(), this writes strings like "powerpc" into the
HEADER_ARCH section of the data file, rather than the raw uts.machine
strings like "ppc64le".

Tests such as test__session_topology parse HEADER_ARCH and conditionally
skip topology validations using checks like strncmp(env->arch, "ppc64le", 7).
Will this change cause those validations to run on unsupported platforms
and fail?

[ ... ]

> @@ -2684,10 +2691,30 @@ static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
> +{
> +	uint16_t saved_e_machine = ff->ph->env.e_machine;
> +
> +	free(ff->ph->env.arch);
> +	ff->ph->env.arch = do_read_string(ff);
> +	if (!ff->ph->env.arch)
> +		return -ENOMEM;
> +	/*
> +	 * Make the arch string canonical by computing the e_machine from it,
> +	 * then turning the e_machine back into an arch string.
> +	 */
> +	ff->ph->env.e_machine = EM_NONE;
> +	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
> +		zfree(&ff->ph->env.arch);
> +		perf_env__arch(&ff->ph->env);
> +	}

Will replacing env->arch with the canonicalized string introduce a regression
that breaks 64-bit architecture detection?

The function perf_env__init_kernel_mode() expects env->arch to contain the
raw uname string, matching exactly against strings like "x86_64" or "s390x".
Since env->arch now contains the canonical string (e.g., "x86"), won't
env->kernel_is_64_bit erroneously evaluate to 0 for 64-bit traces?

Also, is it possible for env->arch to be left as NULL here, leading to a
regression where downstream consumers crash?

If the strdup() inside perf_env__arch() fails due to memory pressure,
perf_env__arch() will safely return a static string, but env->arch will
remain NULL.

Since process_arch() ignores the return value of perf_env__arch(), will
consumers crash when attempting to use env->arch directly?

> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index fe0de2a0277f0..bc7add02a2de4 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c

[ ... ]

> @@ -3048,10 +3053,8 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
>  
>  	/*
>  	 * Couldn't determine from the perf_env or current set of
> -	 * threads. Default to the host.
> +	 * threads. Potentially use logic that uses the arch string otherwise
> +	 * default to the host.
>  	 */
> -	if (e_flags)
> -		*e_flags = EF_HOST;
> -
> -	return EM_HOST;
> +	return perf_env__e_machine(env, e_flags);
>  }

Can this lead to a regression where the session architecture gets
permanently locked to the host architecture?

If perf_session__e_machine() is called early (for example, from
write_e_machine() before threads are synthesized),
machines__for_each_thread() will find no threads.

The function will then fall back to perf_env__e_machine() here, which
computes and permanently caches EM_HOST into env->e_machine.

Because env->e_machine is now cached and != EM_NONE, all future calls
will take the fast-path at the top of this function, completely bypassing
the dynamic thread scanning logic.
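
The caching hazard can be sketched in isolation as follows. This is a
simplified model under stated assumptions, not the session code: the
demo_env struct, the DEMO_* values and the from_threads parameter (standing
in for the result of scanning threads) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NONE 0	/* hypothetical "not yet known" marker */
#define DEMO_HOST 62	/* hypothetical host machine id */

struct demo_env {
	uint16_t cached;
};

/* Caches whatever it computes, including the host fallback. */
static uint16_t demo_lookup_caching(struct demo_env *env, uint16_t from_threads)
{
	if (env->cached != DEMO_NONE)
		return env->cached;
	env->cached = from_threads != DEMO_NONE ? from_threads : DEMO_HOST;
	return env->cached;
}

/* Caches only a value that really came from the threads. */
static uint16_t demo_lookup_nocache_fallback(struct demo_env *env, uint16_t from_threads)
{
	if (env->cached != DEMO_NONE)
		return env->cached;
	if (from_threads == DEMO_NONE)
		return DEMO_HOST;
	env->cached = from_threads;
	return env->cached;
}
```

In the first variant an early call with no threads pins the host value for
every later call; in the second, a later call can still pick up the value
found from the threads.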

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260501182021.3651851-1-irogers@google.com?part=1

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release
  2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-01 19:20                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-01 19:20 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The `perf_env__os_release()` helper corrupts offline trace metadata by incorrectly assuming an uninitialized `os_release` implies 'live mode', lazily populating it with the analysis host's kernel version.
- [Medium] Using `perf_version_string` as a fallback for the OS release is semantically incorrect and pollutes trace headers.
--

commit 9cca0f2ccb3995708fd2106d23dda3d1c07d1fd9
Author: Ian Rogers <irogers@google.com>

perf env: Add helper to lazily compute the os_release

In live mode the os_release isn't being initialized, make a lazy
initialization helper that assumes when the os_release isn't
initialized this is live mode.

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 1671769d44411..c3e464c6de2f1 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c

[ ... ]

> @@ -330,6 +330,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
>  	return env->kernel_is_64_bit;
>  }
>  
> +const char *perf_env__os_release(struct perf_env *env)
> +{
> +	struct utsname uts;
> +	int ret;
> +
> +	if (!env)
> +		return perf_version_string;
> +
> +	if (env->os_release)
> +		return env->os_release;
> +
> +	/*
> +	 * The os_release is being accessed but wasn't initialized from a data
> +	 * file, assume this is 'live' mode and use the release from uname. If
> +	 * uname or strdup fails then use the current perf tool version.
> +	 */
> +	ret = uname(&uts);

If we are doing offline analysis of a perf.data file that lacks the OS
release feature header, env->os_release will be NULL.

Does this helper incorrectly assume we are in live mode, and lazily
populate env->os_release with the analysis host's kernel version?

If so, it appears this causes tools to use the wrong vmlinux path for the
target profile in vmlinux_path__init() and potentially break symbol
resolution.

It also looks like this causes ctf_writer__setup_env() to export the host
kernel version into the CTF trace instead of safely skipping the missing
field as it did previously.

> +	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
> +	return env->os_release ?: perf_version_string;
> +}

Is it appropriate to use the perf tool version (perf_version_string) as a
fallback for the OS release if uname() or strdup() fails?

This string could end up permanently written into the perf.data file's OS
release section, or cause bogus filesystem lookups in vmlinux_path__init()
since the tool version format differs from the kernel release.

Would it be safer to return NULL or a generic unknown string here instead?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260501182021.3651851-1-irogers@google.com?part=2

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-01 19:42                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-01 19:42 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Data Race / UB on concurrent reads of bitfields
- [High] Architectural flaw and severe performance regression due to unconditional global lock
- [High] Subsystem rule violation: Event handler completeness missing in builtin-sched.c
- [Medium] Unbalanced lifecycle and resource leak on error paths in symbol__init()
--

commit eaf96d025426eab89c805716fc26dd2399ff6394
Author: Ian Rogers <irogers@google.com>

perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues

This commit introduces a global mutex to serialize modifications to bitfields
within the symbol structure. Helper functions are provided to update these
fields safely across concurrent threads.

> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 555247568e7a6..655e95f660c2f 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c

This isn't a bug in the current changes, but while reviewing builtin-sched.c,
it appears the sched tool registers the .mmap handler but misses .mmap2.

When a subcommand registers .mmap but not .mmap2, modern kernels generating
mmap2 events will have those events silently dropped. Is it possible this leaves
the machine vmlinux maps and process memory maps incomplete or null?

[ ... ]

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index fd332db561575..e6a1f23634ec3 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
[ ... ]
> @@ -345,6 +348,20 @@ void symbol__delete(struct symbol *sym)
>  	free(((void *)sym) - symbol_conf.priv_size);
>  }
>  
> +void symbol__set_ignore(struct symbol *sym, bool ignore)
> +{
> +	mutex_lock(&symbol_bits_lock);
> +	sym->ignore = ignore;
> +	mutex_unlock(&symbol_bits_lock);
> +}

Are concurrent reads left unprotected by this implementation?

While the lock protects the read-modify-write cycle of the bitfield updates
here, existing readers access these fields without holding the lock.
For example:

builtin-sched.c:save_task_callchain() {
    ...
    if (sym->ignore)
        ...
}

Under the C11 memory model, reading a bitfield while another thread
concurrently modifies an adjacent bitfield in the same byte constitutes
a data race. Could this result in torn reads and undefined behavior?

Additionally, is a global lock necessary for these updates?

Unconditional acquisition of a global lock on hot paths like
timehist_save_callchain() and save_task_callchain() might cause a performance
regression, especially since it is acquired even if the flag is already true.

On 64-bit systems, struct symbol has 3 bytes of internal padding before
the dso pointer. Converting the bitfields to separate bool or u8 fields
would allow them to occupy this padding. Each field would then reside in
its own byte, i.e. its own memory location, so a plain byte store to one
flag no longer performs a read-modify-write over its neighbours and the
lock becomes unnecessary for these updates.
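
A minimal sketch of the suggested layout, using a simplified hypothetical
struct rather than the real struct symbol (the demo_* names and the field
selection are assumptions for illustration only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flags packed as bitfields: idle and ignore share one byte. */
struct demo_sym_bitfields {
	uint16_t namelen;
	uint8_t type:4;
	uint8_t binding:4;
	uint8_t idle:2;
	uint8_t ignore:1;
	void *dso;
};

/* Flags as whole bytes: each one is a distinct memory location. */
struct demo_sym_bytes {
	uint16_t namelen;
	uint8_t type:4;
	uint8_t binding:4;
	uint8_t idle;
	uint8_t ignore;
	void *dso;
};

/* A plain store; it does not read or rewrite the idle byte. */
static void demo_set_ignore(struct demo_sym_bytes *sym, bool ignore)
{
	sym->ignore = ignore;
}
```

Whether the byte-sized fields fit in existing padding depends on the real
struct and the ABI, so sizeof() should be checked on the actual definition
rather than inferred from this sketch.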

[ ... ]
> @@ -2398,6 +2415,8 @@ int symbol__init(struct perf_env *env)
>  	if (symbol_conf.initialized)
>  		return 0;
>  
> +	mutex_init(&symbol_bits_lock);
> +
>  	symbol_conf.priv_size = PERF_ALIGN(symbol_conf.priv_size, sizeof(u64));

Could this skip destroying the mutex on error paths?

If setup_list() or other initializations fail later in symbol__init(),
the function returns -1 or jumps to error labels without calling
mutex_destroy(). If the caller retries initialization, calling mutex_init()
on an already-initialized mutex can result in undefined behavior.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260501182021.3651851-1-irogers@google.com?part=3

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates
  2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
@ 2026-05-01 20:13                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-01 20:13 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The new lazy evaluation helper `symbol__is_idle()` is never called, and the migration to a tri-state enum breaks existing boolean consumers of `sym->idle`.
- [High] Data race on lockless read of the `sym->idle` bitfield in `symbol__is_idle()`.
--

commit 2ef259ec3239aea58dbf7c1d14b3998396292f27
Author: Ian Rogers <irogers@google.com>

perf symbol: Lazily compute idle and use a global lock for updates

This patch moves the idle boolean computation into a lazy helper,
symbol__is_idle(), considering kernel version and architecture. It also
changes the idle field to a tri-state enum and protects updates with a
global lock.

Link: https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
Link: https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index e6a1f23634ec..8ec4b2836b44 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
[ ... ]
> @@ -384,17 +389,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
>  	struct symbol *s;
>  	bool leftmost = true;
>  
> -	if (kernel) {
> -		const char *name = sym->name;
> -		/*
> -		 * ppc64 uses function descriptors and appends a '.' to the
> -		 * start of every instruction address. Remove it.
> -		 */
> -		if (name[0] == '.')
> -			name++;
> -		sym->idle = symbol__is_idle(name);
> -	}

Is the new symbol__is_idle() helper actually called anywhere to lazily
initialize this field?

It looks like the call was removed here, but the new function doesn't appear
to be invoked anywhere else by the consumers.

[ ... ]
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
> +{
> +	static const char * const idle_symbols[] = {
[ ... ]
> +	};
> +	const char *name = sym->name;
> +	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
>  
> -	if (idle_symbols_list)
> -		return strlist__has_entry(idle_symbols_list, name);
> +	if (sym->idle)
> +		return sym->idle == SYMBOL_IDLE__IDLE;

Does reading the sym->idle bitfield here without holding symbol_bits_lock
cause a data race?

Since adjacent bitfields in struct symbol share the same memory location,
a concurrent update to them performs a read-modify-write cycle that could
race with this lockless read, potentially resulting in undefined behavior.

[ ... ]
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index 5d98d7e84d57..717d2f876d58 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -43,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
>  			     GElf_Shdr *shp, const char *name, size_t *idx);
>  #endif
>  
> +enum symbol_idle_kind {
> +	SYMBOL_IDLE__UNKNOWN = 0,
> +	SYMBOL_IDLE__NOT_IDLE = 1,
> +	SYMBOL_IDLE__IDLE = 2,
> +};
> +
>  /**
>   * A symtab entry. When allocated this may be preceded by an annotation (see
>   * symbol__annotation) and/or a browser_index (see symbol__browser_index).
> @@ -58,8 +64,8 @@ struct symbol {
>  	u8		type:4;
>  	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
>  	u8		binding:4;
> -	/** Set true for kernel symbols of idle routines. */
> -	u8		idle:1;
> +	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
> +	u8		idle:2;

Are there still places evaluating sym->idle directly as a boolean?

Since idle now defaults to SYMBOL_IDLE__UNKNOWN (0), a check like
!al.sym->idle would evaluate to true for all symbols initially, bypassing
existing filters.

If a non-idle symbol is later evaluated and set to SYMBOL_IDLE__NOT_IDLE (1),
wouldn't !al.sym->idle evaluate to false, causing valid symbols to be dropped
incorrectly?
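
The mismatch can be shown with a small standalone sketch. The enum values
are copied from the quoted hunk; the two demo_* helpers are hypothetical and
only model a caller that keeps a symbol when it is not idle:

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in the quoted enum symbol_idle_kind. */
enum symbol_idle_kind {
	SYMBOL_IDLE__UNKNOWN = 0,
	SYMBOL_IDLE__NOT_IDLE = 1,
	SYMBOL_IDLE__IDLE = 2,
};

/* Legacy-style boolean test applied to the tri-state cache value. */
static bool demo_legacy_keeps_symbol(unsigned int idle)
{
	return !idle;
}

/* Tri-state-aware test; assumes the cache has already been resolved. */
static bool demo_tristate_keeps_symbol(unsigned int idle)
{
	return idle != SYMBOL_IDLE__IDLE;
}
```

The legacy test keeps an unresolved symbol but drops one that has been
resolved to NOT_IDLE, which is the inversion asked about above.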

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260501182021.3651851-1-irogers@google.com?part=4

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-04-09 23:37                         ` sashiko-bot
  2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-02  6:59                         ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                             ` (17 more replies)
  2 siblings, 18 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is EM_NONE.
Derive the value from the arch string if available. Similarly derive
the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.
  
Move the idle computation to the point of use and compute it lazily,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and can be
used to make the idle function computation machine and kernel version
dependent.
  
To avoid concurrent update issues with bitfields sharing a byte in
`struct symbol` due to the lazy computation, introduce a global lock
for updates to these fields and use setter functions. The reads remain
lockless.
  
v8:
 - Address Sashiko AI review feedback for Patch 1:
   - Switch all code dependent on the arch string to use `e_machine`
     instead (e.g., in `perf c2c`, `perf lock-contention`, `perf
     header`, `perf arch common`, `tests/topology.c`,
     `perf_env__init_kernel_mode`).
   - Update `machine__is` and `machine__normalized_is` to take
     `e_machine` integers instead of strings.
   - Refactor `arch_syscalls__strerrno_function` (generated via
     `arch_errno_names.sh`) to take an `e_machine` instead of an arch
     string.
   - Avoid premature caching of the host architecture in
     `perf_session__e_machine` by using a non-caching helper when
     threads are not yet available.
  
v7:
 - Address better handling of strdup failures with arch in the
   header/env.
 - Address concurrent update issues in `struct symbol` bitfields by
   introducing a global lock for writes.
https://lore.kernel.org/linux-perf-users/20260501182021.3651851-1-irogers@google.com/

v6: Ensure arch is canonical by going to e_machine and back (Sashiko)
https://lore.kernel.org/linux-perf-users/20260409230620.4176210-1-irogers@google.com/
  
v5: Add perf_env os_release helper (Namhyung/Sashiko)
https://lore.kernel.org/lkml/20260406170905.2614260-1-irogers@google.com/
  
v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
https://lore.kernel.org/lkml/20260327045025.2276517-1-irogers@google.com/
  
v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
https://lore.kernel.org/lkml/20260326174521.1829203-1-irogers@google.com/
  
v2: Some minor white space clean up:
https://lore.kernel.org/lkml/20260325161836.1029457-1-irogers@google.com/
  
v1: https://lore.kernel.org/lkml/20260302234343.564937-1-irogers@google.com/

Ian Rogers (17):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf tests topology: Switch env->arch use to env->e_machine
  perf capstone: Determine architecture from e_machine
  perf print_insn: Use e_machine for fallback IP length check
  perf machine: Use perf_env e_machine rather than arch
  perf sample-raw: Use perf_env e_machine rather than arch
  perf sort: Use perf_env e_machine rather than arch
  perf symbol: Avoid use of machine__is
  perf arch common: Use perf_env e_machine rather than arch
  perf header: In print_pmu_caps use perf_env e_machine
  perf c2c: Use perf_env e_machine rather than arch
  perf lock-contention: Use perf_env e_machine rather than arch
  perf env: Refactor perf_env__arch_strerrno
  perf env: Remove unused perf_env__raw_arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Add setters for bitfields sharing a byte to avoid
    concurrent update issues
  perf symbol: Lazily compute idle and use a global lock for updates

 tools/perf/arch/common.c                    |  55 ++--
 tools/perf/builtin-c2c.c                    |   2 +-
 tools/perf/builtin-kwork.c                  |   2 +-
 tools/perf/builtin-sched.c                  |   2 +-
 tools/perf/builtin-trace.c                  |   5 +-
 tools/perf/tests/topology.c                 |   8 +-
 tools/perf/trace/beauty/arch_errno_names.sh |  40 ++-
 tools/perf/util/annotate.c                  |   2 +-
 tools/perf/util/capstone.c                  | 115 +++++---
 tools/perf/util/data-convert-bt.c           |   2 +-
 tools/perf/util/env.c                       | 283 +++++++++++++++-----
 tools/perf/util/env.h                       |  11 +-
 tools/perf/util/header.c                    |  70 +++--
 tools/perf/util/lock-contention.c           |   6 +-
 tools/perf/util/machine.c                   |  25 +-
 tools/perf/util/machine.h                   |   2 -
 tools/perf/util/print_insn.c                |   8 +-
 tools/perf/util/sample-raw.c                |  18 +-
 tools/perf/util/session.c                   |  26 +-
 tools/perf/util/sort.c                      |  12 +-
 tools/perf/util/symbol-elf.c                |   2 +-
 tools/perf/util/symbol.c                    | 163 +++++++----
 tools/perf/util/symbol.h                    |  17 +-
 23 files changed, 612 insertions(+), 264 deletions(-)

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:56                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
                                             ` (16 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 231 +++++++++++++++++++++++++++++++-------
 tools/perf/util/env.h     |   2 +
 tools/perf/util/header.c  |  47 ++++++--
 tools/perf/util/session.c |  26 +++--
 4 files changed, 243 insertions(+), 63 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..4ff4caab3b32 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -309,12 +311,21 @@ void perf_env__init(struct perf_env *env)
 
 static void perf_env__init_kernel_mode(struct perf_env *env)
 {
-	const char *arch = perf_env__raw_arch(env);
+	uint16_t e_machine = env->e_machine;
 
-	if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
-	    !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
-	    !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
-	    !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
+	if (env->arch && (e_machine == EM_NONE || e_machine == EM_MIPS || e_machine == EM_RISCV)) {
+		if (str_ends_with(env->arch, "64") || !strncmp(env->arch, "s390x", 5))
+			env->kernel_is_64_bit = 1;
+		else
+			env->kernel_is_64_bit = 0;
+		return;
+	}
+	if (e_machine == EM_NONE)
+		e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (e_machine == EM_X86_64 || e_machine == EM_AARCH64 ||
+	    e_machine == EM_PPC64 || e_machine == EM_SPARCV9 ||
+	    e_machine == EM_S390)
 		env->kernel_is_64_bit = 1;
 	else
 		env->kernel_is_64_bit = 0;
@@ -588,51 +599,187 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, int is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/*
+	 * Handle conflicting prefixes. If the is_64_bit is unknown (-1) then
+	 * assume 64-bit. We can't use perf_env__kernel_is_64_bit as that
+	 * depends on the arch string.
+	 */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return (is_64_bit != 0) || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags)
+{
+	uint16_t e_machine = EM_HOST;
+
+	if (env)
+		e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+	if (e_flags && e_machine == EM_HOST)
+		*e_flags = EF_HOST;
+
+	return e_machine;
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	uint16_t e_machine;
+	uint32_t local_e_flags;
+
+	if (env && env->e_machine != EM_NONE) {
+		if (e_flags)
+			*e_flags = env->e_flags;
+
+		return env->e_machine;
+	}
+	e_machine = perf_env__e_machine_nocache(env, &local_e_flags);
+	if (env) {
+		env->e_machine = e_machine;
+		env->e_flags = local_e_flags;
+	}
+	if (e_flags)
+		*e_flags = local_e_flags;
+
+	return e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	uint16_t e_machine;
+	const char *arch;
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
+
+	if (env->arch)
+		return env->arch;
 
-	return normalize_arch(arch_name);
+	/*
+	 * Lazily compute/allocate arch. The e_machine may have been
+	 * read from a data file and so may not be EM_HOST.
+	 */
+	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	arch = e_machine_to_perf_arch(e_machine);
+	env->arch = strdup(arch);
+	/*
+	 * Avoid potential crashes on the arch string if memory allocation in
+	 * strdup fails and NULL were to be returned.
+	 */
+	return env->arch ?: arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..7151a9138e3f 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,8 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags);
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..8d5152bde25d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -379,21 +379,28 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session) {
+		/* Force the computation in the perf_env of the e_machine of the threads. */
+		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
+		arch = perf_env__arch(perf_session__env(evlist->session));
+	}
+
+	if (!arch) {
+		int ret = uname(&uts);
 
-	return do_write_string(ff, uts.machine);
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2684,10 +2691,30 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	uint16_t saved_e_machine = ff->ph->env.e_machine;
+
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	/*
+	 * Make the arch string canonical by computing the e_machine from it,
+	 * then turning the e_machine back into an arch string.
+	 */
+	ff->ph->env.e_machine = EM_NONE;
+	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
+		zfree(&ff->ph->env.arch);
+		perf_env__arch(&ff->ph->env);
+	}
+	ff->ph->env.e_machine = saved_e_machine;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..3e64db2d27c2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,19 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/*
+	 * Is the env caching an e_machine? If not we want to compute from the
+	 * more accurate threads.
+	 */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
@@ -3048,10 +3053,9 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 
 	/*
 	 * Couldn't determine from the perf_env or current set of
-	 * threads. Default to the host.
+	 * threads. Fall back on the arch string if available, otherwise
+	 * default to the host. Don't cache in the perf_env in case later
+	 * threads indicate a better ELF machine type.
 	 */
-	if (e_flags)
-		*e_flags = EF_HOST;
-
-	return EM_HOST;
+	return perf_env__e_machine_nocache(env, e_flags);
 }
-- 
2.54.0.545.g6539524ca2-goog
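
For readers following the prefix lookup in perf_arch_to_e_machine()
above, here is a minimal standalone sketch of the same idea: bsearch()
over a table sorted by prefix, with a comparator that only compares the
first strlen(prefix) characters. It is not the patch's code. The table
contents and the sketch_* names are assumptions made for illustration,
and the 32-bit/64-bit handling in the patch is left out.

```c
#include <elf.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct prefix_map {
	const char *prefix;
	uint16_t e_machine;
};

/* Hypothetical, abbreviated table. It must stay sorted by prefix. */
static const struct prefix_map sketch_table[] = {
	{"aarch64", EM_AARCH64},
	{"arm", EM_ARM},
	{"s390", EM_S390},
	{"x86", EM_X86_64},
};

/* Compare only as many characters as the table entry's prefix has. */
static int sketch_compare_prefix(const void *key, const void *element)
{
	const struct prefix_map *e = element;

	return strncmp(key, e->prefix, strlen(e->prefix));
}

static uint16_t sketch_arch_to_e_machine(const char *arch)
{
	const struct prefix_map *hit;

	hit = bsearch(arch, sketch_table,
		      sizeof(sketch_table) / sizeof(sketch_table[0]),
		      sizeof(sketch_table[0]), sketch_compare_prefix);
	if (!hit)
		return EM_NONE;

	/* "arm64" also matches the "arm" prefix, so disambiguate here. */
	if (hit->e_machine == EM_ARM && !strcmp(arch, "arm64"))
		return EM_AARCH64;

	return hit->e_machine;
}
```

With this table, "s390x" resolves through the "s390" prefix and "arm64"
first hits "arm" before being corrected, which is why a disambiguation
step follows the search.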


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
                                             ` (15 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Some arch string comparisons weren't normalized. Avoid potential
issues with normalized names vs uname values by switching to using the
e_machine.
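
A minimal sketch of the mismatch, not code from this patch: a prefix
match written against the uname value ("aarch64") stops matching once
the arch string has been normalized ("arm64"), while a comparison on
the ELF machine constant is unaffected. The helper names below are
hypothetical.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Old style: prefix match on whatever arch string the env carries. */
static bool is_arm64_by_string(const char *arch)
{
	return !strncmp(arch, "aarch64", 7);
}

/* New style: compare the ELF machine constant from <elf.h>. */
static bool is_arm64_by_e_machine(uint16_t e_machine)
{
	return e_machine == EM_AARCH64;
}
```

is_arm64_by_string("aarch64") is true but is_arm64_by_string("arm64")
is false, although both strings describe the same machine.
is_arm64_by_e_machine(EM_AARCH64) does not depend on which spelling
was recorded.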

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/topology.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index f54502ebef4b..d4c5c330c679 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -11,6 +11,7 @@
 #include "pmus.h"
 #include "target.h"
 #include <linux/err.h>
+#include <elf.h>
 
 #define TEMPL "/tmp/perf-test-XXXXXX"
 #define DATA_SIZE	10
@@ -74,6 +75,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	struct aggr_cpu_id id;
 	struct perf_cpu cpu;
 	struct perf_env *env;
+	uint16_t e_machine;
 
 	session = perf_session__new(&data, NULL);
 	TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
@@ -101,7 +103,9 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	 *  condition is true (see do_core_id_test in header.c). So always
 	 *  run this test on those platforms.
 	 */
-	if (!env->cpu && strncmp(env->arch, "s390", 4) && strncmp(env->arch, "aarch64", 7))
+	e_machine = perf_env__e_machine(env, NULL);
+
+	if (!env->cpu && e_machine != EM_S390 && e_machine != EM_AARCH64)
 		return TEST_SKIP;
 
 	/*
@@ -110,7 +114,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	 * physical_package_id will be set to -1. Hence skip this
 	 * test if physical_package_id returns -1 for cpu from perf_cpu_map.
 	 */
-	if (!strncmp(env->arch, "ppc64le", 7)) {
+	if (e_machine == EM_PPC64) {
 		if (cpu__get_socket_id(perf_cpu_map__cpu(map, 0)) == -1)
 			return TEST_SKIP;
 	}
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 03/17] perf capstone: Determine architecture from e_machine
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:58                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
                                             ` (14 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Avoid using the arch string, which is imprecise, and use the e_machine
instead. Add more e_machine to capstone machine translations,
including MIPS and RISCV. Remove unnecessary maybe_unused annotations.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/capstone.c | 115 +++++++++++++++++++++++++------------
 1 file changed, 79 insertions(+), 36 deletions(-)

diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
index 25cf6e15ec27..e6226b751c36 100644
--- a/tools/perf/util/capstone.c
+++ b/tools/perf/util/capstone.c
@@ -16,6 +16,7 @@
 #include <fcntl.h>
 #include <inttypes.h>
 #include <string.h>
+#include <elf.h>
 
 #include <capstone/capstone.h>
 
@@ -137,37 +138,74 @@ static enum cs_err perf_cs_close(csh *handle)
 #endif
 }
 
-static int capstone_init(struct machine *machine, csh *cs_handle, bool is64,
-			 bool disassembler_style)
+static bool e_machine_to_capstone(uint16_t e_machine, bool is64,
+				  enum cs_arch *arch, enum cs_mode *mode)
+{
+	switch (e_machine) {
+	case EM_X86_64:
+		*arch = CS_ARCH_X86;
+		*mode = CS_MODE_64;
+		return true;
+	case EM_386:
+		*arch = CS_ARCH_X86;
+		*mode = CS_MODE_32;
+		return true;
+	case EM_AARCH64:
+		*arch = CS_ARCH_ARM64;
+		*mode = CS_MODE_ARM;
+		return true;
+	case EM_ARM:
+		*arch = CS_ARCH_ARM;
+		*mode = CS_MODE_ARM | CS_MODE_V8;
+		return true;
+	case EM_S390:
+		*arch = CS_ARCH_SYSZ;
+		*mode = CS_MODE_BIG_ENDIAN;
+		return true;
+	case EM_MIPS:
+		*arch = CS_ARCH_MIPS;
+		*mode = is64 ? CS_MODE_MIPS64 : CS_MODE_MIPS32;
+		*mode |= CS_MODE_BIG_ENDIAN;
+		return true;
+	case EM_PPC:
+		*arch = CS_ARCH_PPC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_32;
+		return true;
+	case EM_PPC64:
+		*arch = CS_ARCH_PPC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_64;
+		return true;
+	case EM_SPARC:
+		*arch = CS_ARCH_SPARC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_32;
+		return true;
+	case EM_SPARCV9:
+		*arch = CS_ARCH_SPARC;
+		*mode = CS_MODE_BIG_ENDIAN | CS_MODE_V9 | CS_MODE_64;
+		return true;
+	case EM_RISCV:
+		*arch = CS_ARCH_RISCV;
+		*mode = is64 ? CS_MODE_RISCV64 : CS_MODE_RISCV32;
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int capstone_init(uint16_t e_machine, csh *cs_handle, bool is64, bool disassembler_style)
 {
 	enum cs_arch arch;
 	enum cs_mode mode;
 
-	if (machine__is(machine, "x86_64") && is64) {
-		arch = CS_ARCH_X86;
-		mode = CS_MODE_64;
-	} else if (machine__normalized_is(machine, "x86")) {
-		arch = CS_ARCH_X86;
-		mode = CS_MODE_32;
-	} else if (machine__normalized_is(machine, "arm64")) {
-		arch = CS_ARCH_ARM64;
-		mode = CS_MODE_ARM;
-	} else if (machine__normalized_is(machine, "arm")) {
-		arch = CS_ARCH_ARM;
-		mode = CS_MODE_ARM + CS_MODE_V8;
-	} else if (machine__normalized_is(machine, "s390")) {
-		arch = CS_ARCH_SYSZ;
-		mode = CS_MODE_BIG_ENDIAN;
-	} else {
+	if (!e_machine_to_capstone(e_machine, is64, &arch, &mode))
 		return -1;
-	}
 
 	if (perf_cs_open(arch, mode, cs_handle) != CS_ERR_OK) {
 		pr_warning_once("cs_open failed\n");
 		return -1;
 	}
 
-	if (machine__normalized_is(machine, "x86")) {
+	if (arch == CS_ARCH_X86) {
 		/*
 		 * In case of using capstone_init while symbol__disassemble
 		 * setting CS_OPT_SYNTAX_ATT depends if disassembler_style opts
@@ -212,28 +250,31 @@ static size_t print_insn_x86(struct thread *thread, u8 cpumode, struct cs_insn *
 }
 
 
-ssize_t capstone__fprintf_insn_asm(struct machine *machine __maybe_unused,
-				   struct thread *thread __maybe_unused,
-				   u8 cpumode __maybe_unused, bool is64bit __maybe_unused,
-				   const uint8_t *code __maybe_unused,
-				   size_t code_size __maybe_unused,
-				   uint64_t ip __maybe_unused, int *lenp __maybe_unused,
-				   int print_opts __maybe_unused, FILE *fp __maybe_unused)
+ssize_t capstone__fprintf_insn_asm(struct machine *machine,
+				   struct thread *thread,
+				   u8 cpumode,
+				   bool is64bit,
+				   const uint8_t *code,
+				   size_t code_size,
+				   uint64_t ip, int *lenp,
+				   int print_opts,
+				   FILE *fp)
 {
 	size_t printed;
 	struct cs_insn *insn;
 	csh cs_handle;
 	size_t count;
+	uint16_t e_machine = thread__e_machine(thread, machine, /*e_flags=*/NULL);
 	int ret;
 
 	/* TODO: Try to initiate capstone only once but need a proper place. */
-	ret = capstone_init(machine, &cs_handle, is64bit, true);
+	ret = capstone_init(e_machine, &cs_handle, is64bit, /*disassembler_style=*/true);
 	if (ret < 0)
 		return ret;
 
 	count = perf_cs_disasm(cs_handle, code, code_size, ip, 1, &insn);
 	if (count > 0) {
-		if (machine__normalized_is(machine, "x86"))
+		if (e_machine == EM_X86_64 || e_machine == EM_386)
 			printed = print_insn_x86(thread, cpumode, &insn[0], print_opts, fp);
 		else
 			printed = fprintf(fp, "%s %s", insn[0].mnemonic, insn[0].op_str);
@@ -322,9 +363,9 @@ static int find_file_offset(u64 start, u64 len, u64 pgoff, void *arg)
 	return 0;
 }
 
-int symbol__disassemble_capstone(const char *filename __maybe_unused,
-				 struct symbol *sym __maybe_unused,
-				 struct annotate_args *args __maybe_unused)
+int symbol__disassemble_capstone(const char *filename,
+				 struct symbol *sym,
+				 struct annotate_args *args)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct map *map = args->ms->map;
@@ -344,6 +385,7 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
 	char disasm_buf[512];
 	struct disasm_line *dl;
 	bool disassembler_style = false;
+	uint16_t e_machine;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -373,8 +415,8 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
 	    !strcmp(args->options->disassembler_style, "att"))
 		disassembler_style = true;
 
-	if (capstone_init(maps__machine(thread__maps(args->ms->thread)), &handle, is_64bit,
-			  disassembler_style) < 0)
+	e_machine = thread__e_machine(args->ms->thread, /*machine=*/NULL, /*e_flags=*/NULL);
+	if (capstone_init(e_machine, &handle, is_64bit, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
@@ -466,6 +508,7 @@ int symbol__disassemble_capstone_powerpc(const char *filename __maybe_unused,
 	struct disasm_line *dl;
 	u32 *line;
 	bool disassembler_style = false;
+	uint16_t e_machine;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -484,8 +527,8 @@ int symbol__disassemble_capstone_powerpc(const char *filename __maybe_unused,
 	    !strcmp(args->options->disassembler_style, "att"))
 		disassembler_style = true;
 
-	if (capstone_init(maps__machine(thread__maps(args->ms->thread)), &handle, is_64bit,
-			  disassembler_style) < 0)
+	e_machine = thread__e_machine(args->ms->thread, /*machine=*/NULL, /*e_flags=*/NULL);
+	if (capstone_init(e_machine, &handle, is_64bit, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (2 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:55                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
                                             ` (13 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Avoid string comparisons with the perf_env arch; switch to using the
more precise ELF machine.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/print_insn.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index 02e6fbb8ca04..5e36344174d6 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -9,6 +9,7 @@
 #include <stdbool.h>
 #include "capstone.h"
 #include "debug.h"
+#include "env.h"
 #include "sample.h"
 #include "symbol.h"
 #include "machine.h"
@@ -17,6 +18,7 @@
 #include "dump-insn.h"
 #include "map.h"
 #include "dso.h"
+#include <elf.h>
 
 size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 {
@@ -33,13 +35,13 @@ size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 static bool is64bitip(struct machine *machine, struct addr_location *al)
 {
 	const struct dso *dso = al->map ? map__dso(al->map) : NULL;
+	uint16_t e_machine;
 
 	if (dso)
 		return dso__is_64_bit(dso);
 
-	return machine__is(machine, "x86_64") ||
-		machine__normalized_is(machine, "arm64") ||
-		machine__normalized_is(machine, "s390");
+	e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);
+	return e_machine == EM_X86_64 || e_machine == EM_AARCH64 || e_machine == EM_S390;
 }
 
 ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpumode,
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (3 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:11                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 06/17] perf sample-raw: " Ian Rogers
                                             ` (12 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

The arch string is derived from uname and may be normalized, so its
value can differ between contexts; the ELF machine is more precise.
Reduce the scope of machine__is, as it is often better to take the
e_machine from a thread rather than from the machine. Switch from
string comparisons to ELF machine constant comparisons.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 25 ++++++++-----------------
 tools/perf/util/machine.h |  2 --
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e76f8c86e62a..6d32d3cb5cb7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1611,10 +1611,15 @@ static bool machine__uses_kcore(struct machine *machine)
 	return dsos__for_each_dso(&machine->dsos, machine__uses_kcore_cb, NULL) != 0 ? true : false;
 }
 
+static bool machine__is(struct machine *machine, uint16_t e_machine)
+{
+	return machine && perf_env__e_machine(machine->env, NULL) == e_machine;
+}
+
 static bool perf_event__is_extra_kernel_mmap(struct machine *machine,
 					     struct extra_kernel_map *xm)
 {
-	return machine__is(machine, "x86_64") &&
+	return machine__is(machine, EM_X86_64) &&
 	       is_entry_trampoline(xm->name);
 }
 
@@ -2770,7 +2775,7 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 static u64 get_leaf_frame_caller(struct perf_sample *sample,
 		struct thread *thread, int usr_idx)
 {
-	if (machine__normalized_is(maps__machine(thread__maps(thread)), "arm64"))
+	if (thread__e_machine(thread, /*machine=*/NULL, /*e_flags=*/NULL) == EM_AARCH64)
 		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
 	else
 		return 0;
@@ -3141,20 +3146,6 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
 	return 0;
 }
 
-/*
- * Compares the raw arch string. N.B. see instead perf_env__arch() or
- * machine__normalized_is() if a normalized arch is needed.
- */
-bool machine__is(struct machine *machine, const char *arch)
-{
-	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
-}
-
-bool machine__normalized_is(struct machine *machine, const char *arch)
-{
-	return machine && !strcmp(perf_env__arch(machine->env), arch);
-}
-
 int machine__nr_cpus_avail(struct machine *machine)
 {
 	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
@@ -3181,7 +3172,7 @@ int machine__get_kernel_start(struct machine *machine)
 		 * start of kernel text, but still above 2^63. So leave
 		 * kernel_start = 1ULL << 63 for x86_64.
 		 */
-		if (!err && !machine__is(machine, "x86_64"))
+		if (!err && !machine__is(machine, EM_X86_64))
 			machine->kernel_start = map__start(map);
 	}
 	return err;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 22a42c5825fa..003c970b3e4b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -227,8 +227,6 @@ static inline bool machine__is_host(struct machine *machine)
 }
 
 bool machine__is_lock_function(struct machine *machine, u64 addr);
-bool machine__is(struct machine *machine, const char *arch);
-bool machine__normalized_is(struct machine *machine, const char *arch);
 int machine__nr_cpus_avail(struct machine *machine);
 
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 06/17] perf sample-raw: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (4 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 07/17] perf sort: " Ian Rogers
                                             ` (11 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than the arch to determine S390 and x86 types.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/sample-raw.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/sample-raw.c b/tools/perf/util/sample-raw.c
index bcf442574d6e..b10056ac8057 100644
--- a/tools/perf/util/sample-raw.c
+++ b/tools/perf/util/sample-raw.c
@@ -1,6 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-
-#include <string.h>
+#include <elf.h>
 #include <linux/string.h>
 #include "evlist.h"
 #include "env.h"
@@ -14,14 +13,15 @@
  */
 void evlist__init_trace_event_sample_raw(struct evlist *evlist, struct perf_env *env)
 {
-	const char *arch_pf = perf_env__arch(env);
-	const char *cpuid = perf_env__cpuid(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (arch_pf && !strcmp("s390", arch_pf))
+	if (e_machine == EM_S390) {
 		evlist->trace_event_sample_raw = evlist__s390_sample_raw;
-	else if (arch_pf && !strcmp("x86", arch_pf) &&
-		 cpuid && strstarts(cpuid, "AuthenticAMD") &&
-		 evlist__has_amd_ibs(evlist)) {
-		evlist->trace_event_sample_raw = evlist__amd_sample_raw;
+	} else if (e_machine == EM_X86_64 || e_machine == EM_386) {
+		const char *cpuid = perf_env__cpuid(env);
+
+		if (cpuid && strstarts(cpuid, "AuthenticAMD") &&
+		    evlist__has_amd_ibs(evlist))
+			evlist->trace_event_sample_raw = evlist__amd_sample_raw;
 	}
 }
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 07/17] perf sort: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (5 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 06/17] perf sample-raw: " Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
                                             ` (10 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than the arch to determine x86 or PPC types.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/sort.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 0020089cb13c..06a641cf49e3 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <ctype.h>
+#include <elf.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <regex.h>
@@ -2673,9 +2674,10 @@ struct sort_dimension {
 
 static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 {
-	const char *arch = perf_env__arch(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
+	if (e_machine == EM_X86_64 || e_machine == EM_386 ||
+	    e_machine == EM_PPC64 || e_machine == EM_PPC) {
 		if (!strcmp(sort_key, "p_stage_cyc"))
 			return 1;
 		if (!strcmp(sort_key, "local_p_stage_cyc"))
@@ -2686,14 +2688,14 @@ static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 
 static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
 {
-	const char *arch = perf_env__arch(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (!strcmp("x86", arch)) {
+	if (e_machine == EM_X86_64 || e_machine == EM_386) {
 		if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
 			return "Local Retire Latency";
 		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
 			return "Retire Latency";
-	} else if (!strcmp("powerpc", arch)) {
+	} else if (e_machine == EM_PPC64 || e_machine == EM_PPC) {
 		if (!strcmp(se_header, "Local INSTR Latency"))
 			return "Finish Cyc";
 		else if (!strcmp(se_header, "INSTR Latency"))
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v8 08/17] perf symbol: Avoid use of machine__is
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (6 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 07/17] perf sort: " Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:17                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
                                             ` (9 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch to using the ELF machine from the dso or running machine rather
than the machine perf_env arch that may fall back on EM_HOST. This
also avoids potentially imprecise string comparisons.
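
The lookup order this introduces can be sketched in isolation as
below. This is an illustration under assumptions, not the patch's
code: the struct and field names are hypothetical stand-ins for
struct perf_env and struct dso, and the final fallback returns
EM_NONE here where the real helper can end up at EM_HOST.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the perf_env and dso state consulted. */
struct env_sketch {
	uint16_t cached_e_machine;	/* EM_NONE when nothing is cached */
	uint16_t arch_e_machine;	/* what the arch string would imply */
};

struct dso_sketch {
	uint16_t e_machine;		/* EM_NONE when the ELF header is unreadable */
};

static uint16_t pick_e_machine(const struct env_sketch *env,
			       const struct dso_sketch *dso)
{
	/* 1. A value already cached in the environment wins. */
	if (env && env->cached_e_machine != EM_NONE)
		return env->cached_e_machine;

	/* 2. Otherwise prefer the DSO's own ELF machine, if known. */
	if (dso && dso->e_machine != EM_NONE)
		return dso->e_machine;

	/* 3. Last resort: whatever the environment's arch string implies. */
	return env ? env->arch_e_machine : EM_NONE;
}
```

The point of the ordering is that the arch-derived value is only
consulted when neither a cached value nor the DSO can answer.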

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/symbol.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..8aaaab0ad4b7 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -851,6 +851,24 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 	return count;
 }
 
+static uint16_t machine_or_dso_e_machine(struct machine *machine, struct dso *dso)
+{
+	uint16_t e_machine = EM_NONE;
+
+	/* Check for a cached value first. */
+	if (machine && machine->env && machine->env->e_machine != EM_NONE)
+		return machine->env->e_machine;
+
+	/* DSO should be most accurate */
+	if (dso)
+		e_machine = dso__e_machine(dso, machine, /*e_flags=*/NULL);
+
+	if (e_machine != EM_NONE)
+		return e_machine;
+
+	return perf_env__e_machine(machine ? machine->env : NULL, /*e_flags=*/NULL);
+}
+
 /*
  * Split the symbols into maps, making sure there are no overlaps, i.e. the
  * kernel range is broken in several maps, named [kernel].N, as we don't have
@@ -866,14 +884,13 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 	struct rb_root_cached *root = dso__symbols(dso);
 	struct rb_node *next = rb_first_cached(root);
 	int kernel_range = 0;
-	bool x86_64;
+	uint16_t e_machine = EM_NONE;
 
 	if (!kmaps)
 		return -1;
 
 	machine = maps__machine(kmaps);
-
-	x86_64 = machine__is(machine, "x86_64");
+	e_machine = machine_or_dso_e_machine(machine, dso);
 
 	while (next) {
 		char *module;
@@ -925,7 +942,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			 */
 			pos->start = map__map_ip(curr_map, pos->start);
 			pos->end   = map__map_ip(curr_map, pos->end);
-		} else if (x86_64 && is_entry_trampoline(pos->name)) {
+		} else if (e_machine == EM_X86_64 && is_entry_trampoline(pos->name)) {
 			/*
 			 * These symbols are not needed anymore since the
 			 * trampoline maps refer to the text section and it's
@@ -1428,7 +1445,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 		free(new_node);
 	}
 
-	if (machine__is(machine, "x86_64")) {
+	if (machine_or_dso_e_machine(machine, dso) == EM_X86_64) {
 		u64 addr;
 
 		/*
@@ -1716,7 +1733,7 @@ int dso__load(struct dso *dso, struct map *map)
 			ret = dso__load_guest_kernel_sym(dso, map);
 
 		machine = maps__machine(map__kmaps(map));
-		if (machine__is(machine, "x86_64"))
+		if (machine_or_dso_e_machine(machine, dso) == EM_X86_64)
 			machine__map_x86_64_entry_trampolines(machine, dso);
 		goto out;
 	}
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (7 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:59                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
                                             ` (8 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching.

Signed-off-by: Ian Rogers <irogers@google.com>
---
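
A reduced sketch of the dispatch, for illustration: a switch on the ELF machine picks a cross-toolchain prefix, and EM_RISCV needs the kernel bitness because one ELF machine value covers both riscv32 and riscv64. The function name `sketch_triplet` and the single prefix per architecture are assumptions made for the example; the real tables in arch/common.c hold several candidates each.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative prefixes only; one assumed entry per architecture. */
static const char *sketch_triplet(uint16_t e_machine, bool is_64_bit)
{
	switch (e_machine) {
	case EM_AARCH64:
		return "aarch64-linux-gnu-";
	case EM_PPC:
	case EM_PPC64:
		return "powerpc64-linux-gnu-";
	case EM_RISCV:
		/* One e_machine value for both widths, so bitness decides. */
		return is_64_bit ? "riscv64-linux-gnu-" : "riscv32-linux-gnu-";
	case EM_S390:
		return "s390x-linux-gnu-";
	default:
		return NULL;	/* caller reports "binutils not supported" */
	}
}
```
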
 tools/perf/arch/common.c | 55 +++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index 21836f70f231..e9b5b61feffe 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -1,12 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0
+#include "common.h"
+
 #include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#include "common.h"
 #include "../util/env.h"
 #include "../util/debug.h"
+#include <dwarf-regs.h>
 #include <linux/zalloc.h>
 
 static const char *const arc_triplets[] = {
@@ -145,7 +147,8 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 					  const char *name, char **path)
 {
 	int idx;
-	const char *arch = perf_env__arch(env), *cross_env;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	const char *cross_env;
 	const char *const *path_list;
 	char *buf = NULL;
 
@@ -153,7 +156,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 	 * We don't need to try to find objdump path for native system.
 	 * Just use default binutils path (e.g.: "objdump").
 	 */
-	if (!strcmp(perf_env__arch(NULL), arch))
+	if (e_machine == EM_HOST)
 		goto out;
 
 	cross_env = getenv("CROSS_COMPILE");
@@ -170,30 +173,42 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree(&buf);
 	}
 
-	if (!strcmp(arch, "arc"))
+	switch (e_machine) {
+	case EM_ARC:
 		path_list = arc_triplets;
-	else if (!strcmp(arch, "arm"))
+		break;
+	case EM_ARM:
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+		break;
+	case EM_AARCH64:
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+		break;
+	case EM_PPC:
+	case EM_PPC64:
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "riscv32"))
-		path_list = riscv32_triplets;
-	else if (!strcmp(arch, "riscv64"))
-		path_list = riscv64_triplets;
-	else if (!strcmp(arch, "sh"))
+		break;
+	case EM_RISCV:
+		path_list = perf_env__kernel_is_64_bit(env) ? riscv64_triplets : riscv32_triplets;
+		break;
+	case EM_SH:
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+		break;
+	case EM_S390:
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+		break;
+	case EM_SPARC:
+	case EM_SPARCV9:
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+		break;
+	case EM_X86_64:
+	case EM_386:
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+		break;
+	case EM_MIPS:
 		path_list = mips_triplets;
-	else {
-		ui__error("binutils for %s not supported.\n", arch);
+		break;
+	default:
+		ui__error("binutils for %s not supported.\n", perf_env__arch(env));
 		goto out_error;
 	}
 
@@ -202,7 +217,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		ui__error("Please install %s for %s.\n"
 			  "You can add it to PATH, set CROSS_COMPILE or "
 			  "override the default using --%s.\n",
-			  name, arch, name);
+			  name, perf_env__arch(env), name);
 		goto out_error;
 	}
 
@@ -237,5 +252,5 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
  */
 bool perf_env__single_address_space(struct perf_env *env)
 {
-	return strcmp(perf_env__arch(env), "sparc");
+	return perf_env__e_machine(env, /*e_flags=*/NULL) != EM_SPARC;
 }
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (8 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
                                             ` (7 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch from arch to e_machine in print_pmu_caps.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/header.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 8d5152bde25d..c6436269df4b 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2348,15 +2348,16 @@ static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
 static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
 {
 	struct perf_env *env = &ff->ph->env;
-	struct pmu_caps *pmu_caps;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
 	for (int i = 0; i < env->nr_pmus_with_caps; i++) {
-		pmu_caps = &env->pmu_caps[i];
+		struct pmu_caps *pmu_caps = &env->pmu_caps[i];
+
 		__print_pmu_caps(fp, pmu_caps->nr_caps, pmu_caps->caps,
 				 pmu_caps->pmu_name);
 	}
 
-	if (strcmp(perf_env__arch(env), "x86") == 0 &&
+	if ((e_machine == EM_X86_64 || e_machine == EM_386) &&
 	    perf_env__has_pmu_mapping(env, "ibs_op")) {
 		char *max_precise = perf_env__find_pmu_cap(env, "cpu", "max_precise");
 
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (9 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:44                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 12/17] perf lock-contention: " Ian Rogers
                                             ` (6 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching for AARCH64.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-c2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 72a7802775ee..09c8352a922c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3202,7 +3202,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	 * default display type.
 	 */
 	if (!display) {
-		if (!strcmp(perf_env__arch(env), "arm64"))
+		if (perf_env__e_machine(env, /*e_flags=*/NULL) == EM_AARCH64)
 			display = "peer";
 		else
 			display = "tot";
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 12/17] perf lock-contention: Use perf_env e_machine rather than arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (10 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno Ian Rogers
                                             ` (5 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching for powerpc.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/lock-contention.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/lock-contention.c b/tools/perf/util/lock-contention.c
index 92e7b7b572a2..119a7206f3cd 100644
--- a/tools/perf/util/lock-contention.c
+++ b/tools/perf/util/lock-contention.c
@@ -104,7 +104,8 @@ bool match_callstack_filter(struct machine *machine, u64 *callstack, int max_sta
 	struct map *kmap;
 	struct symbol *sym;
 	u64 ip;
-	const char *arch = perf_env__arch(machine->env);
+	uint16_t e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);
+	bool is_powerpc = e_machine == EM_PPC64 || e_machine == EM_PPC;
 
 	if (list_empty(&callstack_filters))
 		return true;
@@ -125,8 +126,7 @@ bool match_callstack_filter(struct machine *machine, u64 *callstack, int max_sta
 		 * incase first or second callstack index entry has 0
 		 * address for powerpc.
 		 */
-		if (!callstack || (!callstack[i] && (strcmp(arch, "powerpc") ||
-						(i != 1 && i != 2))))
+		if (!callstack || (!callstack[i] && (!is_powerpc || (i != 1 && i != 2))))
 			break;
 
 		ip = callstack[i];
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (11 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 12/17] perf lock-contention: " Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch Ian Rogers
                                             ` (4 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

perf_env__arch_strerrno is only available with libtraceevent so hide
the declaration if no libtraceevent.

The previous approach mapped an architecture string to a pointer to a
function that takes an int errno value and returns a string. The new
approach takes an e_machine and an errno value and returns a string
directly.

As the only call site is in builtin-trace.c, the e_machine is already
present and potentially more specific than the perf_env arch string
that is a single global value.

The major complication in this approach is that the shell script that
generates the C code has to map each Linux arch directory name to the
matching ELF machine constants.

Signed-off-by: Ian Rogers <irogers@google.com>
---
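
For illustration, the shape of the generated dispatcher could look roughly like the sketch below. The `sketch_` names and the two tiny tables are hypothetical; the generated tables cover every errno of every architecture. The example relies on EDEADLK being 35 in the generic errno numbering and 45 on MIPS, where 35 is ENOMSG.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical per-architecture tables, reduced to a few entries. */
static const char *sketch_errno_name_generic(int err)
{
	switch (err) {
	case 11: return "EAGAIN";
	case 35: return "EDEADLK";
	default: return "(unknown)";
	}
}

static const char *sketch_errno_name_mips(int err)
{
	switch (err) {
	case 11: return "EAGAIN";
	case 35: return "ENOMSG";
	case 45: return "EDEADLK";
	default: return "(unknown)";
	}
}

/* Dispatch on the ELF machine; several constants may share one table. */
static const char *sketch_strerrno(uint16_t e_machine, int err)
{
	switch (e_machine) {
	case EM_MIPS:
		return sketch_errno_name_mips(err);
	case EM_386:
	case EM_X86_64:
	default:
		return sketch_errno_name_generic(err);
	}
}
```

Because the e_machine is passed per call, a 32-bit compat task and a 64-bit task in the same trace can be resolved independently, which the single cached function pointer in perf_env could not express.
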
 tools/perf/builtin-trace.c                  |  5 ++-
 tools/perf/trace/beauty/arch_errno_names.sh | 40 ++++++++++++++++++---
 tools/perf/util/env.c                       | 13 +++----
 tools/perf/util/env.h                       |  7 ++--
 4 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e58c49d047a2..d1f21b5e7c98 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3008,9 +3008,8 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
 	} else if (ret < 0) {
 errno_print: {
 		char bf[STRERR_BUFSIZE];
-		struct perf_env *env = evsel__env(evsel) ?: &trace->host_env;
 		const char *emsg = str_error_r(-ret, bf, sizeof(bf));
-		const char *e = perf_env__arch_strerrno(env, err);
+		const char *e = perf_env__arch_strerrno(e_machine, err);
 
 		fprintf(trace->output, "-1 %s (%s)", e, emsg);
 	}
@@ -4890,7 +4889,7 @@ static size_t syscall__dump_stats(struct trace *trace, int e_machine, FILE *fp,
 
 				for (e = 0; e < stats->max_errno; ++e) {
 					if (stats->errnos[e] != 0)
-						fprintf(fp, "\t\t\t\t%s: %d\n", perf_env__arch_strerrno(trace->host->env, e + 1), stats->errnos[e]);
+						fprintf(fp, "\t\t\t\t%s: %d\n", perf_env__arch_strerrno(e_machine, e + 1), stats->errnos[e]);
 				}
 			}
 			lines++;
diff --git a/tools/perf/trace/beauty/arch_errno_names.sh b/tools/perf/trace/beauty/arch_errno_names.sh
index b22890b8d272..89b742927168 100755
--- a/tools/perf/trace/beauty/arch_errno_names.sh
+++ b/tools/perf/trace/beauty/arch_errno_names.sh
@@ -52,21 +52,49 @@ process_arch()
 		|IFS=, create_errno_lookup_func "$arch"
 }
 
+arch_to_e_machine()
+{
+	case "$1" in
+	alpha)      printf '\tcase EM_ALPHA:\n' ;;
+	arc)        printf '\tcase EM_ARC:\n' ;;
+	arm)        printf '\tcase EM_ARM:\n' ;;
+	arm64)      printf '\tcase EM_AARCH64:\n' ;;
+	csky)       printf '\tcase EM_CSKY:\n' ;;
+	hexagon)    printf '\tcase EM_HEXAGON:\n' ;;
+	loongarch)  printf '\tcase EM_LOONGARCH:\n' ;;
+	microblaze) printf '\tcase EM_MICROBLAZE:\n' ;;
+	mips)       printf '\tcase EM_MIPS:\n' ;;
+	parisc)     printf '\tcase EM_PARISC:\n' ;;
+	powerpc)    printf '\tcase EM_PPC:\n\tcase EM_PPC64:\n' ;;
+	riscv)      printf '\tcase EM_RISCV:\n' ;;
+	s390)       printf '\tcase EM_S390:\n' ;;
+	sh)         printf '\tcase EM_SH:\n' ;;
+	sparc)      printf '\tcase EM_SPARC:\n\tcase EM_SPARCV9:\n' ;;
+	x86)        printf '\tcase EM_386:\n\tcase EM_X86_64:\n' ;;
+	xtensa)     printf '\tcase EM_XTENSA:\n' ;;
+	esac
+}
+
 create_arch_errno_table_func()
 {
 	archlist="$1"
 	default="$2"
 
-	printf 'static arch_syscalls__strerrno_t *\n'
-	printf 'arch_syscalls__strerrno_function(const char *arch)\n'
+	printf 'static const char *\n'
+	printf 'arch_syscalls__strerrno(uint16_t e_machine, int err)\n'
 	printf '{\n'
+	printf '\tswitch (e_machine) {\n'
 	for arch in $archlist; do
 		arch_str=$(arch_string "$arch")
-		printf '\tif (!strcmp(arch, "%s"))\n' "$arch_str"
-		printf '\t\treturn errno_to_name__%s;\n' "$arch_str"
+		ems=$(arch_to_e_machine "$arch_str")
+		if [ -n "$ems" ]; then
+			printf '%s\n' "$ems"
+			printf '\t\treturn errno_to_name__%s(err);\n' "$arch_str"
+		fi
 	done
 	arch_str=$(arch_string "$default")
-	printf '\treturn errno_to_name__%s;\n' "$arch_str"
+	printf '\tdefault:\n\t\treturn errno_to_name__%s(err);\n' "$arch_str"
+	printf '\t}\n'
 	printf '}\n'
 }
 
@@ -74,6 +102,8 @@ cat <<EoHEADER
 /* SPDX-License-Identifier: GPL-2.0 */
 
 #include <string.h>
+#include <stdint.h>
+#include <elf.h>
 
 EoHEADER
 
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4ff4caab3b32..97f4aa1131a1 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -786,17 +786,12 @@ const char *perf_env__arch(struct perf_env *env)
 #include "trace/beauty/arch_errno_names.c"
 #endif
 
-const char *perf_env__arch_strerrno(struct perf_env *env __maybe_unused, int err __maybe_unused)
-{
 #if defined(HAVE_LIBTRACEEVENT)
-	if (env->arch_strerrno == NULL)
-		env->arch_strerrno = arch_syscalls__strerrno_function(perf_env__arch(env));
-
-	return env->arch_strerrno ? env->arch_strerrno(err) : "no arch specific strerrno function";
-#else
-	return "!HAVE_LIBTRACEEVENT";
-#endif
+const char *perf_env__arch_strerrno(uint16_t e_machine, int err)
+{
+	return arch_syscalls__strerrno(e_machine, err);
 }
+#endif
 
 const char *perf_env__cpuid(struct perf_env *env)
 {
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 7151a9138e3f..68dead1b36a6 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -67,8 +67,6 @@ struct cpu_domain_map {
 	struct domain_info	**domains;
 };
 
-typedef const char *(arch_syscalls__strerrno_t)(int err);
-
 struct perf_env {
 	char			*hostname;
 	char			*os_release;
@@ -158,7 +156,6 @@ struct perf_env {
 		 */
 		bool	enabled;
 	} clock;
-	arch_syscalls__strerrno_t *arch_strerrno;
 };
 
 enum perf_compress_type {
@@ -190,7 +187,9 @@ void cpu_cache_level__free(struct cpu_cache_level *cache);
 uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags);
 uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
-const char *perf_env__arch_strerrno(struct perf_env *env, int err);
+#if defined(HAVE_LIBTRACEEVENT)
+const char *perf_env__arch_strerrno(uint16_t e_machine, int err);
+#endif
 const char *perf_env__cpuid(struct perf_env *env);
 const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (12 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
                                             ` (3 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

The switch to using e_machine has made the perf_env__raw_arch function
unused so remove it.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c | 18 ------------------
 tools/perf/util/env.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 97f4aa1131a1..5944acd28996 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -451,19 +451,6 @@ int perf_env__read_cpuid(struct perf_env *env)
 	return 0;
 }
 
-static int perf_env__read_arch(struct perf_env *env)
-{
-	struct utsname uts;
-
-	if (env->arch)
-		return 0;
-
-	if (!uname(&uts))
-		env->arch = strdup(uts.machine);
-
-	return env->arch ? 0 : -ENOMEM;
-}
-
 static int perf_env__read_nr_cpus_avail(struct perf_env *env)
 {
 	if (env->nr_cpus_avail == 0)
@@ -582,11 +569,6 @@ int perf_env__read_core_pmu_caps(struct perf_env *env)
 	return ret;
 }
 
-const char *perf_env__raw_arch(struct perf_env *env)
-{
-	return env && !perf_env__read_arch(env) ? env->arch : "unknown";
-}
-
 int perf_env__nr_cpus_avail(struct perf_env *env)
 {
 	return env && !perf_env__read_nr_cpus_avail(env) ? env->nr_cpus_avail : 0;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 68dead1b36a6..a95fd7eb3524 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -191,7 +191,6 @@ const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(uint16_t e_machine, int err);
 #endif
 const char *perf_env__cpuid(struct perf_env *env);
-const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
 
 void perf_env__init(struct perf_env *env);
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (13 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:53                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
                                             ` (2 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

In live mode the os_release isn't initialized. Add a lazy
initialization helper that assumes live mode whenever the os_release
hasn't been initialized.

Signed-off-by: Ian Rogers <irogers@google.com>
---
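
The lazy initialization pattern can be sketched on its own as follows. This is a minimal example under assumptions: `sketch_release_env` and `sketch_os_release` are hypothetical names, and the fallback string is passed in by the caller rather than taken from perf_version_string.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for the one perf_env field that matters here. */
struct sketch_release_env {
	char *os_release;	/* NULL until read from a file or computed */
};

static const char *sketch_os_release(struct sketch_release_env *env,
				     const char *fallback)
{
	struct utsname uts;

	if (!env)
		return fallback;

	if (!env->os_release) {
		/* Not populated from a data file: assume live mode. */
		env->os_release = strdup(uname(&uts) < 0 ? fallback : uts.release);
	}
	/* strdup() may have failed, so keep the fallback as a last resort. */
	return env->os_release ? env->os_release : fallback;
}
```

After the first call the value is cached in the struct, so later callers get the same pointer and uname() is not called again.
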
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 5944acd28996..1090aaa2985f 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -339,6 +339,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname or strdup fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release ?: perf_version_string;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a95fd7eb3524..989545a47798 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -172,6 +172,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c6436269df4b..4867a932cb88 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,13 +370,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 8aaaab0ad4b7..a70066d17729 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2226,7 +2226,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2243,7 +2243,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (14 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-02  7:55                             ` sashiko-bot
  2026-05-02  6:59                           ` [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

A problem with putting bitfields into struct symbol is that bits
sharing a storage unit can be updated concurrently. Each update is a
read-modify-write of the whole unit, so only one of two racing updates
may take effect, leading to lost updates.

To avoid this, introduce a global lock `symbol_bits_lock` in `symbol.c`
and helper functions to update the bits sharing a byte:
`symbol__set_ignore` and `symbol__set_annotate2`.

`inlined` is not given a setter as it is only initialized in
`new_inline_sym` when the symbol is under construction and not shared.

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
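
A stand-alone sketch of the setter pattern, for illustration. It uses a statically initialized pthread mutex instead of perf's struct mutex wrapper, and `sketch_symbol` with its two setters are hypothetical names; only the idea of serializing writes to bits that share a byte is taken from this patch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Two flags packed into one byte: writing either rewrites the whole byte. */
struct sketch_symbol {
	uint8_t ignore:1;
	uint8_t annotate2:1;
};

/* One global lock keeps struct sketch_symbol itself small. */
static pthread_mutex_t sketch_bits_lock = PTHREAD_MUTEX_INITIALIZER;

static void sketch_set_ignore(struct sketch_symbol *sym, bool ignore)
{
	pthread_mutex_lock(&sketch_bits_lock);
	sym->ignore = ignore;
	pthread_mutex_unlock(&sketch_bits_lock);
}

static void sketch_set_annotate2(struct sketch_symbol *sym, bool annotate2)
{
	pthread_mutex_lock(&sketch_bits_lock);
	sym->annotate2 = annotate2;
	pthread_mutex_unlock(&sketch_bits_lock);
}
```

Without the lock, two threads setting different flags of the same symbol could each load the old byte, set their own bit, and store it back, so whichever store lands last silently clears the other thread's bit.
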
 tools/perf/builtin-kwork.c |  2 +-
 tools/perf/builtin-sched.c |  2 +-
 tools/perf/util/annotate.c |  2 +-
 tools/perf/util/symbol.c   | 22 ++++++++++++++++++++++
 tools/perf/util/symbol.h   |  3 +++
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 9d3a4c779a41..7337ee956dc9 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -725,7 +725,7 @@ static void timehist_save_callchain(struct perf_kwork *kwork,
 		if (sym) {
 			if (!strcmp(sym->name, "__softirqentry_text_start") ||
 			    !strcmp(sym->name, "__do_softirq"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 555247568e7a..655e95f660c2 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2371,7 +2371,7 @@ static void save_task_callchain(struct perf_sched *sched,
 			if (!strcmp(sym->name, "schedule") ||
 			    !strcmp(sym->name, "__schedule") ||
 			    !strcmp(sym->name, "preempt_schedule"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..d550a0061159 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2224,7 +2224,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 
 	annotation__init_column_widths(notes, sym);
 	annotation__update_column_widths(notes);
-	sym->annotate2 = 1;
+	symbol__set_annotate2(sym, true);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a70066d17729..1238a0d6ce6e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -31,6 +31,7 @@
 #include "map.h"
 #include "symbol.h"
 #include "map_symbol.h"
+#include "mutex.h"
 #include "mem-events.h"
 #include "mem-info.h"
 #include "symsrc.h"
@@ -52,6 +53,8 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
 static bool symbol__is_idle(const char *name);
 
+static struct mutex symbol_bits_lock;
+
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
@@ -345,6 +348,20 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
+void symbol__set_ignore(struct symbol *sym, bool ignore)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->ignore = ignore;
+	mutex_unlock(&symbol_bits_lock);
+}
+
+void symbol__set_annotate2(struct symbol *sym, bool annotate2)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->annotate2 = annotate2;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -2415,6 +2432,8 @@ int symbol__init(struct perf_env *env)
 	if (symbol_conf.initialized)
 		return 0;
 
+	mutex_init(&symbol_bits_lock);
+
 	symbol_conf.priv_size = PERF_ALIGN(symbol_conf.priv_size, sizeof(u64));
 
 	symbol__elf_init();
@@ -2493,6 +2512,9 @@ void symbol__exit(void)
 {
 	if (!symbol_conf.initialized)
 		return;
+
+	mutex_destroy(&symbol_bits_lock);
+
 	strlist__delete(symbol_conf.bt_stop_list);
 	strlist__delete(symbol_conf.sym_list);
 	strlist__delete(symbol_conf.dso_list);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..5d98d7e84d57 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -77,6 +77,9 @@ struct symbol {
 void symbol__delete(struct symbol *sym);
 void symbols__delete(struct rb_root_cached *symbols);
 
+void symbol__set_ignore(struct symbol *sym, bool ignore);
+void symbol__set_annotate2(struct symbol *sym, bool annotate2);
+
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
  * @symbols: the rb_root of symbols
-- 
2.54.0.545.g6539524ca2-goog



* [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (15 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-02  6:59                           ` Ian Rogers
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-02  6:59 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Move the idle boolean to a helper symbol__is_idle function. In the
function lazily compute whether a symbol is an idle function taking
into consideration the kernel version and architecture of the
machine. As symbols__insert no longer needs to know if a symbol is for
the kernel, remove the argument.
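
As an illustration of the kernel version test described above, here is a
minimal sketch. The helper name release_before() is hypothetical and not
part of this patch; it only shows how a "major.minor" prefix of a release
string can be compared against a threshold such as 6.10.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): true if `release`, e.g.
 * "6.9.3-generic", is older than major.minor.
 */
static bool release_before(const char *release, int major, int minor)
{
	int rel_major = 0, rel_minor = 0;

	/* A missing or unparsable release string is treated as "not older". */
	if (!release || sscanf(release, "%d.%d", &rel_major, &rel_minor) != 2)
		return false;

	return rel_major < major || (rel_major == major && rel_minor < minor);
}
```

Comparing the two integers separately matters here: a plain string or
floating point comparison would order "6.9" after "6.10".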

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts which means
intel_idle_irq is matched as part of strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/
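
A small sketch of the prefix matching, assuming a local stand-in
has_prefix() with the same semantics as the strstarts() helper (the
stand-in name is made up for this example):

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for strstarts(): true if `str` begins with `prefix`. */
static bool has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Sketch of the x86 check: any intel_idle* or mwait_idle* symbol matches. */
static bool x86_idle_name(const char *name)
{
	return has_prefix(name, "mwait_idle") || has_prefix(name, "intel_idle");
}
```

The trade-off is that a prefix match also accepts any other symbol that
happens to start with the same string, which is why it is limited to
specific architectures in the patch.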

To avoid concurrent update issues with other bitfields in `struct symbol`,
this change uses the global lock `symbol_bits_lock` (introduced in a
previous commit) for updates to the `idle` field. A static helper
`symbol__set_idle` taking a boolean is used to encapsulate the lock and
mapping to `enum symbol_idle_kind`.
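
The caching scheme can be sketched as follows. This is a simplified
stand-alone model, not the perf code: struct sym_sketch, classify() and
bits_lock are hypothetical stand-ins, and a pthread mutex is assumed in
place of perf's mutex wrapper.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

enum idle_kind { IDLE_UNKNOWN = 0, IDLE_NOT_IDLE = 1, IDLE_IDLE = 2 };

struct sym_sketch {
	const char *name;
	unsigned char idle:2;	/* cache, zero-initialised to IDLE_UNKNOWN */
	unsigned char ignore:1;	/* neighbouring bit in the same byte */
};

static pthread_mutex_t bits_lock = PTHREAD_MUTEX_INITIALIZER;
static int classify_calls;	/* instrumentation for this sketch only */

/* Stand-in for the real name/arch/version based classification. */
static bool classify(const char *name)
{
	classify_calls++;
	return strcmp(name, "default_idle") == 0;
}

static bool sym_is_idle(struct sym_sketch *sym)
{
	bool idle;

	/* Already computed: answer from the cache. */
	if (sym->idle != IDLE_UNKNOWN)
		return sym->idle == IDLE_IDLE;

	idle = classify(sym->name);

	/* The write shares a byte with other bitfields, so take the lock. */
	pthread_mutex_lock(&bits_lock);
	sym->idle = idle ? IDLE_IDLE : IDLE_NOT_IDLE;
	pthread_mutex_unlock(&bits_lock);
	return idle;
}
```

Using three states rather than a boolean is what lets a zero-initialised
symbol mean "not computed yet" without a separate flag.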

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 108 +++++++++++++++++++++++------------
 tools/perf/util/symbol.h     |  14 +++--
 3 files changed, 81 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..e8f7fe3f19fc 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1727,7 +1727,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1238a0d6ce6e..6c642067c4ed 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -51,7 +51,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
 
 static struct mutex symbol_bits_lock;
 
@@ -362,6 +361,13 @@ void symbol__set_annotate2(struct symbol *sym, bool annotate2)
 	mutex_unlock(&symbol_bits_lock);
 }
 
+static void symbol__set_idle(struct symbol *sym, bool idle)
+{
+	mutex_lock(&symbol_bits_lock);
+	sym->idle = idle ? SYMBOL_IDLE__IDLE : SYMBOL_IDLE__NOT_IDLE;
+	mutex_unlock(&symbol_bits_lock);
+}
+
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -375,8 +381,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -384,17 +389,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		sym->idle = symbol__is_idle(name);
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -411,7 +405,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -572,7 +566,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -738,43 +732,81 @@ int modules__parse(const char *filename, void *arg,
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+static int sym_name_cmp(const void *a, const void *b)
 {
-	const char * const idle_symbols[] = {
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
+{
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (sym->idle)
+		return sym->idle == SYMBOL_IDLE__IDLE;
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		symbol__set_idle(sym, /*idle=*/false);
+		return false;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
 
-	return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
+
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
+
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
+
+	symbol__set_idle(sym, /*idle=*/false);
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -803,7 +835,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 5d98d7e84d57..717d2f876d58 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -43,6 +43,12 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -58,8 +64,8 @@ struct symbol {
 	u8		type:4;
 	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
 	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
+	/** Cache for symbol__is_idle holding enum symbol_idle_kind values. */
+	u8		idle:2;
 	/** Resolvable but tools ignore it (e.g. idle routines). */
 	u8		ignore:1;
 	/** Symbol for an inlined function. */
@@ -197,8 +203,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
@@ -281,5 +286,6 @@ enum {
 };
 
 int symbol__validate_sym_arguments(void);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 #endif /* __PERF_SYMBOL */
-- 
2.54.0.545.g6539524ca2-goog



* Re: [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch
  2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  7:11                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:11 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Low] The commit changes the visibility and signature of `machine__is()` (making it static and changing its second parameter from `const char *` to `uint16_t e_machine`), but fails to update callers in `tools/perf/util/symbol.c` (which still pass a string like `"x86_64"`), causing a build breakage.
--

commit 43e3a53fa61bf39d387266ed813f417208baa8e5
Author: Ian Rogers <irogers@google.com>

perf machine: Use perf_env e_machine rather than arch

This commit modifies machine__is to use ELF machine constant comparisons
instead of string comparisons and reduces its scope by making it static. It
updates callers within machine.c.

> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
[ ... ]
> @@ -1611,10 +1611,15 @@ static bool machine__uses_kcore(struct machine *machine)
>  	return dsos__for_each_dso(&machine->dsos, machine__uses_kcore_cb, NULL) != 0 ? true : false;
>  }
>  
> +static bool machine__is(struct machine *machine, uint16_t e_machine)
> +{
> +	return machine && perf_env__e_machine(machine->env, NULL) == e_machine;
> +}
> +

Does changing machine__is() to be static and altering its signature break
the build?

It appears callers in tools/perf/util/symbol.c still pass a string
argument like "x86_64" to this function.

I notice this is addressed later in the series by commit ae1a01cb2104
(perf symbol: Avoid use of machine__is), but leaving this intermediate
state might cause bisection failures.

[ ... ]
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -227,8 +227,6 @@ static inline bool machine__is_host(struct machine *machine)
>  }
>  
>  bool machine__is_lock_function(struct machine *machine, u64 addr);
> -bool machine__is(struct machine *machine, const char *arch);
> -bool machine__normalized_is(struct machine *machine, const char *arch);
>  int machine__nr_cpus_avail(struct machine *machine);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=5


* Re: [PATCH v8 08/17] perf symbol: Avoid use of machine__is
  2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
@ 2026-05-02  7:17                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:17 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] machine_or_dso_e_machine() incorrectly prioritizes the globally cached environment architecture over the DSO, and erroneously returns the host architecture (EM_HOST) for kernel traces.
--

commit ae1a01cb2104e12d0b084739b319fb08570b49d8
Author: Ian Rogers <irogers@google.com>

perf symbol: Avoid use of machine__is

This commit switches from using machine__is to checking the ELF machine
type from either the DSO or the running machine's environment. This aims
to use a more accurate architecture representation and avoid imprecise
string comparisons.

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index fcaeeddbbb6b..8aaaab0ad4b7 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
[ ... ]
> @@ -851,6 +851,24 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
>  	return count;
>  }
>  
> +static uint16_t machine_or_dso_e_machine(struct machine *machine, struct dso *dso)
> +{
> +	uint16_t e_machine = EM_NONE;
> +
> +	/* Check for a cached value first. */
> +	if (machine && machine->env && machine->env->e_machine != EM_NONE)
> +		return machine->env->e_machine;
> +
> +	/* DSO should be most accurate */

Does checking the globally cached environment architecture before the DSO
contradict the comment that the DSO should be most accurate?

If the environment architecture is already cached, won't this bypass the DSO
entirely and potentially break analysis of mixed-architecture user binaries?

> +	if (dso)
> +		e_machine = dso__e_machine(dso, machine, /*e_flags=*/NULL);
> +
> +	if (e_machine != EM_NONE)
> +		return e_machine;

When dso__e_machine() is called for kernel DSOs, does it return the host
architecture (EM_HOST) of the machine running the perf tool?

If so, would returning this host architecture skip the fallback to
perf_env__e_machine(), causing cross-architecture analysis (like reading
an x86_64 trace on an ARM64 host) to fail checks like e_machine == EM_X86_64?

> +
> +	return perf_env__e_machine(machine ? machine->env : NULL, /*e_flags=*/NULL);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=8


* Re: [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch
  2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  7:44                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:44 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Low] Build failure on older toolchains due to missing fallback definition for `EM_AARCH64`.
--

commit 5745fb586646dd63f1bc0598424b6a0b707af964
Author: Ian Rogers <irogers@google.com>

perf c2c: Use perf_env e_machine rather than arch
    
This commit updates perf c2c to use the e_machine value from the perf_env
environment instead of performing string matching on the architecture name
for AARCH64 detection.

> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 72a7802775ee..09c8352a922c 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -3202,7 +3202,7 @@ static int perf_c2c__report(int argc, const char **argv)
>  	 * default display type.
>  	 */
>  	if (!display) {
> -		if (!strcmp(perf_env__arch(env), "arm64"))
> +		if (perf_env__e_machine(env, /*e_flags=*/NULL) == EM_AARCH64)

Will this cause a build failure on older toolchains where the system <elf.h>
does not yet define EM_AARCH64?

On older systems (for example, those with glibc earlier than 2.17),
EM_AARCH64 might not be declared. Other files in the perf codebase provide
a manual fallback definition for it, but builtin-c2c.c does not appear to
include one.
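
For reference, such a fallback is usually a guarded define. The sketch
below assumes the ELF-assigned value 183 for AArch64; the is_aarch64()
helper is only there to show a use and is not taken from the patch.

```c
#include <elf.h>
#include <stdint.h>

/* Older <elf.h> headers may lack this constant; 183 is the AArch64 value. */
#ifndef EM_AARCH64
#define EM_AARCH64 183
#endif

/* Illustrative use: compare an ELF machine value against the constant. */
static int is_aarch64(uint16_t e_machine)
{
	return e_machine == EM_AARCH64;
}
```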

>  			display = "peer";
>  		else
>  			display = "tot";

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=11


* Re: [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release
  2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-02  7:53                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:53 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [Medium] Incomplete refactoring: The JSON data converter was missed and does not use the new lazy initialization helper.
- [Medium] Context leak: Assuming uninitialized `os_release` always implies live mode incorrectly pollutes offline trace conversions with the host's kernel version.
--

commit a08c8b8c77a407cf0d562bc6f755d7e0c00f4156
Author: Ian Rogers <irogers@google.com>

perf env: Add helper to lazily compute the os_release

This commit introduces a helper function, perf_env__os_release(), to lazily
initialize the OS release string when it is not populated. It updates the CTF
data converter and other sites to use this helper. The helper assumes that if
os_release isn't initialized, perf is running in live mode and uses uname()
as a fallback.

> diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
> index 3b8f2df823a9..2c88420fe33e 100644
> --- a/tools/perf/util/data-convert-bt.c
> +++ b/tools/perf/util/data-convert-bt.c
> @@ -1414,7 +1414,7 @@ do {									\
>  
>  	ADD("host",    env->hostname);
>  	ADD("sysname", "Linux");
> -	ADD("release", env->os_release);
> +	ADD("release", perf_env__os_release(env));

Was the JSON data converter missed in this refactoring?

The CTF converter was updated to use the new helper, but the JSON converter
in tools/perf/util/data-convert-json.c still accesses the field directly:

tools/perf/util/data-convert-json.c:
    ...
    output_json_key_string(out, true, 2, "hostname", env->hostname);
    output_json_key_string(out, true, 2, "os-release", env->os_release);
    output_json_key_string(out, true, 2, "arch", env->arch);
    ...

This leads to inconsistent behavior where the JSON converter outputs an empty
string or null if os_release is uninitialized, while the CTF converter
outputs the host's release string.

>  	ADD("version", env->version);
>  	ADD("machine", env->arch);
>  	ADD("domain", "kernel");

[ ... ]

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 5944acd28996..1090aaa2985f 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -339,6 +339,27 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
>  	return env->kernel_is_64_bit;
>  }
>  
> +const char *perf_env__os_release(struct perf_env *env)
> +{
> +	struct utsname uts;
> +	int ret;
> +
> +	if (!env)
> +		return perf_version_string;
> +
> +	if (env->os_release)
> +		return env->os_release;
> +
> +	/*
> +	 * The os_release is being accessed but wasn't initialized from a data
> +	 * file, assume this is 'live' mode and use the release from uname. If
> +	 * uname or strdup fails then use the current perf tool version.
> +	 */
> +	ret = uname(&uts);
> +	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
> +	return env->os_release ?: perf_version_string;
> +}

Does this fallback to uname() incorrectly pollute offline trace conversions
with the host's kernel version?

If a perf.data file was recorded without the HEADER_OSRELEASE feature,
env->os_release will be NULL.

When running perf data convert --to-ctf on such a file, this helper will
execute uname() and write the host machine's OS release into the trace
environment.

Prior to this patch, a NULL os_release was safely omitted from the CTF
export.

Injecting the converter's host kernel version into offline trace metadata
falsifies the environment in which the trace was originally recorded.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=15


* Re: [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check
  2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
@ 2026-05-02  7:55                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:55 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [Medium] Incorrect fallback to host architecture for guest machines breaks cross-architecture disassembly.
- [Low] Missing `#ifndef` fallback for `EM_AARCH64` will cause compilation failures on systems with older glibc.
--

commit 3c6ce70c4a5c93a1c064a25ccf73aca73925cf03
Author: Ian Rogers <irogers@google.com>

perf print_insn: Use e_machine for fallback IP length check

This commit updates the instruction printer to use the ELF machine type
instead of string comparisons on the perf_env architecture. This aims to
provide a more precise fallback check for instruction pointer length.

> --- a/tools/perf/util/print_insn.c
> +++ b/tools/perf/util/print_insn.c
> @@ -17,6 +18,7 @@
>  #include "dump-insn.h"
>  #include "map.h"
>  #include "dso.h"
> +#include <elf.h>

This isn't a bug, but could compiling on older systems fail here? Other
files in the perf codebase, such as tools/perf/util/symbol-elf.c,
provide an explicit fallback definition for EM_AARCH64 since older
system headers might not include it.

[ ... ]

> @@ -33,13 +35,13 @@ size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
>  static bool is64bitip(struct machine *machine, struct addr_location *al)
>  {
>  	const struct dso *dso = al->map ? map__dso(al->map) : NULL;
> +	uint16_t e_machine;
>  
>  	if (dso)
>  		return dso__is_64_bit(dso);
>  
> -	return machine__is(machine, "x86_64") ||
> -		machine__normalized_is(machine, "arm64") ||
> -		machine__normalized_is(machine, "s390");
> +	e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);

When processing traces from guest machines, machine->env is initialized
as NULL. If env is NULL, perf_env__e_machine_nocache() defaults to
returning EM_HOST, the compile-time ELF architecture of the perf tool.

If a 64-bit host analyzes a trace from a 32-bit guest, would e_machine
evaluate to EM_X86_64 (or similar) instead of the guest's actual
architecture?

The previous string comparison logic safely returned false when env was
NULL.

> +	return e_machine == EM_X86_64 || e_machine == EM_AARCH64 || e_machine == EM_S390;

Will this regression cause Capstone to disassemble 32-bit guest
instructions using the 64-bit decoder and yield garbage output for
those guests?

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=4


* Re: [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-02  7:55                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:55 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] Unconditional acquisition of a global mutex in high-frequency per-sample fast paths causes severe performance regression.
- [High] Unlocked reads of bitfields concurrent with locked writes cause data races and undefined behavior.
- [Medium] The `symbol_bits_lock` mutex is leaked and potentially re-initialized if `symbol__init()` fails.
--

commit 587302f6555ca9ddcc54f4761c10004005eaa78f
Author: Ian Rogers <irogers@google.com>

perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues

This commit introduces a global symbol_bits_lock and helper functions
to serialize updates to the ignore and annotate2 bitfields in struct symbol.
This aims to prevent lost updates caused by concurrent modifications to
bitfields sharing the same byte.

> diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
> index 9d3a4c779a41..7337ee956dc9 100644
> --- a/tools/perf/builtin-kwork.c
> +++ b/tools/perf/builtin-kwork.c
[ ... ]
> @@ -725,7 +725,7 @@ static void timehist_save_callchain(struct perf_kwork *kwork,
>  		if (sym) {
>  			if (!strcmp(sym->name, "__softirqentry_text_start") ||
>  			    !strcmp(sym->name, "__do_softirq"))
> -				sym->ignore = 1;
> +				symbol__set_ignore(sym, true);
>  		}
>  
>  		callchain_cursor_advance(cursor);

Since timehist_save_callchain is a per-sample fast path that can be called
millions of times per run, could this unconditional acquisition of a global
mutex cause severe lock contention and performance regression?

Would it be better to add a lockless check like if (!sym->ignore) to bypass
the lock once the flag is already set?
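
A rough sketch of that idea, using hypothetical stand-ins (struct
sym_sketch, bits_lock, a pthread mutex) rather than the perf types. Note
that the unlocked read in the fast path is itself unsynchronized, so it is
only a sketch of the contention fix and does not settle the data race
question raised for readers elsewhere in this review.

```c
#include <pthread.h>
#include <stdbool.h>

struct sym_sketch {
	unsigned char idle:2;
	unsigned char ignore:1;
	unsigned char annotate2:1;
};

static pthread_mutex_t bits_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_acquisitions;	/* instrumentation for the sketch */

static void sym_set_ignore(struct sym_sketch *sym, bool ignore)
{
	/* Fast path: nothing to change, so skip the global lock. */
	if (sym->ignore == ignore)
		return;

	pthread_mutex_lock(&bits_lock);
	lock_acquisitions++;
	sym->ignore = ignore;
	pthread_mutex_unlock(&bits_lock);
}
```

With this shape, repeated calls for a symbol whose flag is already set
return before touching the mutex.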

> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 555247568e7a..655e95f660c2 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
[ ... ]
> @@ -2371,7 +2371,7 @@ static void save_task_callchain(struct perf_sched *sched,
>  			if (!strcmp(sym->name, "schedule") ||
>  			    !strcmp(sym->name, "__schedule") ||
>  			    !strcmp(sym->name, "preempt_schedule"))
> -				sym->ignore = 1;
> +				symbol__set_ignore(sym, true);
>  		}

Could the same fast-path lock contention happen here during
save_task_callchain?

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index a70066d17729..1238a0d6ce6e 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
[ ... ]
> @@ -345,6 +348,20 @@ void symbol__delete(struct symbol *sym)
>  	free(((void *)sym) - symbol_conf.priv_size);
>  }
>  
> +void symbol__set_ignore(struct symbol *sym, bool ignore)
> +{
> +	mutex_lock(&symbol_bits_lock);
> +	sym->ignore = ignore;
> +	mutex_unlock(&symbol_bits_lock);
> +}
> +
> +void symbol__set_annotate2(struct symbol *sym, bool annotate2)
> +{
> +	mutex_lock(&symbol_bits_lock);
> +	sym->annotate2 = annotate2;
> +	mutex_unlock(&symbol_bits_lock);
> +}

While the new symbol_bits_lock serializes writes to these bitfields, readers
of sym->ignore and sym->annotate2 remain unlocked throughout the codebase.

In C, adjacent bitfields share the same underlying memory location. Under the
C11 memory model, does reading a bitfield without synchronization while another
thread performs a read-modify-write on that same memory location constitute a
data race?

Could this lead to torn reads or undefined behavior during concurrent access?
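
One possible alternative, sketched here under the assumption that the
flags could be moved out of bitfields into a single C11 atomic byte. The
names struct sym_flags_sketch, SYM_FLAG_* and the helpers are invented for
this example; this is not what the patch does.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum {
	SYM_FLAG_IGNORE    = 1u << 0,
	SYM_FLAG_ANNOTATE2 = 1u << 1,
};

struct sym_flags_sketch {
	_Atomic unsigned char flags;
};

/* Atomic read-modify-write: concurrent setters cannot lose each other's bits. */
static void sym_flag_set(struct sym_flags_sketch *s, unsigned char flag, bool on)
{
	if (on)
		atomic_fetch_or_explicit(&s->flags, flag, memory_order_relaxed);
	else
		atomic_fetch_and_explicit(&s->flags, (unsigned char)~flag,
					  memory_order_relaxed);
}

/* Atomic load: readers are race-free without taking any lock. */
static bool sym_flag_test(struct sym_flags_sketch *s, unsigned char flag)
{
	return atomic_load_explicit(&s->flags, memory_order_relaxed) & flag;
}
```

Relaxed ordering is assumed to be enough here because each flag is an
independent hint; stronger ordering would be needed if other data were
published through the flag.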

> @@ -2415,6 +2432,8 @@ int symbol__init(struct perf_env *env)
>  	if (symbol_conf.initialized)
>  		return 0;
>  
> +	mutex_init(&symbol_bits_lock);
> +
>  	symbol_conf.priv_size = PERF_ALIGN(symbol_conf.priv_size, sizeof(u64));
>  
>  	symbol__elf_init();

If a subsequent initialization step fails, like setup_parallelism_bitmap,
this function returns -1 without calling mutex_destroy on the error paths.

Because symbol_conf.initialized remains false, a subsequent call to
symbol__exit will also return early and fail to clean the mutex up.

Additionally, if a caller attempts to retry symbol__init after a failure,
could this cause mutex_init to be invoked on an already-initialized mutex?
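
A minimal sketch of an init path that unwinds the mutex on failure. The
names subsystem_init(), subsystem_exit() and later_step are hypothetical,
and a pthread mutex stands in for perf's mutex wrapper.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t bits_lock;
static bool lock_ready;		/* tracks whether bits_lock is initialised */
static bool initialized;

/* later_step stands for any subsequent init step that can fail. */
static int subsystem_init(int (*later_step)(void))
{
	if (initialized)
		return 0;

	if (pthread_mutex_init(&bits_lock, NULL))
		return -1;
	lock_ready = true;

	if (later_step() < 0)
		goto out_destroy;

	initialized = true;
	return 0;

out_destroy:
	/* Undo the mutex so a retry starts from a clean state. */
	pthread_mutex_destroy(&bits_lock);
	lock_ready = false;
	return -1;
}

static void subsystem_exit(void)
{
	if (!initialized)
		return;
	pthread_mutex_destroy(&bits_lock);
	lock_ready = false;
	initialized = false;
}

/* Trivial steps used to exercise both paths. */
static int step_fails(void) { return -1; }
static int step_ok(void) { return 0; }
```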

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=16


* Re: [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-02  7:56                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:56 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] Uninitialized stack memory is leaked to the perf.data file and cached in the environment.
- [High] Lossy canonicalization of the architecture string breaks 64-bit detection for old and new perf.data files.
- [Medium] If strdup() fails during process_arch(), env->arch is silently left as NULL.
--

commit 8073d9a9766717469f9a62174c92bb53c77a0df4
Author: Ian Rogers <irogers@google.com>

perf env: Add perf_env__e_machine helper and use in perf_env__arch

This commit introduces a helper to lazily compute the e_machine and update the
perf_env's arch calculation. It updates the session to use the new helper and
modifies how the perf data file header extracts the e_machine and e_flags.

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 1e54e2c86360..4ff4caab3b32 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
[ ... ]
> +uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags)
> +{
> +	uint16_t e_machine = EM_HOST;
> +
> +	if (env)
> +		e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
> +
> +	if (e_flags && e_machine == EM_HOST)
> +		*e_flags = EF_HOST;
> +
> +	return e_machine;
> +}
> +
> +uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
> +{
> +	uint16_t e_machine;
> +	uint32_t local_e_flags;

Could local_e_flags be used uninitialized here?

If e_machine != EM_HOST in perf_env__e_machine_nocache(), *e_flags is not
written to, which leaves local_e_flags uninitialized.
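
A reduced model of the pattern, with made-up constants and function names
(SKETCH_EM_HOST, SKETCH_EF_HOST, e_machine_nocache(), e_machine_cached()),
showing how initialising the local gives a defined value on the path where
the callee skips the write:

```c
#include <stdint.h>

#define SKETCH_EM_HOST 62	/* made-up stand-in for EM_HOST */
#define SKETCH_EF_HOST 0x5	/* made-up stand-in for EF_HOST */

/* Mirrors the reviewed shape: e_flags is only written on the host path. */
static uint16_t e_machine_nocache(uint16_t recorded, uint32_t *e_flags)
{
	if (e_flags && recorded == SKETCH_EM_HOST)
		*e_flags = SKETCH_EF_HOST;
	return recorded;
}

static uint16_t e_machine_cached(uint16_t recorded, uint32_t *e_flags)
{
	/* Initialised so the non-host path yields 0 rather than stack garbage. */
	uint32_t local_e_flags = 0;
	uint16_t e_machine = e_machine_nocache(recorded, &local_e_flags);

	if (e_flags)
		*e_flags = local_e_flags;
	return e_machine;
}
```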

> +
> +	if (env && env->e_machine != EM_NONE) {
> +		if (e_flags)
> +			*e_flags = env->e_flags;
> +
> +		return env->e_machine;
> +	}
> +	e_machine = perf_env__e_machine_nocache(env, &local_e_flags);
> +	if (env) {
> +		env->e_machine = e_machine;
> +		env->e_flags = local_e_flags;

If local_e_flags is uninitialized, does this leak stack memory into the
environment?

> +	}
> +	if (e_flags)
> +		*e_flags = local_e_flags;
> +
> +	return e_machine;
>  }
[ ... ]
>  const char *perf_env__arch(struct perf_env *env)
>  {
[ ... ]
> +	/*
> +	 * Lazily compute/allocate arch. The e_machine may have been
> +	 * read from a data file and so may not be EM_HOST.
> +	 */
> +	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
> +	arch = e_machine_to_perf_arch(e_machine);
> +	env->arch = strdup(arch);

If strdup() fails due to memory exhaustion, could env->arch be left as NULL?

> +	/*
> +	 * Avoid potential crashes on the arch string if memory allocation in
> +	 * strdup fails and NULL were to be returned.
> +	 */
> +	return env->arch ?: arch;
>  }
>  
>  #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index f30e48eb3fc3..8d5152bde25d 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
[ ... ]
> -static int write_e_machine(struct feat_fd *ff,
> -			   struct evlist *evlist __maybe_unused)
> +static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
>  {
>  	/* e_machine expanded from 16 to 32-bits for alignment. */
>  	uint32_t e_flags;

Could e_flags be leaked to the perf.data file here?

If no threads supply a valid ELF machine and the fallback logic invokes
perf_env__e_machine_nocache() returning e_machine != EM_HOST, e_flags
may remain uninitialized stack garbage.

> -	uint32_t e_machine = EM_HOST;
> +	uint32_t e_machine = perf_session__e_machine(evlist->session, &e_flags);
>  	int ret;
>  
>  	ret = do_write(ff, &e_machine, sizeof(e_machine));
>  	if (ret)
>  		return ret;
>  
> -	return do_write(ff, &e_machine, sizeof(e_machine));
> +	return do_write(ff, &e_flags, sizeof(e_flags));

Does this write the potentially uninitialized stack memory to the file header?

>  }
[ ... ]
> +static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
> +{
> +	uint16_t saved_e_machine = ff->ph->env.e_machine;
> +
> +	free(ff->ph->env.arch);
> +	ff->ph->env.arch = do_read_string(ff);
> +	if (!ff->ph->env.arch)
> +		return -ENOMEM;
> +	/*
> +	 * Make the arch string canonical by computing the e_machine from it,
> +	 * then turning the e_machine back into an arch string.
> +	 */
> +	ff->ph->env.e_machine = EM_NONE;
> +	if (perf_env__e_machine(&ff->ph->env, /*e_flags=*/NULL) != EM_NONE) {
> +		zfree(&ff->ph->env.arch);
> +		perf_env__arch(&ff->ph->env);

If strdup() failed inside perf_env__arch(), could ff->ph->env.arch be left
as NULL, while this function incorrectly returns 0 (success) leading to
later NULL pointer dereferences?

Additionally, does this canonicalization strip architecture suffixes like "64"?

For legacy perf.data files lacking the HEADER_E_MACHINE feature,
saved_e_machine would be restored to EM_NONE. If the string is canonicalized
from "x86_64" to "x86" (based on e_machine_to_perf_arch()), subsequent calls to
perf_env__init_kernel_mode() will check str_ends_with(env->arch, "64"). Because
it's now "x86", the kernel_is_64_bit flag will be set to 0.

Does this break 64-bit detection and parsing for 64-bit recordings?

> +	}
> +	ff->ph->env.e_machine = saved_e_machine;
> +	return 0;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=1

* Re: [PATCH v8 03/17] perf capstone: Determine architecture from e_machine
  2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
@ 2026-05-02  7:58                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:58 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Hardcoded endianness in e_machine_to_capstone() breaks disassembly for bi-endian architectures.
- [High] thread__e_machine() fallback to EM_HOST breaks cross-architecture disassembly for kernel code and BPF programs.
--

commit a3ff746537f2f9677ae9d6427555ff27588e61a4
Author: Ian Rogers <irogers@google.com>
perf capstone: Determine architecture from e_machine

This commit refactors capstone initialization to use the e_machine value
from the ELF header instead of the architecture string, adding support
for MIPS and RISCV translations.

Note: The format of this review has been altered to provide a free-form
summary of the findings without directly quoting the patch code, due to
recitation restrictions.

The new e_machine_to_capstone() function hardcodes CS_MODE_BIG_ENDIAN for
EM_PPC, EM_PPC64, and EM_MIPS, and implicitly assumes little-endian for
EM_AARCH64 and EM_ARM.

Does this approach break disassembly for bi-endian architectures?

Linux fully supports variants like ppc64le, mipsel, armeb, and aarch64_be
under these same e_machine IDs. Because the initialization now succeeds
with the hardcoded endianness rather than returning -1, it actively bypasses
the functional fallback to objdump. Since objdump natively reads EI_DATA
from the ELF header to determine the correct endianness, skipping it leaves
these architectures with incorrect disassembly output.

Could we read EI_DATA alongside e_machine to set the endianness mode correctly?
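
As a rough illustration of that suggestion, the byte order can be taken from
the ELF identification array. This is a minimal sketch, not perf code:
e_ident_is_big_endian() is a hypothetical helper name, and how the e_ident
bytes reach the capstone setup is left open.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a perf function): report whether an ELF
 * identification array describes a big-endian object. EI_DATA holds
 * ELFDATA2LSB for little-endian and ELFDATA2MSB for big-endian files.
 */
static bool e_ident_is_big_endian(const unsigned char e_ident[EI_NIDENT])
{
	return e_ident[EI_DATA] == ELFDATA2MSB;
}
```

The result could then select CS_MODE_BIG_ENDIAN or CS_MODE_LITTLE_ENDIAN
instead of a value fixed per e_machine.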

Additionally, the updated code replaces machine__normalized_is() with
thread__e_machine() to determine the target architecture.

Will this break cross-architecture disassembly for kernel code and BPF
programs?

Previously, machine__normalized_is() correctly determined the architecture
from machine->env->arch, which is populated by the perf.data file header.
The new approach searches DSOs and falls back to dso__e_machine(), which
is hardcoded to return EM_HOST (the compile-time host architecture of the
perf binary) for kernel components like DSO_BINARY_TYPE__KALLSYMS,
DSO_BINARY_TYPE__VMLINUX, and DSO_BINARY_TYPE__BPF_PROG_INFO.

If a user performs cross-architecture analysis, such as analyzing an ARM64
perf.data file on an x86_64 host, won't thread__e_machine() return
EM_X86_64? This would cause Capstone to decode ARM64 kernel or JIT
instructions as x86_64 instructions.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=3

* Re: [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch
  2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-02  7:59                             ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-02  7:59 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Logic inversion in `perf_env__single_address_space` breaks kernel/user address space resolution.
- [Medium] Exact `EM_HOST` check breaks native binutils fallback for 32-bit profiling data on 64-bit hosts.
--

commit e2d5820d159463d4dd20c88669d8e471fefbb4c5
Author: Ian Rogers <irogers@google.com>

perf arch common: Use perf_env e_machine rather than arch

This patch replaces architecture string matching with ELF machine
constants in the perf architecture common code. It updates the
binutils path lookup and single address space checks to use
the e_machine values from the perf environment.

> diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
> index 21836f70f231..e9b5b61feffe 100644
> --- a/tools/perf/arch/common.c
> +++ b/tools/perf/arch/common.c

[ ... ]

> @@ -145,7 +147,8 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
>  					  const char *name, char **path)
>  {

[ ... ]

>  	/*
>  	 * We don't need to try to find objdump path for native system.
>  	 * Just use default binutils path (e.g.: "objdump").
>  	 */
> -	if (!strcmp(perf_env__arch(NULL), arch))
> +	if (e_machine == EM_HOST)
>  		goto out;

The previous check used perf_env__arch() which normalized both 32-bit and
64-bit variants to a single string like "x86". With the exact EM_HOST
comparison, if we are analyzing a 32-bit trace (EM_386) on a 64-bit host
(EM_X86_64), will this evaluate to false and skip the native fallback?

Could this regression cause the tool to wrongly search for a cross-compiler
prefix instead of successfully using the native multi-arch objdump?

[ ... ]

> @@ -237,5 +252,5 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
>   */
>  bool perf_env__single_address_space(struct perf_env *env)
>  {
> -	return strcmp(perf_env__arch(env), "sparc");
> +	return perf_env__e_machine(env, /*e_flags=*/NULL) == EM_SPARC;
>  }

Does this introduce a logic inversion regression?

The original implementation using strcmp() evaluated to non-zero (true)
for non-SPARC architectures and zero (false) for SPARC. The new
implementation returns true only for SPARC and false for everything else.

Also, should this check include EM_SPARCV9 to properly handle 64-bit SPARC
traces, since the previous strcmp() against "sparc" would have matched the
normalized string for both?
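
The inversion can be reproduced in isolation. The sketch below is only an
illustration: the three function names are hypothetical stand-ins for the old
check, the v8 check, and one possible fix, and treating EM_SPARCV9 like
EM_SPARC is an assumption based on both normalizing to "sparc".

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the old check: true for everything except "sparc". */
static bool old_single_address_space(const char *arch)
{
	return strcmp(arch, "sparc") != 0;
}

/* Stand-in for the v8 check: true only for EM_SPARC (inverted). */
static bool v8_single_address_space(uint16_t e_machine)
{
	return e_machine == EM_SPARC;
}

/* One possible fix: keep the old meaning and cover 64-bit SPARC too. */
static bool fixed_single_address_space(uint16_t e_machine)
{
	return e_machine != EM_SPARC && e_machine != EM_SPARCV9;
}
```

For an x86_64 recording the old and fixed versions return true while the v8
version returns false.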

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260502065935.291960-1-irogers@google.com?part=9

* [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation
  2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                             ` (16 preceding siblings ...)
  2026-05-02  6:59                           ` [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
@ 2026-05-03  0:22                           ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
                                               ` (17 more replies)
  17 siblings, 18 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper to perf_env to compute the e_machine if it is EM_NONE.
Derive the value from the arch string if available. Similarly derive
the arch string from the ELF machine if available, for
consistency. This means perf's arch (machine type) is no longer
determined by uname but set to match that of the perf ELF executable.

Migrate code away from strcmp on env->arch to using the e_machine
comparisons that are more accurate and not prone to uname and other
naming differences. While cleaning this up, also clean up the capstone
initialization code to cover more architectures and to set the big
endian flag based on ELF header information.

Switch the idle computation to the point of use and lazily compute it,
rather than computing it for every symbol. Currently the only user is
`perf top`. At the point of use the perf_env is available and this can
be used to make sure the idle function computation is machine and
kernel version dependent.

To avoid concurrent update issues with bitfields sharing a byte in
`struct symbol` switch to using C11 atomics.
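
As a rough sketch of the idea (not the code in this series), flags that used
to be bitfields sharing a byte can live in one atomic word and be updated
with read-modify-write operations, so concurrent setters of different bits
cannot overwrite each other. The struct name, flag names and bit positions
below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real layout is defined by the patches. */
#define SYM_FLAG_INLINED	(1u << 0)
#define SYM_FLAG_IDLE		(1u << 1)

struct sym_sketch {
	_Atomic uint16_t flags;
};

/* Set or clear one bit without disturbing the other bits in the word. */
static void sym_sketch__set_inlined(struct sym_sketch *sym, bool inlined)
{
	if (inlined)
		atomic_fetch_or(&sym->flags, SYM_FLAG_INLINED);
	else
		atomic_fetch_and(&sym->flags, (uint16_t)~SYM_FLAG_INLINED);
}

static bool sym_sketch__inlined(struct sym_sketch *sym)
{
	return atomic_load(&sym->flags) & SYM_FLAG_INLINED;
}
```

A plain bitfield store is a non-atomic load, modify and store of the shared
byte, which is where the lost updates come from.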

v9:
 - Key changes in v9:
   - **C11 Atomics for `struct symbol`**: Dropped the global
     `symbol_bits_lock` introduced in v7/v8. Replaced unsafe bitfields
     with a thread-safe `_Atomic uint16_t flags` and lockless atomic
     helpers (e.g., `symbol__type()`, `symbol__set_inlined()`).
   - **Bi-endianness Support**: Added `*_endian` variants for `dso` and
     `thread` helpers to ensure Capstone correctly disassembles cross-endian
     binaries.
   - **Architecture Hardening**:
     - Fixed inverted SPARC logic in `perf_env__single_address_space()`.
     - Prioritized DSO architecture over global environment in
       `machine_or_dso_e_machine()`.
     - Fixed an uninitialized memory leak in `perf_env__e_machine()`.
     - Removed lossy `normalize_arch()` canonicalization in `process_arch()`.

 - Review Feedback Status:
   - **Addressed**: C11 atomics migration, bi-endianness, SPARC logic,
     DSO prioritization, and uninitialized memory fixes.
   - **Not Addressed / Dropped**:
     - Patch 15 OS Release: The concern regarding the `uname()` fallback
       during offline analysis was determined to be incorrect for these
       uninitialized states; the original lazy assumption is retained.
     - Patch 04/11: The `EM_AARCH64` fallbacks were dropped as the
       definition should come from dwarf-regs.h when necessary.

v8:
 - Address Sashiko AI review feedback for Patch 1:
   - Switch all code dependent on the arch string to use `e_machine`
     instead.
   - Update `machine__is` and `machine__normalized_is` to take
     `e_machine` integers instead of strings.
   - Refactor `arch_syscalls__strerrno_function` to take an `e_machine`.
   - Avoid premature caching of the host architecture in
     `perf_session__e_machine`.

v7:
 - Address better handling of strdup failures with arch in the
   header/env.
 - Address concurrent update issues in `struct symbol` bitfields by
   introducing a global lock for writes.

v6: Ensure arch is canonical by going to e_machine and back (Sashiko)
v5: Add perf_env os_release helper (Namhyung/Sashiko)
v4: Fix Sashiko issues where an array element wasn't sorted properly,
    the e_flags weren't returned properly, the idle type is changed to
    a u8 rather than an enum value and the s390 version check for
    psw_idle is slightly reordered and tweaked.
v3: Properly set up the e_machine coming from the perf_env as reported
    by Honglei Wang.
v2: Some minor white space clean up.
v1: Initial release.

Ian Rogers (18):
  perf env: Add perf_env__e_machine helper and use in perf_env__arch
  perf tests topology: Switch env->arch use to env->e_machine
  perf env, dso, thread: Add _endian variants for e_machine helpers
  perf capstone: Determine architecture from e_machine
  perf print_insn: Use e_machine for fallback IP length check
  perf symbol: Avoid use of machine__is
  perf machine: Use perf_env e_machine rather than arch
  perf sample-raw: Use perf_env e_machine rather than arch
  perf sort: Use perf_env e_machine rather than arch
  perf arch common: Use perf_env e_machine rather than arch
  perf header: In print_pmu_caps use perf_env e_machine
  perf c2c: Use perf_env e_machine rather than arch
  perf lock-contention: Use perf_env e_machine rather than arch
  perf env: Refactor perf_env__arch_strerrno
  perf env: Remove unused perf_env__raw_arch
  perf env: Add helper to lazily compute the os_release
  perf symbol: Add setters for bitfields sharing a byte to avoid
    concurrent update issues
  perf symbol: Lazily compute idle

 tools/perf/arch/common.c                      |  62 ++--
 tools/perf/builtin-c2c.c                      |  40 +--
 tools/perf/builtin-inject.c                   |   6 +-
 tools/perf/builtin-kwork.c                    |   2 +-
 tools/perf/builtin-report.c                   |   2 +-
 tools/perf/builtin-sched.c                    |   4 +-
 tools/perf/builtin-top.c                      |   7 +-
 tools/perf/builtin-trace.c                    |   7 +-
 tools/perf/tests/symbols.c                    |   2 +-
 tools/perf/tests/topology.c                   |   8 +-
 tools/perf/tests/vmlinux-kallsyms.c           |   2 +-
 tools/perf/trace/beauty/arch_errno_names.sh   |  40 ++-
 tools/perf/ui/browsers/annotate.c             |   2 +-
 tools/perf/ui/browsers/map.c                  |   4 +-
 tools/perf/util/annotate.c                    |   5 +-
 tools/perf/util/auxtrace.c                    |   6 +-
 tools/perf/util/callchain.c                   |   4 +-
 tools/perf/util/capstone.c                    | 129 +++++---
 tools/perf/util/data-convert-bt.c             |   2 +-
 tools/perf/util/dlfilter.c                    |   2 +-
 tools/perf/util/dso.c                         |  19 +-
 tools/perf/util/dso.h                         |  14 +-
 tools/perf/util/env.c                         | 295 ++++++++++++++----
 tools/perf/util/env.h                         |  12 +-
 tools/perf/util/evsel_fprintf.c               |   6 +-
 tools/perf/util/header.c                      |  58 ++--
 tools/perf/util/intel-pt.c                    |   2 +-
 tools/perf/util/lock-contention.c             |   6 +-
 tools/perf/util/machine.c                     |  27 +-
 tools/perf/util/machine.h                     |   2 -
 tools/perf/util/print_insn.c                  |  23 +-
 tools/perf/util/print_insn.h                  |   3 +
 tools/perf/util/probe-event.c                 |   4 +-
 tools/perf/util/sample-raw.c                  |  21 +-
 tools/perf/util/sample-raw.h                  |   6 +-
 .../util/scripting-engines/trace-event-perl.c |   2 +-
 .../scripting-engines/trace-event-python.c    |   4 +-
 tools/perf/util/session.c                     |  26 +-
 tools/perf/util/sort.c                        |  66 ++--
 tools/perf/util/srcline.c                     |  10 +-
 tools/perf/util/symbol-elf.c                  |   5 +-
 tools/perf/util/symbol.c                      | 208 ++++++++----
 tools/perf/util/symbol.h                      |  74 ++++-
 tools/perf/util/symbol_fprintf.c              |   4 +-
 tools/perf/util/thread.c                      |  22 +-
 tools/perf/util/thread.h                      |   8 +-
 46 files changed, 860 insertions(+), 403 deletions(-)

-- 
2.54.0.545.g6539524ca2-goog


* [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:52                               ` sashiko-bot
  2026-05-04  1:35                               ` Namhyung Kim
  2026-05-03  0:22                             ` [PATCH v9 02/18] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
                                               ` (16 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add a helper that lazily computes the e_machine and falls back to
EM_HOST. Use the perf_env's arch to compute the e_machine if
available. Use a binary search for some efficiency in this, but handle
somewhat complex duplicate rules. Switch perf_env__arch to be derived
from the e_machine for consistency. This switches arch from being uname
derived to matching that of the perf binary (via EM_HOST). Update
session to use the helper, which may mean using EM_HOST when no
threads are available. This also updates the perf data file header
that gets the e_machine/e_flags from the session.
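
The prefix lookup described above can be sketched in isolation as follows.
This is a simplified illustration, not the table from the patch: the map
contents, the numeric values and the names prefix_map, sketch_map,
compare_prefix_sketch() and lookup_prefix() are all made up for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct prefix_map {
	const char *prefix;
	uint16_t value;
};

/* Must stay sorted by prefix for bsearch(). Values are arbitrary. */
static const struct prefix_map sketch_map[] = {
	{"aarch64", 1},
	{"arm", 2},
	{"mips", 3},
	{"x86", 4},
};

/* Compare only strlen(prefix) bytes so that "arm64" matches "arm". */
static int compare_prefix_sketch(const void *key, const void *element)
{
	const struct prefix_map *e = element;

	return strncmp(key, e->prefix, strlen(e->prefix));
}

/* Returns the mapped value, or 0 when no prefix matches. */
static uint16_t lookup_prefix(const char *name)
{
	const struct prefix_map *hit;

	hit = bsearch(name, sketch_map,
		      sizeof(sketch_map) / sizeof(sketch_map[0]),
		      sizeof(sketch_map[0]), compare_prefix_sketch);
	return hit ? hit->value : 0;
}
```

Because several names share a prefix, a hit on a shared prefix still needs a
second step to pick the exact machine, which is what the duplicate rules
handle.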

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c     | 231 +++++++++++++++++++++++++++++++-------
 tools/perf/util/env.h     |   2 +
 tools/perf/util/header.c  |  35 ++++--
 tools/perf/util/session.c |  26 +++--
 4 files changed, 231 insertions(+), 63 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1e54e2c86360..0edc67a468ab 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "cpumap.h"
+#include "dwarf-regs.h"
 #include "debug.h"
 #include "env.h"
 #include "util/header.h"
 #include "util/rwsem.h"
 #include <linux/compiler.h>
+#include <linux/kernel.h>
 #include <linux/ctype.h>
 #include <linux/rbtree.h>
 #include <linux/string.h>
@@ -309,12 +311,21 @@ void perf_env__init(struct perf_env *env)
 
 static void perf_env__init_kernel_mode(struct perf_env *env)
 {
-	const char *arch = perf_env__raw_arch(env);
+	uint16_t e_machine = env->e_machine;
 
-	if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
-	    !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
-	    !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
-	    !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
+	if (env->arch && (e_machine == EM_NONE || e_machine == EM_MIPS || e_machine == EM_RISCV)) {
+		if (str_ends_with(env->arch, "64") || !strncmp(env->arch, "s390x", 5))
+			env->kernel_is_64_bit = 1;
+		else
+			env->kernel_is_64_bit = 0;
+		return;
+	}
+	if (e_machine == EM_NONE)
+		e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	if (e_machine == EM_X86_64 || e_machine == EM_AARCH64 ||
+	    e_machine == EM_PPC64 || e_machine == EM_SPARCV9 ||
+	    e_machine == EM_S390)
 		env->kernel_is_64_bit = 1;
 	else
 		env->kernel_is_64_bit = 0;
@@ -588,51 +599,187 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
 	zfree(&cache->size);
 }
 
+struct arch_to_e_machine {
+	const char *prefix;
+	uint16_t e_machine;
+};
+
 /*
- * Return architecture name in a normalized form.
- * The conversion logic comes from the Makefile.
+ * A mapping from an arch prefix string to an ELF machine that can be used in a
+ * bsearch. Some arch prefixes are shared and need additional processing as
+ * marked next to the architecture. The prefixes handle both perf's architecture
+ * naming and those from uname.
  */
-static const char *normalize_arch(char *arch)
-{
-	if (!strcmp(arch, "x86_64"))
-		return "x86";
-	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
-	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
-	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
-		return "arm64";
-	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
-	if (!strncmp(arch, "s390", 4))
-		return "s390";
-	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
-	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
-	if (!strncmp(arch, "mips", 4))
-		return "mips";
-	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
-	if (!strncmp(arch, "loongarch", 9))
-		return "loongarch";
-
-	return arch;
+static const struct arch_to_e_machine prefix_to_e_machine[] = {
+	{"aarch64", EM_AARCH64},
+	{"alpha", EM_ALPHA},
+	{"arc", EM_ARC},
+	{"arm", EM_ARM}, /* Check also for EM_AARCH64. */
+	{"avr", EM_AVR},  /* Check also for EM_AVR32. */
+	{"bfin", EM_BLACKFIN},
+	{"blackfin", EM_BLACKFIN},
+	{"cris", EM_CRIS},
+	{"csky", EM_CSKY},
+	{"hppa", EM_PARISC},
+	{"i386", EM_386},
+	{"i486", EM_386},
+	{"i586", EM_386},
+	{"i686", EM_386},
+	{"loongarch", EM_LOONGARCH},
+	{"m32r", EM_M32R},
+	{"m68k", EM_68K},
+	{"microblaze", EM_MICROBLAZE},
+	{"mips", EM_MIPS},
+	{"msp430", EM_MSP430},
+	{"parisc", EM_PARISC},
+	{"powerpc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"ppc", EM_PPC}, /* Check also for EM_PPC64. */
+	{"riscv", EM_RISCV},
+	{"s390", EM_S390},
+	{"sa110", EM_ARM},
+	{"sh", EM_SH},
+	{"sparc", EM_SPARC}, /* Check also for EM_SPARCV9. */
+	{"sun4u", EM_SPARC},
+	{"x86", EM_X86_64}, /* Check also for EM_386. */
+	{"xtensa", EM_XTENSA},
+};
+
+static int compare_prefix(const void *key, const void *element)
+{
+	const char *search_key = key;
+	const struct arch_to_e_machine *map_element = element;
+	size_t prefix_len = strlen(map_element->prefix);
+
+	return strncmp(search_key, map_element->prefix, prefix_len);
+}
+
+static uint16_t perf_arch_to_e_machine(const char *perf_arch, int is_64_bit)
+{
+	/* Binary search for a matching prefix. */
+	const struct arch_to_e_machine *result;
+
+	if (!perf_arch)
+		return EM_HOST;
+
+	result = bsearch(perf_arch,
+			 prefix_to_e_machine, ARRAY_SIZE(prefix_to_e_machine),
+			 sizeof(prefix_to_e_machine[0]),
+			 compare_prefix);
+
+	if (!result) {
+		pr_debug("Unknown perf arch for ELF machine mapping: %s\n", perf_arch);
+		return EM_NONE;
+	}
+
+	/*
+	 * Handle conflicting prefixes. If the is_64_bit is unknown (-1) then
+	 * assume 64-bit. We can't use perf_env__kernel_is_64_bit as that
+	 * depends on the arch string.
+	 */
+	switch (result->e_machine) {
+	case EM_ARM:
+		return !strcmp(perf_arch, "arm64") ? EM_AARCH64 : EM_ARM;
+	case EM_AVR:
+		return !strcmp(perf_arch, "avr32") ? EM_AVR32 : EM_AVR;
+	case EM_PPC:
+		return (is_64_bit != 0) || strstarts(perf_arch, "ppc64") ? EM_PPC64 : EM_PPC;
+	case EM_SPARC:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "sparc64") ? EM_SPARCV9 : EM_SPARC;
+	case EM_X86_64:
+		return (is_64_bit != 0) || !strcmp(perf_arch, "x86_64") ? EM_X86_64 : EM_386;
+	default:
+		return result->e_machine;
+	}
+}
+
+static const char *e_machine_to_perf_arch(uint16_t e_machine)
+{
+	/*
+	 * Table for if either the perf arch string differs from uname or there
+	 * are >1 ELF machine with the prefix.
+	 */
+	static const struct arch_to_e_machine extras[] = {
+		{"arm64", EM_AARCH64},
+		{"avr32", EM_AVR32},
+		{"powerpc", EM_PPC},
+		{"powerpc", EM_PPC64},
+		{"sparc", EM_SPARCV9},
+		{"x86", EM_386},
+		{"x86", EM_X86_64},
+		{"none", EM_NONE},
+	};
+
+	for (size_t i = 0; i < ARRAY_SIZE(extras); i++) {
+		if (extras[i].e_machine == e_machine)
+			return extras[i].prefix;
+	}
+
+	for (size_t i = 0; i < ARRAY_SIZE(prefix_to_e_machine); i++) {
+		if (prefix_to_e_machine[i].e_machine == e_machine)
+			return prefix_to_e_machine[i].prefix;
+
+	}
+	return "unknown";
+}
+
+uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags)
+{
+	uint16_t e_machine = EM_HOST;
+
+	if (env)
+		e_machine = perf_arch_to_e_machine(env->arch, env->kernel_is_64_bit);
+
+	if (e_flags)
+		*e_flags = (e_machine == EM_HOST) ? EF_HOST : 0;
+
+	return e_machine;
+}
+
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags)
+{
+	uint16_t e_machine;
+	uint32_t local_e_flags = 0;
+
+	if (env && env->e_machine != EM_NONE) {
+		if (e_flags)
+			*e_flags = env->e_flags;
+
+		return env->e_machine;
+	}
+	e_machine = perf_env__e_machine_nocache(env, &local_e_flags);
+	if (env) {
+		env->e_machine = e_machine;
+		env->e_flags = local_e_flags;
+	}
+	if (e_flags)
+		*e_flags = local_e_flags;
+
+	return e_machine;
 }
 
 const char *perf_env__arch(struct perf_env *env)
 {
-	char *arch_name;
+	uint16_t e_machine;
+	const char *arch;
 
-	if (!env || !env->arch) { /* Assume local operation */
-		static struct utsname uts = { .machine[0] = '\0', };
-		if (uts.machine[0] == '\0' && uname(&uts) < 0)
-			return NULL;
-		arch_name = uts.machine;
-	} else
-		arch_name = env->arch;
+	if (!env)
+		return e_machine_to_perf_arch(EM_HOST);
+
+	if (env->arch)
+		return env->arch;
 
-	return normalize_arch(arch_name);
+	/*
+	 * Lazily compute/allocate arch. The e_machine may have been
+	 * read from a data file and so may not be EM_HOST.
+	 */
+	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	arch = e_machine_to_perf_arch(e_machine);
+	env->arch = strdup(arch);
+	/*
+	 * Avoid potential crashes on the arch string if memory allocation in
+	 * strdup fails and NULL were to be returned.
+	 */
+	return env->arch ?: arch;
 }
 
 #if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c7052ac1f856..7151a9138e3f 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -187,6 +187,8 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
+uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags);
+uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(struct perf_env *env, int err);
 const char *perf_env__cpuid(struct perf_env *env);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..f1ae61392cce 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -379,21 +379,28 @@ static int write_osrelease(struct feat_fd *ff,
 	return do_write_string(ff, uts.release);
 }
 
-static int write_arch(struct feat_fd *ff,
-		      struct evlist *evlist __maybe_unused)
+static int write_arch(struct feat_fd *ff, struct evlist *evlist)
 {
 	struct utsname uts;
-	int ret;
+	const char *arch = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session) {
+		/* Force the computation in the perf_env of the e_machine of the threads. */
+		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
+		arch = perf_env__arch(perf_session__env(evlist->session));
+	}
 
-	return do_write_string(ff, uts.machine);
+	if (!arch) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		arch = uts.machine;
+	}
+	return do_write_string(ff, arch);
 }
 
-static int write_e_machine(struct feat_fd *ff,
-			   struct evlist *evlist __maybe_unused)
+static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
 {
 	/* e_machine expanded from 16 to 32-bits for alignment. */
 	uint32_t e_flags;
@@ -2684,10 +2691,18 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
 FEAT_PROCESS_STR_FUN(hostname, hostname);
 FEAT_PROCESS_STR_FUN(osrelease, os_release);
 FEAT_PROCESS_STR_FUN(version, version);
-FEAT_PROCESS_STR_FUN(arch, arch);
 FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
 FEAT_PROCESS_STR_FUN(cpuid, cpuid);
 
+static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
+{
+	free(ff->ph->env.arch);
+	ff->ph->env.arch = do_read_string(ff);
+	if (!ff->ph->env.arch)
+		return -ENOMEM;
+	return 0;
+}
+
 static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
 {
 	int ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..3e64db2d27c2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -3023,14 +3023,19 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 		return EM_HOST;
 	}
 
+	/*
+	 * Is the env caching an e_machine? If not we want to compute from the
+	 * more accurate threads.
+	 */
 	env = perf_session__env(session);
-	if (env && env->e_machine != EM_NONE) {
-		if (e_flags)
-			*e_flags = env->e_flags;
-
-		return env->e_machine;
-	}
+	if (env && env->e_machine != EM_NONE)
+		return perf_env__e_machine(env, e_flags);
 
+	/*
+	 * Compute from threads, note this is more accurate than
+	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
+	 * mixed 32-bit and 64-bit threads.
+	 */
 	machines__for_each_thread(&session->machines,
 				  perf_session__e_machine_cb,
 				  &args);
@@ -3048,10 +3053,9 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
 
 	/*
 	 * Couldn't determine from the perf_env or current set of
-	 * threads. Default to the host.
+	 * threads. Potentially use logic that uses the arch string otherwise
+	 * default to the host. Don't cache in the perf_env in case later
+	 * threads indicate a better ELF machine type.
 	 */
-	if (e_flags)
-		*e_flags = EF_HOST;
-
-	return EM_HOST;
+	return perf_env__e_machine_nocache(env, e_flags);
 }
-- 
2.54.0.545.g6539524ca2-goog


* [PATCH v9 02/18] perf tests topology: Switch env->arch use to env->e_machine
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 03/18] perf env, dso, thread: Add _endian variants for e_machine helpers Ian Rogers
                                               ` (15 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Some arch string comparisons weren't normalized. Avoid potential
issues with normalized names vs uname values by switching to using the
e_machine.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/topology.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index f54502ebef4b..d4c5c330c679 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -11,6 +11,7 @@
 #include "pmus.h"
 #include "target.h"
 #include <linux/err.h>
+#include <elf.h>
 
 #define TEMPL "/tmp/perf-test-XXXXXX"
 #define DATA_SIZE	10
@@ -74,6 +75,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	struct aggr_cpu_id id;
 	struct perf_cpu cpu;
 	struct perf_env *env;
+	uint16_t e_machine;
 
 	session = perf_session__new(&data, NULL);
 	TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
@@ -101,7 +103,9 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	 *  condition is true (see do_core_id_test in header.c). So always
 	 *  run this test on those platforms.
 	 */
-	if (!env->cpu && strncmp(env->arch, "s390", 4) && strncmp(env->arch, "aarch64", 7))
+	e_machine = perf_env__e_machine(env, NULL);
+
+	if (!env->cpu && e_machine != EM_S390 && e_machine != EM_AARCH64)
 		return TEST_SKIP;
 
 	/*
@@ -110,7 +114,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 	 * physical_package_id will be set to -1. Hence skip this
 	 * test if physical_package_id returns -1 for cpu from perf_cpu_map.
 	 */
-	if (!strncmp(env->arch, "ppc64le", 7)) {
+	if (e_machine == EM_PPC64) {
 		if (cpu__get_socket_id(perf_cpu_map__cpu(map, 0)) == -1)
 			return TEST_SKIP;
 	}
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 03/18] perf env, dso, thread: Add _endian variants for e_machine helpers
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 02/18] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:39                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 04/18] perf capstone: Determine architecture from e_machine Ian Rogers
                                               ` (14 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Add perf_arch_is_big_endian(), dso__read_e_machine_endian(),
dso__e_machine_endian(), and thread__e_machine_endian() to support
bi-endianness and cross-architecture analysis without breaking the
existing API.

These helpers allow querying the absolute endianness of a DSO or
thread, which is required for tools like Capstone that need to set the
correct disassembly mode.

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dso.c    | 19 +++++++++++++------
 tools/perf/util/dso.h    | 14 ++++++++++++--
 tools/perf/util/env.c    | 12 ++++++++++++
 tools/perf/util/env.h    |  1 +
 tools/perf/util/thread.c | 22 ++++++++++++++++------
 tools/perf/util/thread.h |  8 +++++++-
 6 files changed, 61 insertions(+), 15 deletions(-)
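
For readers unfamiliar with where the endianness comes from, here is a
minimal sketch of reading the byte order out of an ELF identification
array, which is the same EI_DATA field the new _endian helpers consult.
The function name is hypothetical and the buffer handling is simplified;
it is not the perf implementation.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a perf function: report the byte order recorded
 * in an ELF identification array. Returns false when the buffer is too
 * short, lacks the ELF magic, or holds an invalid data encoding.
 */
bool sketch_elf_ident_is_big_endian(const unsigned char *e_ident, size_t len,
				    bool *is_big_endian)
{
	if (len < EI_NIDENT || memcmp(e_ident, ELFMAG, SELFMAG) != 0)
		return false;

	switch (e_ident[EI_DATA]) {
	case ELFDATA2MSB:
		*is_big_endian = true;
		return true;
	case ELFDATA2LSB:
		*is_big_endian = false;
		return true;
	default:
		return false; /* ELFDATANONE or an unknown encoding. */
	}
}
```

Returning success separately from the byte order keeps "unknown" distinct
from "little endian", which matters when a caller wants to fall back to
another source such as the arch string or the host.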

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index b791e1b6b2cf..6439b2a3c898 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1220,7 +1220,8 @@ static enum dso_swap_type dso_swap_type__from_elf_data(unsigned char eidata)
 }
 
 /* Reads e_machine from fd, optionally caching data in dso. */
-uint16_t dso__read_e_machine(struct dso *optional_dso, int fd, uint32_t *e_flags)
+uint16_t dso__read_e_machine_endian(struct dso *optional_dso, int fd, uint32_t *e_flags,
+				    bool *is_big_endian)
 {
 	uint16_t e_machine = EM_NONE;
 	unsigned char e_ident[EI_NIDENT];
@@ -1250,6 +1251,9 @@ uint16_t dso__read_e_machine(struct dso *optional_dso, int fd, uint32_t *e_flags
 	if (swap_type == DSO_SWAP__UNSET)
 		return EM_NONE; // Bad ELF data encoding.
 
+	if (is_big_endian)
+		*is_big_endian = (e_ident[EI_DATA] == ELFDATA2MSB);
+
 	/* Cache the need for swapping. */
 	if (optional_dso) {
 		assert(dso__needs_swap(optional_dso) == DSO_SWAP__UNSET ||
@@ -1288,7 +1292,8 @@ uint16_t dso__read_e_machine(struct dso *optional_dso, int fd, uint32_t *e_flags
 	return e_machine;
 }
 
-uint16_t dso__e_machine(struct dso *dso, struct machine *machine, uint32_t *e_flags)
+uint16_t dso__e_machine_endian(struct dso *dso, struct machine *machine, uint32_t *e_flags,
+			       bool *is_big_endian)
 {
 	uint16_t e_machine = EM_NONE;
 	int fd;
@@ -1308,9 +1313,11 @@ uint16_t dso__e_machine(struct dso *dso, struct machine *machine, uint32_t *e_fl
 	case DSO_BINARY_TYPE__BPF_IMAGE:
 	case DSO_BINARY_TYPE__OOL:
 	case DSO_BINARY_TYPE__JAVA_JIT:
-		if (e_flags)
-			*e_flags = EF_HOST;
-		return EM_HOST;
+		if (is_big_endian) {
+			*is_big_endian = perf_arch_is_big_endian(
+				machine && machine->env ? machine->env->arch : NULL);
+		}
+		return perf_env__e_machine(machine ? machine->env : NULL, e_flags);
 	case DSO_BINARY_TYPE__DEBUGLINK:
 	case DSO_BINARY_TYPE__BUILD_ID_CACHE:
 	case DSO_BINARY_TYPE__BUILD_ID_CACHE_DEBUGINFO:
@@ -1338,7 +1345,7 @@ uint16_t dso__e_machine(struct dso *dso, struct machine *machine, uint32_t *e_fl
 	try_to_open_dso(dso, machine);
 	fd = dso__data(dso)->fd;
 	if (fd >= 0)
-		e_machine = dso__read_e_machine(dso, fd, e_flags);
+		e_machine = dso__read_e_machine_endian(dso, fd, e_flags, is_big_endian);
 	else if (e_flags)
 		*e_flags = 0;
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index ede691e9a249..2916b954a804 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -866,8 +866,18 @@ int dso__data_file_size(struct dso *dso, struct machine *machine);
 off_t dso__data_size(struct dso *dso, struct machine *machine);
 ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 			      u64 offset, u8 *data, ssize_t size);
-uint16_t dso__read_e_machine(struct dso *optional_dso, int fd, uint32_t *e_flags);
-uint16_t dso__e_machine(struct dso *dso, struct machine *machine, uint32_t *e_flags);
+uint16_t dso__read_e_machine_endian(struct dso *optional_dso, int fd, uint32_t *e_flags,
+				    bool *is_big_endian);
+static inline uint16_t dso__read_e_machine(struct dso *optional_dso, int fd, uint32_t *e_flags)
+{
+	return dso__read_e_machine_endian(optional_dso, fd, e_flags, NULL);
+}
+uint16_t dso__e_machine_endian(struct dso *dso, struct machine *machine, uint32_t *e_flags,
+			       bool *is_big_endian);
+static inline uint16_t dso__e_machine(struct dso *dso, struct machine *machine, uint32_t *e_flags)
+{
+	return dso__e_machine_endian(dso, machine, e_flags, NULL);
+}
 ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
 			    struct machine *machine, u64 addr,
 			    u8 *data, ssize_t size);
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 0edc67a468ab..1a4db133262b 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -339,6 +339,18 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
 	return env->kernel_is_64_bit;
 }
 
+bool perf_arch_is_big_endian(const char *arch)
+{
+	if (!arch)
+		return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
+
+	if (str_ends_with(arch, "_be") || !strcmp(arch, "sparc") || !strcmp(arch, "sparc64") ||
+	    !strcmp(arch, "s390") || !strcmp(arch, "s390x") || !strcmp(arch, "powerpc"))
+		return true;
+
+	return false;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 7151a9138e3f..c355df2dba7b 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -175,6 +175,7 @@ void free_cpu_domain_info(struct cpu_domain_map **cd_map, u32 schedstat_version,
 void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
+bool perf_arch_is_big_endian(const char *arch);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 22be77225bb0..8611293deca9 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -449,7 +449,7 @@ void thread__find_cpumode_addr_location(struct thread *thread, u64 addr,
 	}
 }
 
-static uint16_t read_proc_e_machine_for_pid(pid_t pid, uint32_t *e_flags)
+static uint16_t read_proc_e_machine_for_pid(pid_t pid, uint32_t *e_flags, bool *is_big_endian)
 {
 	char path[6 /* "/proc/" */ + 11 /* max length of pid */ + 5 /* "/exe\0" */];
 	int fd;
@@ -458,7 +458,8 @@ static uint16_t read_proc_e_machine_for_pid(pid_t pid, uint32_t *e_flags)
 	snprintf(path, sizeof(path), "/proc/%d/exe", pid);
 	fd = open(path, O_RDONLY);
 	if (fd >= 0) {
-		e_machine = dso__read_e_machine(/*optional_dso=*/NULL, fd, e_flags);
+		e_machine = dso__read_e_machine_endian(/*optional_dso=*/NULL, fd, e_flags,
+						       is_big_endian);
 		close(fd);
 	}
 	return e_machine;
@@ -468,6 +469,7 @@ struct thread__e_machine_callback_args {
 	struct machine *machine;
 	uint32_t e_flags;
 	uint16_t e_machine;
+	bool is_big_endian;
 };
 
 static int thread__e_machine_callback(struct map *map, void *_args)
@@ -478,11 +480,13 @@ static int thread__e_machine_callback(struct map *map, void *_args)
 	if (!dso)
 		return 0; // No dso, continue search.
 
-	args->e_machine = dso__e_machine(dso, args->machine, &args->e_flags);
+	args->e_machine =
+		dso__e_machine_endian(dso, args->machine, &args->e_flags, &args->is_big_endian);
 	return args->e_machine != EM_NONE ? 1 /* stop search */ : 0 /* continue search */;
 }
 
-uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint32_t *e_flags)
+uint16_t thread__e_machine_endian(struct thread *thread, struct machine *machine, uint32_t *e_flags,
+				  bool *is_big_endian)
 {
 	pid_t tid, pid;
 	uint16_t e_machine = RC_CHK_ACCESS(thread)->e_machine;
@@ -491,6 +495,7 @@ uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint3
 		.machine = machine,
 		.e_flags = 0,
 		.e_machine = EM_NONE,
+		.is_big_endian = false,
 	};
 
 	if (e_machine != EM_NONE) {
@@ -510,7 +515,8 @@ uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint3
 		struct thread *parent = machine__findnew_thread(machine, pid, pid);
 
 		if (parent) {
-			e_machine = thread__e_machine(parent, machine, &local_e_flags);
+			e_machine = thread__e_machine_endian(parent, machine, &local_e_flags,
+							     is_big_endian);
 			thread__put(parent);
 			goto out;
 		}
@@ -522,6 +528,8 @@ uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint3
 	if (args.e_machine != EM_NONE) {
 		e_machine = args.e_machine;
 		local_e_flags = args.e_flags;
+		if (is_big_endian)
+			*is_big_endian = args.is_big_endian;
 	} else {
 		/* Maps failed, perhaps we're live with map events disabled. */
 		bool is_live = machine->machines == NULL;
@@ -536,7 +544,7 @@ uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint3
 		}
 		/* Read from /proc/pid/exe if live. */
 		if (is_live)
-			e_machine = read_proc_e_machine_for_pid(pid, &local_e_flags);
+			e_machine = read_proc_e_machine_for_pid(pid, &local_e_flags, is_big_endian);
 	}
 out:
 	if (e_machine != EM_NONE) {
@@ -545,6 +553,8 @@ uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint3
 	} else {
 		e_machine = EM_HOST;
 		local_e_flags = EF_HOST;
+		if (is_big_endian)
+			*is_big_endian = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
 	}
 	if (e_flags)
 		*e_flags = local_e_flags;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index f5792d3e8a16..8b58590c89de 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -311,7 +311,13 @@ static inline void thread__set_filter_entry_depth(struct thread *thread, int dep
 	RC_CHK_ACCESS(thread)->filter_entry_depth = depth;
 }
 
-uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint32_t *e_flags);
+uint16_t thread__e_machine_endian(struct thread *thread, struct machine *machine, uint32_t *e_flags,
+				  bool *is_big_endian);
+static inline uint16_t thread__e_machine(struct thread *thread, struct machine *machine,
+					 uint32_t *e_flags)
+{
+	return thread__e_machine_endian(thread, machine, e_flags, NULL);
+}
 
 static inline void thread__set_e_machine(struct thread *thread, uint16_t e_machine)
 {
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 04/18] perf capstone: Determine architecture from e_machine
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (2 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 03/18] perf env, dso, thread: Add _endian variants for e_machine helpers Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:50                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 05/18] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
                                               ` (13 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Avoid the imprecise arch string and use the e_machine instead. Add more
e_machine to capstone machine translations, including MIPS and RISCV.
Remove unnecessary maybe_unused annotations.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/capstone.c | 129 ++++++++++++++++++++++++-------------
 1 file changed, 85 insertions(+), 44 deletions(-)

diff --git a/tools/perf/util/capstone.c b/tools/perf/util/capstone.c
index 25cf6e15ec27..870394b46911 100644
--- a/tools/perf/util/capstone.c
+++ b/tools/perf/util/capstone.c
@@ -1,7 +1,19 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "capstone.h"
-#include "annotate.h"
+
+#include <errno.h>
+#include <inttypes.h>
+#include <string.h>
+
+#include <dlfcn.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <linux/ctype.h>
+
+#include <capstone/capstone.h>
+
 #include "addr_location.h"
+#include "annotate.h"
 #include "debug.h"
 #include "disasm.h"
 #include "dso.h"
@@ -11,13 +23,6 @@
 #include "print_insn.h"
 #include "symbol.h"
 #include "thread.h"
-#include <dlfcn.h>
-#include <errno.h>
-#include <fcntl.h>
-#include <inttypes.h>
-#include <string.h>
-
-#include <capstone/capstone.h>
 
 #ifdef LIBCAPSTONE_DLOPEN
 static void *perf_cs_dll_handle(void)
@@ -137,37 +142,67 @@ static enum cs_err perf_cs_close(csh *handle)
 #endif
 }
 
-static int capstone_init(struct machine *machine, csh *cs_handle, bool is64,
+static bool e_machine_to_capstone(uint16_t e_machine, bool is64, bool is_big_endian,
+				  enum cs_arch *arch, enum cs_mode *mode)
+{
+	*mode = is_big_endian ? CS_MODE_BIG_ENDIAN : CS_MODE_LITTLE_ENDIAN;
+	*mode |= is64 ? CS_MODE_64 : CS_MODE_32;
+
+	switch (e_machine) {
+	case EM_X86_64:
+	case EM_386:
+		*arch = CS_ARCH_X86;
+		return true;
+	case EM_AARCH64:
+		*arch = CS_ARCH_ARM64;
+		*mode |= CS_MODE_ARM;
+		return true;
+	case EM_ARM:
+		*arch = CS_ARCH_ARM;
+		*mode |= CS_MODE_ARM | CS_MODE_V8;
+		return true;
+	case EM_S390:
+		*arch = CS_ARCH_SYSZ;
+		return true;
+	case EM_MIPS:
+		*arch = CS_ARCH_MIPS;
+		*mode |= is64 ? CS_MODE_MIPS64 : CS_MODE_MIPS32;
+		return true;
+	case EM_PPC:
+	case EM_PPC64:
+		*arch = CS_ARCH_PPC;
+		return true;
+	case EM_SPARC:
+		*arch = CS_ARCH_SPARC;
+		return true;
+	case EM_SPARCV9:
+		*arch = CS_ARCH_SPARC;
+		*mode |= CS_MODE_V9;
+		return true;
+	case EM_RISCV:
+		*arch = CS_ARCH_RISCV;
+		*mode |= is64 ? CS_MODE_RISCV64 : CS_MODE_RISCV32;
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int capstone_init(uint16_t e_machine, csh *cs_handle, bool is64, bool is_big_endian,
 			 bool disassembler_style)
 {
 	enum cs_arch arch;
 	enum cs_mode mode;
 
-	if (machine__is(machine, "x86_64") && is64) {
-		arch = CS_ARCH_X86;
-		mode = CS_MODE_64;
-	} else if (machine__normalized_is(machine, "x86")) {
-		arch = CS_ARCH_X86;
-		mode = CS_MODE_32;
-	} else if (machine__normalized_is(machine, "arm64")) {
-		arch = CS_ARCH_ARM64;
-		mode = CS_MODE_ARM;
-	} else if (machine__normalized_is(machine, "arm")) {
-		arch = CS_ARCH_ARM;
-		mode = CS_MODE_ARM + CS_MODE_V8;
-	} else if (machine__normalized_is(machine, "s390")) {
-		arch = CS_ARCH_SYSZ;
-		mode = CS_MODE_BIG_ENDIAN;
-	} else {
+	if (!e_machine_to_capstone(e_machine, is64, is_big_endian, &arch, &mode))
 		return -1;
-	}
 
 	if (perf_cs_open(arch, mode, cs_handle) != CS_ERR_OK) {
 		pr_warning_once("cs_open failed\n");
 		return -1;
 	}
 
-	if (machine__normalized_is(machine, "x86")) {
+	if (arch == CS_ARCH_X86) {
 		/*
 		 * In case of using capstone_init while symbol__disassemble
 		 * setting CS_OPT_SYNTAX_ATT depends if disassembler_style opts
@@ -211,29 +246,28 @@ static size_t print_insn_x86(struct thread *thread, u8 cpumode, struct cs_insn *
 	return printed;
 }
 
-
-ssize_t capstone__fprintf_insn_asm(struct machine *machine __maybe_unused,
-				   struct thread *thread __maybe_unused,
-				   u8 cpumode __maybe_unused, bool is64bit __maybe_unused,
-				   const uint8_t *code __maybe_unused,
-				   size_t code_size __maybe_unused,
-				   uint64_t ip __maybe_unused, int *lenp __maybe_unused,
-				   int print_opts __maybe_unused, FILE *fp __maybe_unused)
+ssize_t capstone__fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpumode,
+				   bool is64bit, const uint8_t *code, size_t code_size, uint64_t ip,
+				   int *lenp, int print_opts, FILE *fp)
 {
 	size_t printed;
 	struct cs_insn *insn;
 	csh cs_handle;
 	size_t count;
+	bool is_big_endian = false;
+	uint16_t e_machine = thread__e_machine_endian(thread, machine,
+						      /*e_flags=*/NULL, &is_big_endian);
 	int ret;
 
 	/* TODO: Try to initiate capstone only once but need a proper place. */
-	ret = capstone_init(machine, &cs_handle, is64bit, true);
+	ret = capstone_init(e_machine, &cs_handle, is64bit, is_big_endian,
+			    /*disassembler_style=*/true);
 	if (ret < 0)
 		return ret;
 
 	count = perf_cs_disasm(cs_handle, code, code_size, ip, 1, &insn);
 	if (count > 0) {
-		if (machine__normalized_is(machine, "x86"))
+		if (e_machine == EM_X86_64 || e_machine == EM_386)
 			printed = print_insn_x86(thread, cpumode, &insn[0], print_opts, fp);
 		else
 			printed = fprintf(fp, "%s %s", insn[0].mnemonic, insn[0].op_str);
@@ -322,9 +356,8 @@ static int find_file_offset(u64 start, u64 len, u64 pgoff, void *arg)
 	return 0;
 }
 
-int symbol__disassemble_capstone(const char *filename __maybe_unused,
-				 struct symbol *sym __maybe_unused,
-				 struct annotate_args *args __maybe_unused)
+int symbol__disassemble_capstone(const char *filename, struct symbol *sym,
+				 struct annotate_args *args)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct map *map = args->ms->map;
@@ -344,6 +377,8 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
 	char disasm_buf[512];
 	struct disasm_line *dl;
 	bool disassembler_style = false;
+	uint16_t e_machine;
+	bool is_big_endian = false;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -373,8 +408,10 @@ int symbol__disassemble_capstone(const char *filename __maybe_unused,
 	    !strcmp(args->options->disassembler_style, "att"))
 		disassembler_style = true;
 
-	if (capstone_init(maps__machine(thread__maps(args->ms->thread)), &handle, is_64bit,
-			  disassembler_style) < 0)
+	e_machine = thread__e_machine_endian(args->ms->thread,
+					     /*machine=*/NULL,
+					     /*e_flags=*/NULL, &is_big_endian);
+	if (capstone_init(e_machine, &handle, is_64bit, is_big_endian, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
@@ -466,6 +503,8 @@ int symbol__disassemble_capstone_powerpc(const char *filename __maybe_unused,
 	struct disasm_line *dl;
 	u32 *line;
 	bool disassembler_style = false;
+	uint16_t e_machine;
+	bool is_big_endian = false;
 
 	if (args->options->objdump_path)
 		return -1;
@@ -484,8 +523,10 @@ int symbol__disassemble_capstone_powerpc(const char *filename __maybe_unused,
 	    !strcmp(args->options->disassembler_style, "att"))
 		disassembler_style = true;
 
-	if (capstone_init(maps__machine(thread__maps(args->ms->thread)), &handle, is_64bit,
-			  disassembler_style) < 0)
+	e_machine = thread__e_machine_endian(args->ms->thread,
+					     /*machine=*/NULL,
+					     /*e_flags=*/NULL, &is_big_endian);
+	if (capstone_init(e_machine, &handle, is_64bit, is_big_endian, disassembler_style) < 0)
 		goto err;
 
 	needs_cs_close = true;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 05/18] perf print_insn: Use e_machine for fallback IP length check
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (3 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 04/18] perf capstone: Determine architecture from e_machine Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 06/18] perf symbol: Avoid use of machine__is Ian Rogers
                                               ` (12 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Avoid string comparisons with perf_env arch, switch to using the more
precise ELF machine.

Sort header files and fix missing definitions.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/print_insn.c | 23 ++++++++++++++---------
 tools/perf/util/print_insn.h |  3 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/print_insn.c b/tools/perf/util/print_insn.c
index 02e6fbb8ca04..4068436f26ea 100644
--- a/tools/perf/util/print_insn.c
+++ b/tools/perf/util/print_insn.c
@@ -4,19 +4,24 @@
  *
  * Author(s): Changbin Du <changbin.du@huawei.com>
  */
+#include "print_insn.h"
+
 #include <inttypes.h>
-#include <string.h>
 #include <stdbool.h>
+#include <string.h>
+
+#include <dwarf-regs.h>
+
 #include "capstone.h"
 #include "debug.h"
+#include "dso.h"
+#include "dump-insn.h"
+#include "env.h"
+#include "machine.h"
+#include "map.h"
 #include "sample.h"
 #include "symbol.h"
-#include "machine.h"
 #include "thread.h"
-#include "print_insn.h"
-#include "dump-insn.h"
-#include "map.h"
-#include "dso.h"
 
 size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 {
@@ -33,13 +38,13 @@ size_t sample__fprintf_insn_raw(struct perf_sample *sample, FILE *fp)
 static bool is64bitip(struct machine *machine, struct addr_location *al)
 {
 	const struct dso *dso = al->map ? map__dso(al->map) : NULL;
+	uint16_t e_machine;
 
 	if (dso)
 		return dso__is_64_bit(dso);
 
-	return machine__is(machine, "x86_64") ||
-		machine__normalized_is(machine, "arm64") ||
-		machine__normalized_is(machine, "s390");
+	e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);
+	return e_machine == EM_X86_64 || e_machine == EM_AARCH64 || e_machine == EM_S390;
 }
 
 ssize_t fprintf_insn_asm(struct machine *machine, struct thread *thread, u8 cpumode,
diff --git a/tools/perf/util/print_insn.h b/tools/perf/util/print_insn.h
index 07d11af3fc1c..1f2c2f25f973 100644
--- a/tools/perf/util/print_insn.h
+++ b/tools/perf/util/print_insn.h
@@ -5,6 +5,9 @@
 #include <stddef.h>
 #include <stdio.h>
 
+#include <linux/types.h>
+
+struct addr_location;
 struct perf_sample;
 struct thread;
 struct machine;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 06/18] perf symbol: Avoid use of machine__is
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (4 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 05/18] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:51                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 07/18] perf machine: Use perf_env e_machine rather than arch Ian Rogers
                                               ` (11 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch to using the ELF machine from the dso or running machine rather
than the machine perf_env arch that may fall back on EM_HOST. This
also avoids potentially imprecise string comparisons.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/symbol.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)
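
The lookup order described in the commit message can be sketched as a
"first known value wins" chain. The structs below are hypothetical
one-field stand-ins for perf's dso and perf_env, and the host default is
passed in by the caller, so this only illustrates the precedence and not
how perf actually reads or caches the values.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for perf's dso and perf_env, reduced to one field. */
struct sketch_dso { uint16_t e_machine; };
struct sketch_env { uint16_t e_machine; };

/*
 * Pick the first source that knows its ELF machine: the DSO, then the
 * environment, then a caller supplied default standing in for the host.
 */
uint16_t sketch_pick_e_machine(const struct sketch_dso *dso,
			       const struct sketch_env *env,
			       uint16_t host_default)
{
	if (dso && dso->e_machine != EM_NONE)
		return dso->e_machine;
	if (env && env->e_machine != EM_NONE)
		return env->e_machine;
	return host_default;
}
```

Treating EM_NONE as "unknown" at each step is what lets a more precise
source override a less precise one without special cases.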

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fcaeeddbbb6b..a4b1f837a5a5 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -851,6 +851,23 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 	return count;
 }
 
+static uint16_t machine_or_dso_e_machine(struct machine *machine, struct dso *dso)
+{
+	uint16_t e_machine = EM_NONE;
+	/* DSO should be most accurate */
+	if (dso)
+		e_machine = dso__e_machine(dso, machine, /*e_flags=*/NULL);
+
+	if (e_machine != EM_NONE)
+		return e_machine;
+
+	/* Check the global environment next. */
+	if (machine && machine->env && machine->env->e_machine != EM_NONE)
+		return machine->env->e_machine;
+
+	return perf_env__e_machine(machine ? machine->env : NULL, /*e_flags=*/NULL);
+}
+
 /*
  * Split the symbols into maps, making sure there are no overlaps, i.e. the
  * kernel range is broken in several maps, named [kernel].N, as we don't have
@@ -866,14 +883,13 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 	struct rb_root_cached *root = dso__symbols(dso);
 	struct rb_node *next = rb_first_cached(root);
 	int kernel_range = 0;
-	bool x86_64;
+	uint16_t e_machine = EM_NONE;
 
 	if (!kmaps)
 		return -1;
 
 	machine = maps__machine(kmaps);
-
-	x86_64 = machine__is(machine, "x86_64");
+	e_machine = machine_or_dso_e_machine(machine, dso);
 
 	while (next) {
 		char *module;
@@ -925,7 +941,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			 */
 			pos->start = map__map_ip(curr_map, pos->start);
 			pos->end   = map__map_ip(curr_map, pos->end);
-		} else if (x86_64 && is_entry_trampoline(pos->name)) {
+		} else if (e_machine == EM_X86_64 && is_entry_trampoline(pos->name)) {
 			/*
 			 * These symbols are not needed anymore since the
 			 * trampoline maps refer to the text section and it's
@@ -1428,7 +1444,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 		free(new_node);
 	}
 
-	if (machine__is(machine, "x86_64")) {
+	if (machine_or_dso_e_machine(machine, dso) == EM_X86_64) {
 		u64 addr;
 
 		/*
@@ -1716,7 +1732,7 @@ int dso__load(struct dso *dso, struct map *map)
 			ret = dso__load_guest_kernel_sym(dso, map);
 
 		machine = maps__machine(map__kmaps(map));
-		if (machine__is(machine, "x86_64"))
+		if (machine_or_dso_e_machine(machine, dso) == EM_X86_64)
 			machine__map_x86_64_entry_trampolines(machine, dso);
 		goto out;
 	}
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 07/18] perf machine: Use perf_env e_machine rather than arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (5 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 06/18] perf symbol: Avoid use of machine__is Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  1:00                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 08/18] perf sample-raw: " Ian Rogers
                                               ` (10 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

The arch string is derived from uname and may be normalized, so raw and
normalized names can differ and the ELF machine is more precise. Reduce
the scope of machine__is, as it is often better to use a thread for the
e_machine rather than the machine. Switch from string to ELF machine
constant comparisons.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 25 ++++++++-----------------
 tools/perf/util/machine.h |  2 --
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e76f8c86e62a..6d32d3cb5cb7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1611,10 +1611,15 @@ static bool machine__uses_kcore(struct machine *machine)
 	return dsos__for_each_dso(&machine->dsos, machine__uses_kcore_cb, NULL) != 0 ? true : false;
 }
 
+static bool machine__is(struct machine *machine, uint16_t e_machine)
+{
+	return machine && perf_env__e_machine(machine->env, NULL) == e_machine;
+}
+
 static bool perf_event__is_extra_kernel_mmap(struct machine *machine,
 					     struct extra_kernel_map *xm)
 {
-	return machine__is(machine, "x86_64") &&
+	return machine__is(machine, EM_X86_64) &&
 	       is_entry_trampoline(xm->name);
 }
 
@@ -2770,7 +2775,7 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 static u64 get_leaf_frame_caller(struct perf_sample *sample,
 		struct thread *thread, int usr_idx)
 {
-	if (machine__normalized_is(maps__machine(thread__maps(thread)), "arm64"))
+	if (thread__e_machine(thread, /*machine=*/NULL, /*e_flags=*/NULL) == EM_AARCH64)
 		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
 	else
 		return 0;
@@ -3141,20 +3146,6 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
 	return 0;
 }
 
-/*
- * Compares the raw arch string. N.B. see instead perf_env__arch() or
- * machine__normalized_is() if a normalized arch is needed.
- */
-bool machine__is(struct machine *machine, const char *arch)
-{
-	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
-}
-
-bool machine__normalized_is(struct machine *machine, const char *arch)
-{
-	return machine && !strcmp(perf_env__arch(machine->env), arch);
-}
-
 int machine__nr_cpus_avail(struct machine *machine)
 {
 	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
@@ -3181,7 +3172,7 @@ int machine__get_kernel_start(struct machine *machine)
 		 * start of kernel text, but still above 2^63. So leave
 		 * kernel_start = 1ULL << 63 for x86_64.
 		 */
-		if (!err && !machine__is(machine, "x86_64"))
+		if (!err && !machine__is(machine, EM_X86_64))
 			machine->kernel_start = map__start(map);
 	}
 	return err;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 22a42c5825fa..003c970b3e4b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -227,8 +227,6 @@ static inline bool machine__is_host(struct machine *machine)
 }
 
 bool machine__is_lock_function(struct machine *machine, u64 addr);
-bool machine__is(struct machine *machine, const char *arch);
-bool machine__normalized_is(struct machine *machine, const char *arch);
 int machine__nr_cpus_avail(struct machine *machine);
 
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 08/18] perf sample-raw: Use perf_env e_machine rather than arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (6 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 07/18] perf machine: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 09/18] perf sort: " Ian Rogers
                                               ` (9 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than the arch to determine S390 and x86 types.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/sample-raw.c | 21 +++++++++++----------
 tools/perf/util/sample-raw.h |  6 +++++-
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/sample-raw.c b/tools/perf/util/sample-raw.c
index bcf442574d6e..be998c713a0d 100644
--- a/tools/perf/util/sample-raw.c
+++ b/tools/perf/util/sample-raw.c
@@ -1,11 +1,12 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#include "sample-raw.h"
 
-#include <string.h>
+#include <elf.h>
 #include <linux/string.h>
-#include "evlist.h"
+
 #include "env.h"
+#include "evlist.h"
 #include "header.h"
-#include "sample-raw.h"
 #include "session.h"
 
 /*
@@ -14,14 +15,14 @@
  */
 void evlist__init_trace_event_sample_raw(struct evlist *evlist, struct perf_env *env)
 {
-	const char *arch_pf = perf_env__arch(env);
-	const char *cpuid = perf_env__cpuid(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (arch_pf && !strcmp("s390", arch_pf))
+	if (e_machine == EM_S390) {
 		evlist->trace_event_sample_raw = evlist__s390_sample_raw;
-	else if (arch_pf && !strcmp("x86", arch_pf) &&
-		 cpuid && strstarts(cpuid, "AuthenticAMD") &&
-		 evlist__has_amd_ibs(evlist)) {
-		evlist->trace_event_sample_raw = evlist__amd_sample_raw;
+	} else if (e_machine == EM_X86_64 || e_machine == EM_386) {
+		const char *cpuid = perf_env__cpuid(env);
+
+		if (cpuid && strstarts(cpuid, "AuthenticAMD") && evlist__has_amd_ibs(evlist))
+			evlist->trace_event_sample_raw = evlist__amd_sample_raw;
 	}
 }
diff --git a/tools/perf/util/sample-raw.h b/tools/perf/util/sample-raw.h
index 896e9a87e373..c8d38c841c8c 100644
--- a/tools/perf/util/sample-raw.h
+++ b/tools/perf/util/sample-raw.h
@@ -2,7 +2,10 @@
 #ifndef __SAMPLE_RAW_H
 #define __SAMPLE_RAW_H 1
 
+#include <stdbool.h>
+
 struct evlist;
+struct perf_env;
 union perf_event;
 struct perf_sample;
 
@@ -12,4 +15,5 @@ bool evlist__has_amd_ibs(struct evlist *evlist);
 void evlist__amd_sample_raw(struct evlist *evlist, union perf_event *event,
 			    struct perf_sample *sample);
 void evlist__init_trace_event_sample_raw(struct evlist *evlist, struct perf_env *env);
-#endif /* __PERF_EVLIST_H */
+
+#endif /* __SAMPLE_RAW_H */
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 09/18] perf sort: Use perf_env e_machine rather than arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (7 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 08/18] perf sample-raw: " Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 10/18] perf arch common: " Ian Rogers
                                               ` (8 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than the arch to determine x86 or PPC types.
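
The normalized arch strings "x86" and "powerpc" each covered both the
32-bit and 64-bit variants, so an e_machine comparison has to test two
ELF constants per family. A minimal sketch of that idiom follows; the
sketch_* helper names are hypothetical and are not added by this patch,
which open-codes the comparisons instead.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for both 32-bit and 64-bit x86. */
static inline bool sketch_e_machine_is_x86(uint16_t e_machine)
{
	return e_machine == EM_X86_64 || e_machine == EM_386;
}

/* Hypothetical helper: true for both 32-bit and 64-bit PowerPC. */
static inline bool sketch_e_machine_is_powerpc(uint16_t e_machine)
{
	return e_machine == EM_PPC64 || e_machine == EM_PPC;
}
```

Comparing integers also avoids the NULL handling that strcmp() on an
arch string needs.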

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/sort.c | 58 +++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 0020089cb13c..90bc4a31bb55 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1,40 +1,45 @@
 // SPDX-License-Identifier: GPL-2.0
+#include "sort.h"
+
 #include <ctype.h>
 #include <errno.h>
 #include <inttypes.h>
-#include <regex.h>
 #include <stdlib.h>
+
+#include <elf.h>
+#include <linux/kernel.h>
 #include <linux/mman.h>
+#include <linux/string.h>
 #include <linux/time64.h>
+
+#include <regex.h>
+
+#include "annotate-data.h"
+#include "annotate.h"
+#include "branch.h"
+#include "cacheline.h"
+#include "cgroup.h"
+#include "comm.h"
 #include "debug.h"
 #include "dso.h"
-#include "sort.h"
+#include "event.h"
+#include "evlist.h"
+#include "evsel.h"
 #include "hist.h"
-#include "cacheline.h"
-#include "comm.h"
+#include "machine.h"
 #include "map.h"
-#include "maps.h"
-#include "symbol.h"
 #include "map_symbol.h"
-#include "branch.h"
-#include "thread.h"
-#include "evsel.h"
-#include "evlist.h"
-#include "srcline.h"
-#include "strlist.h"
-#include "strbuf.h"
+#include "maps.h"
 #include "mem-events.h"
 #include "mem-info.h"
-#include "annotate.h"
-#include "annotate-data.h"
-#include "event.h"
-#include "time-utils.h"
-#include "cgroup.h"
-#include "machine.h"
 #include "session.h"
+#include "srcline.h"
+#include "strbuf.h"
+#include "strlist.h"
+#include "symbol.h"
+#include "thread.h"
+#include "time-utils.h"
 #include "trace-event.h"
-#include <linux/kernel.h>
-#include <linux/string.h>
 
 #ifdef HAVE_LIBTRACEEVENT
 #include <event-parse.h>
@@ -2673,9 +2678,10 @@ struct sort_dimension {
 
 static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 {
-	const char *arch = perf_env__arch(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
+	if (e_machine == EM_X86_64 || e_machine == EM_386 || e_machine == EM_PPC64 ||
+	    e_machine == EM_PPC) {
 		if (!strcmp(sort_key, "p_stage_cyc"))
 			return 1;
 		if (!strcmp(sort_key, "local_p_stage_cyc"))
@@ -2686,14 +2692,14 @@ static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 
 static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
 {
-	const char *arch = perf_env__arch(env);
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
-	if (!strcmp("x86", arch)) {
+	if (e_machine == EM_X86_64 || e_machine == EM_386) {
 		if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
 			return "Local Retire Latency";
 		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
 			return "Retire Latency";
-	} else if (!strcmp("powerpc", arch)) {
+	} else if (e_machine == EM_PPC64 || e_machine == EM_PPC) {
 		if (!strcmp(se_header, "Local INSTR Latency"))
 			return "Finish Cyc";
 		else if (!strcmp(se_header, "INSTR Latency"))
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 10/18] perf arch common: Use perf_env e_machine rather than arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (8 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 09/18] perf sort: " Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:38                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 11/18] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
                                               ` (7 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/common.c | 62 ++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index 21836f70f231..1d8aff9b32d6 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -1,13 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
+#include "common.h"
+
 #include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+
+#include <linux/zalloc.h>
 #include <unistd.h>
-#include "common.h"
-#include "../util/env.h"
+
+#include <dwarf-regs.h>
+
 #include "../util/debug.h"
-#include <linux/zalloc.h>
+#include "../util/env.h"
 
 static const char *const arc_triplets[] = {
 	"arc-linux-",
@@ -145,7 +150,8 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 					  const char *name, char **path)
 {
 	int idx;
-	const char *arch = perf_env__arch(env), *cross_env;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+	const char *cross_env;
 	const char *const *path_list;
 	char *buf = NULL;
 
@@ -153,7 +159,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 	 * We don't need to try to find objdump path for native system.
 	 * Just use default binutils path (e.g.: "objdump").
 	 */
-	if (!strcmp(perf_env__arch(NULL), arch))
+	if (e_machine == EM_HOST)
 		goto out;
 
 	cross_env = getenv("CROSS_COMPILE");
@@ -170,30 +176,42 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree(&buf);
 	}
 
-	if (!strcmp(arch, "arc"))
+	switch (e_machine) {
+	case EM_ARC:
 		path_list = arc_triplets;
-	else if (!strcmp(arch, "arm"))
+		break;
+	case EM_ARM:
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+		break;
+	case EM_AARCH64:
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+		break;
+	case EM_PPC:
+	case EM_PPC64:
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "riscv32"))
-		path_list = riscv32_triplets;
-	else if (!strcmp(arch, "riscv64"))
-		path_list = riscv64_triplets;
-	else if (!strcmp(arch, "sh"))
+		break;
+	case EM_RISCV:
+		path_list = perf_env__kernel_is_64_bit(env) ? riscv64_triplets : riscv32_triplets;
+		break;
+	case EM_SH:
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+		break;
+	case EM_S390:
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+		break;
+	case EM_SPARC:
+	case EM_SPARCV9:
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+		break;
+	case EM_X86_64:
+	case EM_386:
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+		break;
+	case EM_MIPS:
 		path_list = mips_triplets;
-	else {
-		ui__error("binutils for %s not supported.\n", arch);
+		break;
+	default:
+		ui__error("binutils for %s not supported.\n", perf_env__arch(env));
 		goto out_error;
 	}
 
@@ -202,7 +220,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		ui__error("Please install %s for %s.\n"
 			  "You can add it to PATH, set CROSS_COMPILE or "
 			  "override the default using --%s.\n",
-			  name, arch, name);
+			  name, perf_env__arch(env), name);
 		goto out_error;
 	}
 
@@ -237,5 +255,5 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
  */
 bool perf_env__single_address_space(struct perf_env *env)
 {
-	return strcmp(perf_env__arch(env), "sparc");
+	return perf_env__e_machine(env, /*e_flags=*/NULL) != EM_SPARCV9;
 }
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 11/18] perf header: In print_pmu_caps use perf_env e_machine
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (9 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 10/18] perf arch common: " Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 12/18] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
                                               ` (6 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch from arch to e_machine in print_pmu_caps.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/header.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f1ae61392cce..bdf6c5d0fd5d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2348,15 +2348,16 @@ static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
 static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
 {
 	struct perf_env *env = &ff->ph->env;
-	struct pmu_caps *pmu_caps;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
 
 	for (int i = 0; i < env->nr_pmus_with_caps; i++) {
-		pmu_caps = &env->pmu_caps[i];
+		struct pmu_caps *pmu_caps = &env->pmu_caps[i];
+
 		__print_pmu_caps(fp, pmu_caps->nr_caps, pmu_caps->caps,
 				 pmu_caps->pmu_name);
 	}
 
-	if (strcmp(perf_env__arch(env), "x86") == 0 &&
+	if ((e_machine == EM_X86_64 || e_machine == EM_386) &&
 	    perf_env__has_pmu_mapping(env, "ibs_op")) {
 		char *max_precise = perf_env__find_pmu_cap(env, "cpu", "max_precise");
 
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 12/18] perf c2c: Use perf_env e_machine rather than arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (10 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 11/18] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 13/18] perf lock-contention: " Ian Rogers
                                               ` (5 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching for AARCH64.

Include dwarf-regs.h in case EM_AARCH64 isn't defined, and sort the
headers given this new include.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-c2c.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 72a7802775ee..c55cab53531b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -12,41 +12,45 @@
  */
 #include <errno.h>
 #include <inttypes.h>
+
+#include <asm/bug.h>
 #include <linux/compiler.h>
 #include <linux/err.h>
 #include <linux/kernel.h>
 #include <linux/stringify.h>
 #include <linux/zalloc.h>
-#include <asm/bug.h>
 #include <sys/param.h>
-#include "debug.h"
-#include "builtin.h"
+
+#include <dwarf-regs.h>
 #include <perf/cpumap.h>
 #include <subcmd/pager.h>
 #include <subcmd/parse-options.h>
-#include "map_symbol.h"
-#include "mem-events.h"
-#include "session.h"
-#include "hist.h"
-#include "sort.h"
-#include "tool.h"
+
+#include "builtin.h"
 #include "cacheline.h"
 #include "data.h"
+#include "debug.h"
 #include "event.h"
 #include "evlist.h"
 #include "evsel.h"
-#include "ui/browsers/hists.h"
-#include "thread.h"
-#include "mem2node.h"
+#include "hist.h"
+#include "map_symbol.h"
+#include "mem-events.h"
 #include "mem-info.h"
-#include "symbol.h"
-#include "ui/ui.h"
-#include "ui/progress.h"
+#include "mem2node.h"
 #include "pmus.h"
+#include "session.h"
+#include "sort.h"
 #include "string2.h"
-#include "util/util.h"
-#include "util/symbol.h"
+#include "symbol.h"
+#include "thread.h"
+#include "tool.h"
+#include "ui/browsers/hists.h"
+#include "ui/progress.h"
+#include "ui/ui.h"
 #include "util/annotate.h"
+#include "util/symbol.h"
+#include "util/util.h"
 
 struct c2c_hists {
 	struct hists		hists;
@@ -3202,7 +3206,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	 * default display type.
 	 */
 	if (!display) {
-		if (!strcmp(perf_env__arch(env), "arm64"))
+		if (perf_env__e_machine(env, /*e_flags=*/NULL) == EM_AARCH64)
 			display = "peer";
 		else
 			display = "tot";
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 13/18] perf lock-contention: Use perf_env e_machine rather than arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (11 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 12/18] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 14/18] perf env: Refactor perf_env__arch_strerrno Ian Rogers
                                               ` (4 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Use the e_machine rather than arch string matching for powerpc.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/lock-contention.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/lock-contention.c b/tools/perf/util/lock-contention.c
index 92e7b7b572a2..119a7206f3cd 100644
--- a/tools/perf/util/lock-contention.c
+++ b/tools/perf/util/lock-contention.c
@@ -104,7 +104,8 @@ bool match_callstack_filter(struct machine *machine, u64 *callstack, int max_sta
 	struct map *kmap;
 	struct symbol *sym;
 	u64 ip;
-	const char *arch = perf_env__arch(machine->env);
+	uint16_t e_machine = perf_env__e_machine(machine->env, /*e_flags=*/NULL);
+	bool is_powerpc = e_machine == EM_PPC64 || e_machine == EM_PPC;
 
 	if (list_empty(&callstack_filters))
 		return true;
@@ -125,8 +126,7 @@ bool match_callstack_filter(struct machine *machine, u64 *callstack, int max_sta
 		 * incase first or second callstack index entry has 0
 		 * address for powerpc.
 		 */
-		if (!callstack || (!callstack[i] && (strcmp(arch, "powerpc") ||
-						(i != 1 && i != 2))))
+		if (!callstack || (!callstack[i] && (!is_powerpc || (i != 1 && i != 2))))
 			break;
 
 		ip = callstack[i];
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 14/18] perf env: Refactor perf_env__arch_strerrno
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (12 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 13/18] perf lock-contention: " Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  1:11                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 15/18] perf env: Remove unused perf_env__raw_arch Ian Rogers
                                               ` (3 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

perf_env__arch_strerrno is only available with libtraceevent, so hide
the declaration when libtraceevent is not available.

The previous approach mapped an architecture string to a pointer to a
function that takes an int errno value and returns a string. The new
approach takes an e_machine and an errno value and returns a string
directly.

As the only call site is in builtin-trace.c, the e_machine is already
present and potentially more specific than the perf_env arch string
that is a single global value.

The major complication in this approach is having the shell script
that generates the C code map a linux directory name to the matching
ELF machine constants.
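
As a rough illustration of the shape of the generated code, the sketch
below dispatches on e_machine and then looks up the errno name. The
sketch_* names and the tiny tables are hypothetical stand-ins for the
generated errno_to_name__<arch>() functions; errno 35 is used because
it names EDEADLK on x86 but ENOMSG on MIPS.

```c
#include <elf.h>
#include <stdint.h>

/* Stand-in for a generated per-arch table; only a few errnos shown. */
static const char *sketch_errno_to_name__x86(int err)
{
	switch (err) {
	case 1:  return "EPERM";
	case 2:  return "ENOENT";
	case 35: return "EDEADLK";
	default: return "(unknown)";
	}
}

/* MIPS numbers some errnos differently, e.g. 35 is ENOMSG. */
static const char *sketch_errno_to_name__mips(int err)
{
	switch (err) {
	case 1:  return "EPERM";
	case 2:  return "ENOENT";
	case 35: return "ENOMSG";
	default: return "(unknown)";
	}
}

/* One switch on e_machine replaces the chain of strcmp() calls. */
static const char *sketch_arch_strerrno(uint16_t e_machine, int err)
{
	switch (e_machine) {
	case EM_MIPS:
		return sketch_errno_to_name__mips(err);
	case EM_386:
	case EM_X86_64:
	default:	/* unknown machines fall back to the default arch */
		return sketch_errno_to_name__x86(err);
	}
}
```

Because the lookup is keyed on the e_machine passed by the caller, no
function pointer needs to be cached in the perf_env.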

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-trace.c                  |  7 ++--
 tools/perf/trace/beauty/arch_errno_names.sh | 40 ++++++++++++++++++---
 tools/perf/util/env.c                       | 13 +++----
 tools/perf/util/env.h                       |  7 ++--
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e58c49d047a2..d278af18542f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3008,9 +3008,8 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
 	} else if (ret < 0) {
 errno_print: {
 		char bf[STRERR_BUFSIZE];
-		struct perf_env *env = evsel__env(evsel) ?: &trace->host_env;
 		const char *emsg = str_error_r(-ret, bf, sizeof(bf));
-		const char *e = perf_env__arch_strerrno(env, err);
+		const char *e = perf_env__arch_strerrno(e_machine, err);
 
 		fprintf(trace->output, "-1 %s (%s)", e, emsg);
 	}
@@ -4890,7 +4889,9 @@ static size_t syscall__dump_stats(struct trace *trace, int e_machine, FILE *fp,
 
 				for (e = 0; e < stats->max_errno; ++e) {
 					if (stats->errnos[e] != 0)
-						fprintf(fp, "\t\t\t\t%s: %d\n", perf_env__arch_strerrno(trace->host->env, e + 1), stats->errnos[e]);
+						fprintf(fp, "\t\t\t\t%s: %d\n",
+							perf_env__arch_strerrno(e_machine, e + 1),
+							stats->errnos[e]);
 				}
 			}
 			lines++;
diff --git a/tools/perf/trace/beauty/arch_errno_names.sh b/tools/perf/trace/beauty/arch_errno_names.sh
index b22890b8d272..89b742927168 100755
--- a/tools/perf/trace/beauty/arch_errno_names.sh
+++ b/tools/perf/trace/beauty/arch_errno_names.sh
@@ -52,21 +52,49 @@ process_arch()
 		|IFS=, create_errno_lookup_func "$arch"
 }
 
+arch_to_e_machine()
+{
+	case "$1" in
+	alpha)      printf '\tcase EM_ALPHA:\n' ;;
+	arc)        printf '\tcase EM_ARC:\n' ;;
+	arm)        printf '\tcase EM_ARM:\n' ;;
+	arm64)      printf '\tcase EM_AARCH64:\n' ;;
+	csky)       printf '\tcase EM_CSKY:\n' ;;
+	hexagon)    printf '\tcase EM_HEXAGON:\n' ;;
+	loongarch)  printf '\tcase EM_LOONGARCH:\n' ;;
+	microblaze) printf '\tcase EM_MICROBLAZE:\n' ;;
+	mips)       printf '\tcase EM_MIPS:\n' ;;
+	parisc)     printf '\tcase EM_PARISC:\n' ;;
+	powerpc)    printf '\tcase EM_PPC:\n\tcase EM_PPC64:\n' ;;
+	riscv)      printf '\tcase EM_RISCV:\n' ;;
+	s390)       printf '\tcase EM_S390:\n' ;;
+	sh)         printf '\tcase EM_SH:\n' ;;
+	sparc)      printf '\tcase EM_SPARC:\n\tcase EM_SPARCV9:\n' ;;
+	x86)        printf '\tcase EM_386:\n\tcase EM_X86_64:\n' ;;
+	xtensa)     printf '\tcase EM_XTENSA:\n' ;;
+	esac
+}
+
 create_arch_errno_table_func()
 {
 	archlist="$1"
 	default="$2"
 
-	printf 'static arch_syscalls__strerrno_t *\n'
-	printf 'arch_syscalls__strerrno_function(const char *arch)\n'
+	printf 'static const char *\n'
+	printf 'arch_syscalls__strerrno(uint16_t e_machine, int err)\n'
 	printf '{\n'
+	printf '\tswitch (e_machine) {\n'
 	for arch in $archlist; do
 		arch_str=$(arch_string "$arch")
-		printf '\tif (!strcmp(arch, "%s"))\n' "$arch_str"
-		printf '\t\treturn errno_to_name__%s;\n' "$arch_str"
+		ems=$(arch_to_e_machine "$arch_str")
+		if [ -n "$ems" ]; then
+			printf '%s\n' "$ems"
+			printf '\t\treturn errno_to_name__%s(err);\n' "$arch_str"
+		fi
 	done
 	arch_str=$(arch_string "$default")
-	printf '\treturn errno_to_name__%s;\n' "$arch_str"
+	printf '\tdefault:\n\t\treturn errno_to_name__%s(err);\n' "$arch_str"
+	printf '\t}\n'
 	printf '}\n'
 }
 
@@ -74,6 +102,8 @@ cat <<EoHEADER
 /* SPDX-License-Identifier: GPL-2.0 */
 
 #include <string.h>
+#include <stdint.h>
+#include <elf.h>
 
 EoHEADER
 
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1a4db133262b..8ac7aff0b27c 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -798,17 +798,12 @@ const char *perf_env__arch(struct perf_env *env)
 #include "trace/beauty/arch_errno_names.c"
 #endif
 
-const char *perf_env__arch_strerrno(struct perf_env *env __maybe_unused, int err __maybe_unused)
-{
 #if defined(HAVE_LIBTRACEEVENT)
-	if (env->arch_strerrno == NULL)
-		env->arch_strerrno = arch_syscalls__strerrno_function(perf_env__arch(env));
-
-	return env->arch_strerrno ? env->arch_strerrno(err) : "no arch specific strerrno function";
-#else
-	return "!HAVE_LIBTRACEEVENT";
-#endif
+const char *perf_env__arch_strerrno(uint16_t e_machine, int err)
+{
+	return arch_syscalls__strerrno(e_machine, err);
 }
+#endif
 
 const char *perf_env__cpuid(struct perf_env *env)
 {
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c355df2dba7b..ba51b871c401 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -67,8 +67,6 @@ struct cpu_domain_map {
 	struct domain_info	**domains;
 };
 
-typedef const char *(arch_syscalls__strerrno_t)(int err);
-
 struct perf_env {
 	char			*hostname;
 	char			*os_release;
@@ -158,7 +156,6 @@ struct perf_env {
 		 */
 		bool	enabled;
 	} clock;
-	arch_syscalls__strerrno_t *arch_strerrno;
 };
 
 enum perf_compress_type {
@@ -191,7 +188,9 @@ void cpu_cache_level__free(struct cpu_cache_level *cache);
 uint16_t perf_env__e_machine_nocache(struct perf_env *env, uint32_t *e_flags);
 uint16_t perf_env__e_machine(struct perf_env *env, uint32_t *e_flags);
 const char *perf_env__arch(struct perf_env *env);
-const char *perf_env__arch_strerrno(struct perf_env *env, int err);
+#if defined(HAVE_LIBTRACEEVENT)
+const char *perf_env__arch_strerrno(uint16_t e_machine, int err);
+#endif
 const char *perf_env__cpuid(struct perf_env *env);
 const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 15/18] perf env: Remove unused perf_env__raw_arch
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (13 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 14/18] perf env: Refactor perf_env__arch_strerrno Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 16/18] perf env: Add helper to lazily compute the os_release Ian Rogers
                                               ` (2 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

The switch to using e_machine has made the perf_env__raw_arch function
unused so remove it.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/env.c | 18 ------------------
 tools/perf/util/env.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 8ac7aff0b27c..29d5fe37528b 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -463,19 +463,6 @@ int perf_env__read_cpuid(struct perf_env *env)
 	return 0;
 }
 
-static int perf_env__read_arch(struct perf_env *env)
-{
-	struct utsname uts;
-
-	if (env->arch)
-		return 0;
-
-	if (!uname(&uts))
-		env->arch = strdup(uts.machine);
-
-	return env->arch ? 0 : -ENOMEM;
-}
-
 static int perf_env__read_nr_cpus_avail(struct perf_env *env)
 {
 	if (env->nr_cpus_avail == 0)
@@ -594,11 +581,6 @@ int perf_env__read_core_pmu_caps(struct perf_env *env)
 	return ret;
 }
 
-const char *perf_env__raw_arch(struct perf_env *env)
-{
-	return env && !perf_env__read_arch(env) ? env->arch : "unknown";
-}
-
 int perf_env__nr_cpus_avail(struct perf_env *env)
 {
 	return env && !perf_env__read_nr_cpus_avail(env) ? env->nr_cpus_avail : 0;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index ba51b871c401..bc4801d8399b 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -192,7 +192,6 @@ const char *perf_env__arch(struct perf_env *env);
 const char *perf_env__arch_strerrno(uint16_t e_machine, int err);
 #endif
 const char *perf_env__cpuid(struct perf_env *env);
-const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
 
 void perf_env__init(struct perf_env *env);
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 16/18] perf env: Add helper to lazily compute the os_release
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (14 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 15/18] perf env: Remove unused perf_env__raw_arch Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  1:00                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 17/18] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
  2026-05-03  0:22                             ` [PATCH v9 18/18] perf symbol: Lazily compute idle Ian Rogers
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

In live mode the os_release isn't initialized. Add a lazy
initialization helper that assumes an uninitialized os_release means
live mode.
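
The lazy initialization pattern can be sketched in isolation as below.
This is a simplified stand-in under stated assumptions: struct
sketch_env, sketch_env__os_release() and the "unknown" fallback are
hypothetical, whereas the helper in this patch operates on struct
perf_env and falls back to perf_version_string. Unlike the patch, the
sketch does not cache the fallback value.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical, cut-down environment holding only the release. */
struct sketch_env {
	char *os_release;	/* NULL until read from a file or uname() */
};

static const char *sketch_env__os_release(struct sketch_env *env)
{
	static const char fallback[] = "unknown";
	struct utsname uts;
	size_t len;

	if (!env)
		return fallback;

	/* Already set, e.g. from a data file header: use it as is. */
	if (env->os_release)
		return env->os_release;

	/* Not set: assume live mode and ask the running kernel. */
	if (uname(&uts) < 0)
		return fallback;

	len = strlen(uts.release) + 1;
	env->os_release = malloc(len);
	if (!env->os_release)
		return fallback;

	memcpy(env->os_release, uts.release, len);
	return env->os_release;
}
```

Callers then go through the helper rather than reading the os_release
field directly, so the live and file-backed cases share one code path.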

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/data-convert-bt.c |  2 +-
 tools/perf/util/env.c             | 21 +++++++++++++++++++++
 tools/perf/util/env.h             |  1 +
 tools/perf/util/header.c          | 16 +++++++++++-----
 tools/perf/util/symbol.c          |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 3b8f2df823a9..2c88420fe33e 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1414,7 +1414,7 @@ do {									\
 
 	ADD("host",    env->hostname);
 	ADD("sysname", "Linux");
-	ADD("release", env->os_release);
+	ADD("release", perf_env__os_release(env));
 	ADD("version", env->version);
 	ADD("machine", env->arch);
 	ADD("domain", "kernel");
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 29d5fe37528b..45dde40042b5 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -351,6 +351,27 @@ bool perf_arch_is_big_endian(const char *arch)
 	return false;
 }
 
+const char *perf_env__os_release(struct perf_env *env)
+{
+	struct utsname uts;
+	int ret;
+
+	if (!env)
+		return perf_version_string;
+
+	if (env->os_release)
+		return env->os_release;
+
+	/*
+	 * The os_release is being accessed but wasn't initialized from a data
+	 * file, assume this is 'live' mode and use the release from uname. If
+	 * uname or strdup fails then use the current perf tool version.
+	 */
+	ret = uname(&uts);
+	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);
+	return env->os_release ?: perf_version_string;
+}
+
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
 {
 	int i;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index bc4801d8399b..bbf10446204c 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -173,6 +173,7 @@ void perf_env__exit(struct perf_env *env);
 
 int perf_env__kernel_is_64_bit(struct perf_env *env);
 bool perf_arch_is_big_endian(const char *arch);
+const char *perf_env__os_release(struct perf_env *env);
 
 int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index bdf6c5d0fd5d..ce0c392ead69 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -370,13 +370,19 @@ static int write_osrelease(struct feat_fd *ff,
 			   struct evlist *evlist __maybe_unused)
 {
 	struct utsname uts;
-	int ret;
+	const char *release = NULL;
 
-	ret = uname(&uts);
-	if (ret < 0)
-		return -1;
+	if (evlist->session)
+		release = perf_env__os_release(perf_session__env(evlist->session));
 
-	return do_write_string(ff, uts.release);
+	if (!release) {
+		int ret = uname(&uts);
+
+		if (ret < 0)
+			return -1;
+		release = uts.release;
+	}
+	return do_write_string(ff, release);
 }
 
 static int write_arch(struct feat_fd *ff, struct evlist *evlist)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a4b1f837a5a5..fabed5b0fa57 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2225,7 +2225,7 @@ static int vmlinux_path__init(struct perf_env *env)
 {
 	struct utsname uts;
 	char bf[PATH_MAX];
-	char *kernel_version;
+	const char *kernel_version;
 	unsigned int i;
 
 	vmlinux_path = malloc(sizeof(char *) * (ARRAY_SIZE(vmlinux_paths) +
@@ -2242,7 +2242,7 @@ static int vmlinux_path__init(struct perf_env *env)
 		return 0;
 
 	if (env) {
-		kernel_version = env->os_release;
+		kernel_version = perf_env__os_release(env);
 	} else {
 		if (uname(&uts) < 0)
 			goto out_fail;
-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v9 17/18] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (15 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 16/18] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  0:59                               ` sashiko-bot
  2026-05-03  0:22                             ` [PATCH v9 18/18] perf symbol: Lazily compute idle Ian Rogers
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

A problem with putting bitfields into struct symbol is that bits sharing
a storage unit can be updated concurrently. Each update is a
read-modify-write of the whole unit, so one writer can overwrite
another's change, leading to lost updates.

To avoid this, replace the bitfields with 16 bits of atomic flags in the
symbol and read or set parts of them atomically. Add accessors to
simplify this.

The idle field has three states (unknown, not idle, idle) in preparation
for a later change that will lazily compute it.
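
As a rough illustration of the approach (not the patch itself), the
sketch below packs a two-bit tri-state field and a one-bit flag into an
atomic 16-bit word using C11 <stdatomic.h>. All demo_* and DEMO_* names
and the bit layout are hypothetical. Single-bit flags use
atomic_fetch_or/atomic_fetch_and; the multi-bit field uses a
compare-exchange loop so that the clear and the set are published as a
single update.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-1 hold a tri-state, bit 2 is a boolean. */
#define DEMO_STATE_SHIFT	0
#define DEMO_STATE_MASK		(0x3u << DEMO_STATE_SHIFT)
#define DEMO_FLAG_IGNORE	(1u << 2)

enum demo_state { DEMO_UNKNOWN = 0, DEMO_NO = 1, DEMO_YES = 2 };

struct demo_sym {
	_Atomic uint16_t flags;
};

/* A single bit can be set or cleared with one atomic RMW operation. */
static void demo_set_ignore(struct demo_sym *s, bool on)
{
	if (on)
		atomic_fetch_or(&s->flags, (uint16_t)DEMO_FLAG_IGNORE);
	else
		atomic_fetch_and(&s->flags, (uint16_t)~DEMO_FLAG_IGNORE);
}

static bool demo_ignore(struct demo_sym *s)
{
	return (atomic_load(&s->flags) & DEMO_FLAG_IGNORE) != 0;
}

/*
 * A multi-bit field needs clear-then-set, so retry with compare-exchange
 * until no other thread changed the word in between. On failure,
 * atomic_compare_exchange_weak reloads 'old' with the current value.
 */
static void demo_set_state(struct demo_sym *s, enum demo_state st)
{
	uint16_t old = atomic_load(&s->flags);
	uint16_t next;

	do {
		next = (uint16_t)((old & ~DEMO_STATE_MASK) |
				  ((unsigned int)st << DEMO_STATE_SHIFT));
	} while (!atomic_compare_exchange_weak(&s->flags, &old, next));
}

static enum demo_state demo_get_state(struct demo_sym *s)
{
	unsigned int v = atomic_load(&s->flags);

	return (enum demo_state)((v & DEMO_STATE_MASK) >> DEMO_STATE_SHIFT);
}
```

With this layout, a thread changing the tri-state and another toggling
the ignore bit on the same symbol cannot overwrite each other's update,
which is the failure mode the plain bitfields allowed.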

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-inject.c                   |  6 +-
 tools/perf/builtin-kwork.c                    |  2 +-
 tools/perf/builtin-report.c                   |  2 +-
 tools/perf/builtin-sched.c                    |  4 +-
 tools/perf/builtin-top.c                      |  6 +-
 tools/perf/tests/symbols.c                    |  2 +-
 tools/perf/tests/vmlinux-kallsyms.c           |  2 +-
 tools/perf/ui/browsers/annotate.c             |  2 +-
 tools/perf/ui/browsers/map.c                  |  4 +-
 tools/perf/util/annotate.c                    |  5 +-
 tools/perf/util/auxtrace.c                    |  6 +-
 tools/perf/util/callchain.c                   |  4 +-
 tools/perf/util/dlfilter.c                    |  2 +-
 tools/perf/util/evsel_fprintf.c               |  6 +-
 tools/perf/util/intel-pt.c                    |  2 +-
 tools/perf/util/machine.c                     |  2 +-
 tools/perf/util/probe-event.c                 |  4 +-
 .../util/scripting-engines/trace-event-perl.c |  2 +-
 .../scripting-engines/trace-event-python.c    |  4 +-
 tools/perf/util/sort.c                        |  8 +-
 tools/perf/util/srcline.c                     | 10 +--
 tools/perf/util/symbol-elf.c                  |  3 +-
 tools/perf/util/symbol.c                      | 84 +++++++++++++++----
 tools/perf/util/symbol.h                      | 70 ++++++++++++----
 tools/perf/util/symbol_fprintf.c              |  4 +-
 25 files changed, 171 insertions(+), 75 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f174bc69cec4..390327c7f78d 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -439,9 +439,9 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 	node = cursor->first;
 	for (k = 0; k < cursor->nr && i < PERF_MAX_STACK_DEPTH; k++) {
 		if (machine__kernel_ip(machine, node->ip))
-			/* kernel IPs were added already */;
-		else if (node->ms.sym && node->ms.sym->inlined)
-			/* we can't handle inlined callchains */;
+			; /* kernel IPs were added already */
+		else if (node->ms.sym && symbol__inlined(node->ms.sym))
+			; /* we can't handle inlined callchains */
 		else
 			inject->raw_callchain->ips[i++] = node->ip;
 
diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 9d3a4c779a41..7337ee956dc9 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -725,7 +725,7 @@ static void timehist_save_callchain(struct perf_kwork *kwork,
 		if (sym) {
 			if (!strcmp(sym->name, "__softirqentry_text_start") ||
 			    !strcmp(sym->name, "__do_softirq"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 95c0bdba6b11..3c9ada8539c3 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -753,7 +753,7 @@ static int hists__resort_cb(struct hist_entry *he, void *arg)
 	struct report *rep = arg;
 	struct symbol *sym = he->ms.sym;
 
-	if (rep->symbol_ipc && sym && !sym->annotate2) {
+	if (rep->symbol_ipc && sym && !symbol__is_annotate2(sym)) {
 		struct evsel *evsel = hists_to_evsel(he->hists);
 
 		symbol__annotate2(&he->ms, evsel, NULL);
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 555247568e7a..7c874a258cb4 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2371,7 +2371,7 @@ static void save_task_callchain(struct perf_sched *sched,
 			if (!strcmp(sym->name, "schedule") ||
 			    !strcmp(sym->name, "__schedule") ||
 			    !strcmp(sym->name, "preempt_schedule"))
-				sym->ignore = 1;
+				symbol__set_ignore(sym, true);
 		}
 
 		callchain_cursor_advance(cursor);
@@ -3035,7 +3035,7 @@ static size_t callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
 	list_for_each_entry(chain, &node->val, list) {
 		if (chain->ip >= PERF_CONTEXT_MAX)
 			continue;
-		if (chain->ms.sym && chain->ms.sym->ignore)
+		if (chain->ms.sym && symbol__ignore(chain->ms.sym))
 			continue;
 		ret += fprintf(fp, "%s%s", first ? "" : sep,
 			       callchain_list__sym_name(chain, bf, sizeof(bf),
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f6eb543de537..9a0c388a7ec3 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -186,8 +186,8 @@ static void ui__warn_map_erange(struct map *map, struct symbol *sym, u64 ip)
 		    "Please report to linux-kernel@vger.kernel.org\n",
 		    ip, dso__long_name(dso), dso__symtab_origin(dso),
 		    map__start(map), map__end(map), sym->start, sym->end,
-		    sym->binding == STB_GLOBAL ? 'g' :
-		    sym->binding == STB_LOCAL  ? 'l' : 'w', sym->name,
+		    symbol__binding(sym) == STB_GLOBAL ? 'g' :
+		    symbol__binding(sym) == STB_LOCAL  ? 'l' : 'w', sym->name,
 		    err ? "[unknown]" : uts.machine,
 		    err ? "[unknown]" : uts.release, perf_version_string);
 	if (use_browser <= 0)
@@ -830,7 +830,7 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !al.sym->idle) {
+	if (al.sym == NULL || !symbol__is_idle(al.sym)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/tests/symbols.c b/tools/perf/tests/symbols.c
index f4ffe5804f40..c09e04f36035 100644
--- a/tools/perf/tests/symbols.c
+++ b/tools/perf/tests/symbols.c
@@ -125,7 +125,7 @@ static int test_dso(struct dso *dso)
 	for (nd = rb_first_cached(dso__symbols(dso)); nd; nd = rb_next(nd)) {
 		struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
 
-		if (sym->type != STT_FUNC && sym->type != STT_GNU_IFUNC)
+		if (symbol__type(sym) != STT_FUNC && symbol__type(sym) != STT_GNU_IFUNC)
 			continue;
 
 		/* Check for overlapping function symbols */
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 524d46478364..7409abe4aa36 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -346,7 +346,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 			 * such as __indirect_thunk_end.
 			 */
 			continue;
-		} else if (is_ignored_symbol(sym->name, sym->type)) {
+		} else if (is_ignored_symbol(sym->name, symbol__type(sym))) {
 			/*
 			 * Ignore hidden symbols, see scripts/kallsyms.c for the details
 			 */
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index ea17e6d29a7e..e220c4dfc881 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1185,7 +1185,7 @@ int __hist_entry__tui_annotate(struct hist_entry *he, struct map_symbol *ms,
 	if (dso__annotate_warned(dso))
 		return -1;
 
-	if (not_annotated || !sym->annotate2) {
+	if (not_annotated || !symbol__is_annotate2(sym)) {
 		err = symbol__annotate2(ms, evsel, &browser.arch);
 		if (err) {
 			annotate_browser__symbol_annotate_error(&browser, err);
diff --git a/tools/perf/ui/browsers/map.c b/tools/perf/ui/browsers/map.c
index c61ba3174a24..075a575cdc5d 100644
--- a/tools/perf/ui/browsers/map.c
+++ b/tools/perf/ui/browsers/map.c
@@ -32,8 +32,8 @@ static void map_browser__write(struct ui_browser *browser, void *nd, int row)
 	ui_browser__set_percent_color(browser, 0, current_entry);
 	ui_browser__printf(browser, "%*" PRIx64 " %*" PRIx64 " %c ",
 			   mb->addrlen, sym->start, mb->addrlen, sym->end,
-			   sym->binding == STB_GLOBAL ? 'g' :
-				sym->binding == STB_LOCAL  ? 'l' : 'w');
+			   symbol__binding(sym) == STB_GLOBAL ? 'g' :
+				symbol__binding(sym) == STB_LOCAL  ? 'l' : 'w');
 	width = browser->width - ((mb->addrlen * 2) + 4);
 	if (width > 0)
 		ui_browser__write_nstring(browser, sym->name, width);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e745f3034a0e..2ecb514888ba 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -235,7 +235,8 @@ static int __symbol__inc_addr_samples(struct map_symbol *ms,
 	h = annotated_source__histogram(src, evsel);
 	if (h == NULL) {
 		pr_debug("%s(%d): ENOMEM! sym->name=%s, start=%#" PRIx64 ", addr=%#" PRIx64 ", end=%#" PRIx64 ", func: %d\n",
-			 __func__, __LINE__, sym->name, sym->start, addr, sym->end, sym->type == STT_FUNC);
+			 __func__, __LINE__, sym->name, sym->start, addr, sym->end,
+			 symbol__type(sym) == STT_FUNC);
 		return -ENOMEM;
 	}
 
@@ -2224,7 +2225,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 
 	annotation__init_column_widths(notes, sym);
 	annotation__update_column_widths(notes);
-	sym->annotate2 = 1;
+	symbol__set_annotate2(sym, true);
 
 	return 0;
 }
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a224687ffbc1..afcdefe95fee 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -2663,7 +2663,7 @@ static bool dso_sym_match(struct symbol *sym, const char *name, int *cnt,
 {
 	/* Same name, and global or the n'th found or any */
 	return !arch__compare_symbol_names(name, sym->name) &&
-	       ((!idx && sym->binding == STB_GLOBAL) ||
+	       ((!idx && symbol__binding(sym) == STB_GLOBAL) ||
 		(idx > 0 && ++*cnt == idx) ||
 		idx < 0);
 }
@@ -2681,8 +2681,8 @@ static void print_duplicate_syms(struct dso *dso, const char *sym_name)
 		if (dso_sym_match(sym, sym_name, &cnt, -1)) {
 			pr_err("#%d\t0x%"PRIx64"\t%c\t%s\n",
 			       ++cnt, sym->start,
-			       sym->binding == STB_GLOBAL ? 'g' :
-			       sym->binding == STB_LOCAL  ? 'l' : 'w',
+			       symbol__binding(sym) == STB_GLOBAL ? 'g' :
+			       symbol__binding(sym) == STB_LOCAL  ? 'l' : 'w',
 			       sym->name);
 			near = true;
 		} else if (near) {
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index f031cbbeeba8..9a107f42acdd 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -801,7 +801,7 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			 * symbol start. Otherwise do a faster comparison based
 			 * on the symbol start address.
 			 */
-			if (cnode->ms.sym->inlined || node->ms.sym->inlined) {
+			if (symbol__inlined(cnode->ms.sym) || symbol__inlined(node->ms.sym)) {
 				match = match_chain_strings(cnode->ms.sym->name,
 							    node->ms.sym->name);
 				if (match != MATCH_ERROR)
@@ -1245,7 +1245,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
-		const char *inlined = cl->ms.sym->inlined ? " (inlined)" : "";
+		const char *inlined = symbol__inlined(cl->ms.sym) ? " (inlined)" : "";
 
 		if (show_srcline && cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s%s",
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index dc31b5e7149e..e11e144af62b 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -56,7 +56,7 @@ static void al_to_d_al(struct addr_location *al, struct perf_dlfilter_al *d_al)
 			d_al->symoff = al->addr - map__start(al->map) - sym->start;
 		else
 			d_al->symoff = 0;
-		d_al->sym_binding = sym->binding;
+		d_al->sym_binding = symbol__binding(sym);
 	} else {
 		d_al->sym = NULL;
 		d_al->sym_start = 0;
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 5521d00bff2c..0f7a25500a44 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -146,7 +146,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 			sym = node->ms.sym;
 			map = node->ms.map;
 
-			if (sym && sym->ignore && print_skip_ignored)
+			if (sym && symbol__ignore(sym) && print_skip_ignored)
 				goto next;
 
 			printed += fprintf(fp, "%-*.*s", left_alignment, left_alignment, " ");
@@ -182,7 +182,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 				addr_location__exit(&node_al);
 			}
 
-			if (print_dso && (!sym || !sym->inlined))
+			if (print_dso && (!sym || !symbol__inlined(sym)))
 				printed += map__fprintf_dsoname_dsoff(map, print_dsoff, addr, fp);
 
 			if (print_srcline) {
@@ -192,7 +192,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 					printed += map__fprintf_srcline(map, addr, "\n  ", fp);
 			}
 
-			if (sym && sym->inlined)
+			if (sym && symbol__inlined(sym))
 				printed += fprintf(fp, " (inlined)");
 
 			if (!print_oneline)
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..6a405a9d829c 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2979,7 +2979,7 @@ static u64 intel_pt_switch_ip(struct intel_pt *pt, u64 *ptss_ip)
 	start = dso__first_symbol(map__dso(map));
 
 	for (sym = start; sym; sym = dso__next_symbol(sym)) {
-		if (sym->binding == STB_GLOBAL &&
+		if (symbol__binding(sym) == STB_GLOBAL &&
 		    !strcmp(sym->name, "__switch_to")) {
 			ip = map__unmap_ip(map, sym->start);
 			if (ip >= map__start(map) && ip < map__end(map)) {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6d32d3cb5cb7..7e38dde160b7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1078,7 +1078,7 @@ static u64 find_entry_trampoline(struct dso *dso)
 	unsigned int i;
 
 	for (; sym; sym = dso__next_symbol(sym)) {
-		if (sym->binding != STB_GLOBAL)
+		if (symbol__binding(sym) != STB_GLOBAL)
 			continue;
 		for (i = 0; i < ARRAY_SIZE(syms); i++) {
 			if (!strcmp(sym->name, syms[i]))
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 34b4badd2c14..11ae4a09412c 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -416,7 +416,7 @@ static int find_alternative_probe_point(struct debuginfo *dinfo,
 	map__for_each_symbol_by_name(map, pp->function, sym, idx) {
 		if (uprobes) {
 			address = sym->start;
-			if (sym->type == STT_GNU_IFUNC)
+			if (symbol__type(sym) == STT_GNU_IFUNC)
 				pr_warning("Warning: The probe function (%s) is a GNU indirect function.\n"
 					   "Consider identifying the final function used at run time and set the probe directly on that.\n",
 					   pp->function);
@@ -3189,7 +3189,7 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 	for (j = 0; j < num_matched_functions; j++) {
 		sym = syms[j];
 
-		if (sym->type != STT_FUNC)
+		if (symbol__type(sym) != STT_FUNC)
 			continue;
 
 		/* There can be duplicated symbols in the map */
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index e261a57b87d4..3e0490030ddd 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -304,7 +304,7 @@ static SV *perl_process_callchain(struct perf_sample *sample,
 			}
 			if (!hv_stores(sym, "start",   newSVuv(node->ms.sym->start)) ||
 			    !hv_stores(sym, "end",     newSVuv(node->ms.sym->end)) ||
-			    !hv_stores(sym, "binding", newSVuv(node->ms.sym->binding)) ||
+			    !hv_stores(sym, "binding", newSVuv(symbol__binding(node->ms.sym))) ||
 			    !hv_stores(sym, "name",    newSVpvn(node->ms.sym->name,
 								node->ms.sym->namelen)) ||
 			    !hv_stores(elem, "sym",    newRV_noinc((SV*)sym))) {
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 5a30caaec73e..9d62a0921aee 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -437,7 +437,7 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 			pydict_set_item_string_decref(pysym, "end",
 					PyLong_FromUnsignedLongLong(node->ms.sym->end));
 			pydict_set_item_string_decref(pysym, "binding",
-					_PyLong_FromLong(node->ms.sym->binding));
+					_PyLong_FromLong(symbol__binding(node->ms.sym)));
 			pydict_set_item_string_decref(pysym, "name",
 					_PyUnicode_FromStringAndSize(node->ms.sym->name,
 							node->ms.sym->namelen));
@@ -1275,7 +1275,7 @@ static int python_export_symbol(struct db_export *dbe, struct symbol *sym,
 	tuple_set_d64(t, 1, dso__db_id(dso));
 	tuple_set_d64(t, 2, sym->start);
 	tuple_set_d64(t, 3, sym->end);
-	tuple_set_s32(t, 4, sym->binding);
+	tuple_set_s32(t, 4, symbol__binding(sym));
 	tuple_set_string(t, 5, sym->name);
 
 	call_object(tables->symbol_handler, t, "symbol_table");
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 90bc4a31bb55..005e7d85dc4a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -469,7 +469,7 @@ int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 	if (sym_l == sym_r)
 		return 0;
 
-	if (sym_l->inlined || sym_r->inlined) {
+	if (symbol__inlined(sym_l) || symbol__inlined(sym_r)) {
 		int ret = strcmp(sym_l->name, sym_r->name);
 
 		if (ret)
@@ -536,7 +536,7 @@ static int _hist_entry__sym_snprintf(struct map_symbol *ms,
 
 	ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", level);
 	if (sym && map) {
-		if (sym->type == STT_OBJECT) {
+		if (symbol__type(sym) == STT_OBJECT) {
 			ret += repsep_snprintf(bf + ret, size - ret, "%s", sym->name);
 			ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
 					ip - map__unmap_ip(map, sym->start));
@@ -544,7 +544,7 @@ static int _hist_entry__sym_snprintf(struct map_symbol *ms,
 			ret += repsep_snprintf(bf + ret, size - ret, "%.*s",
 					       width - ret,
 					       sym->name);
-			if (sym->inlined)
+			if (symbol__inlined(sym))
 				ret += repsep_snprintf(bf + ret, size - ret,
 						       " (inlined)");
 		}
@@ -1483,7 +1483,7 @@ static int _hist_entry__addr_snprintf(struct map_symbol *ms,
 
 	ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", level);
 	if (sym && map) {
-		if (sym->type == STT_OBJECT) {
+		if (symbol__type(sym) == STT_OBJECT) {
 			ret += repsep_snprintf(bf + ret, size - ret, "%s", sym->name);
 			ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
 					ip - map__unmap_ip(map, sym->start));
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index db164d258163..877d4889cd0d 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -113,16 +113,16 @@ struct symbol *new_inline_sym(struct dso *dso,
 		/* ensure that we don't alias an inlined symbol, which could
 		 * lead to double frees in inline_node__delete
 		 */
-		assert(!base_sym->inlined);
+		assert(!symbol__inlined(base_sym));
 	} else {
 		/* create a fake symbol for the inline frame */
 		inline_sym = symbol__new(base_sym ? base_sym->start : 0,
 					 base_sym ? (base_sym->end - base_sym->start) : 0,
-					 base_sym ? base_sym->binding : 0,
-					 base_sym ? base_sym->type : 0,
+					 base_sym ? symbol__binding(base_sym) : 0,
+					 base_sym ? symbol__type(base_sym) : 0,
 					 funcname);
 		if (inline_sym)
-			inline_sym->inlined = 1;
+			symbol__set_inlined(inline_sym, true);
 	}
 
 	free(demangled);
@@ -437,7 +437,7 @@ void inline_node__delete(struct inline_node *node)
 		list_del_init(&ilist->list);
 		zfree_srcline(&ilist->srcline);
 		/* only the inlined symbols are owned by the list */
-		if (ilist->symbol && ilist->symbol->inlined)
+		if (ilist->symbol && symbol__inlined(ilist->symbol))
 			symbol__delete(ilist->symbol);
 		free(ilist);
 	}
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 7afa8a117139..a9045d6fcb95 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -350,7 +350,8 @@ static bool get_ifunc_name(Elf *elf, struct dso *dso, GElf_Ehdr *ehdr,
 	sym = dso__find_symbol_nocache(dso, addr);
 
 	/* Expecting the address to be an IFUNC or IFUNC alias */
-	if (!sym || sym->start != addr || (sym->type != STT_GNU_IFUNC && !sym->ifunc_alias))
+	if (!sym || sym->start != addr ||
+	    (symbol__type(sym) != STT_GNU_IFUNC && !symbol__ifunc_alias(sym)))
 		return false;
 
 	snprintf(buf, buf_sz, "%s@plt", sym->name);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index fabed5b0fa57..4702b8989354 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -50,7 +50,7 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__is_idle(const char *name);
+static bool symbol__compute_is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -163,24 +163,24 @@ static int choose_best_symbol(struct symbol *syma, struct symbol *symb)
 	else if ((a == 0) && (b > 0))
 		return SYMBOL_B;
 
-	if (syma->type != symb->type) {
-		if (syma->type == STT_NOTYPE)
+	if (symbol__type(syma) != symbol__type(symb)) {
+		if (symbol__type(syma) == STT_NOTYPE)
 			return SYMBOL_B;
-		if (symb->type == STT_NOTYPE)
+		if (symbol__type(symb) == STT_NOTYPE)
 			return SYMBOL_A;
 	}
 
 	/* Prefer a non weak symbol over a weak one */
-	a = syma->binding == STB_WEAK;
-	b = symb->binding == STB_WEAK;
+	a = symbol__binding(syma) == STB_WEAK;
+	b = symbol__binding(symb) == STB_WEAK;
 	if (b && !a)
 		return SYMBOL_A;
 	if (a && !b)
 		return SYMBOL_B;
 
 	/* Prefer a global symbol over a non global one */
-	a = syma->binding == STB_GLOBAL;
-	b = symb->binding == STB_GLOBAL;
+	a = symbol__binding(syma) == STB_GLOBAL;
+	b = symbol__binding(symb) == STB_GLOBAL;
 	if (a && !b)
 		return SYMBOL_A;
 	if (b && !a)
@@ -227,14 +227,14 @@ void symbols__fixup_duplicate(struct rb_root_cached *symbols)
 			continue;
 
 		if (choose_best_symbol(curr, next) == SYMBOL_A) {
-			if (next->type == STT_GNU_IFUNC)
-				curr->ifunc_alias = true;
+			if (symbol__type(next) == STT_GNU_IFUNC)
+				symbol__set_ifunc_alias(curr, true);
 			rb_erase_cached(&next->rb_node, symbols);
 			symbol__delete(next);
 			goto again;
 		} else {
-			if (curr->type == STT_GNU_IFUNC)
-				next->ifunc_alias = true;
+			if (symbol__type(curr) == STT_GNU_IFUNC)
+				symbol__set_ifunc_alias(next, true);
 			nd = rb_next(&curr->rb_node);
 			rb_erase_cached(&curr->rb_node, symbols);
 			symbol__delete(curr);
@@ -322,8 +322,8 @@ struct symbol *symbol__new(u64 start, u64 len, u8 binding, u8 type, const char *
 
 	sym->start   = start;
 	sym->end     = len ? start + len : start;
-	sym->type    = type;
-	sym->binding = binding;
+	atomic_init(&sym->flags, (type << SYMBOL_FLAG_TYPE_SHIFT) |
+				 (binding << SYMBOL_FLAG_BINDING_SHIFT));
 	sym->namelen = namelen - 1;
 
 	pr_debug4("%s: %s %#" PRIx64 "-%#" PRIx64 "\n",
@@ -345,6 +345,49 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
+void symbol__set_ignore(struct symbol *sym, bool ignore)
+{
+	if (ignore)
+		atomic_fetch_or(&sym->flags, SYMBOL_FLAG_IGNORE);
+	else
+		atomic_fetch_and(&sym->flags, ~SYMBOL_FLAG_IGNORE);
+}
+
+void symbol__set_annotate2(struct symbol *sym, bool annotate2)
+{
+	if (annotate2)
+		atomic_fetch_or(&sym->flags, SYMBOL_FLAG_ANNOTATE2);
+	else
+		atomic_fetch_and(&sym->flags, ~SYMBOL_FLAG_ANNOTATE2);
+}
+
+void symbol__set_inlined(struct symbol *sym, bool inlined)
+{
+	if (inlined)
+		atomic_fetch_or(&sym->flags, SYMBOL_FLAG_INLINED);
+	else
+		atomic_fetch_and(&sym->flags, ~SYMBOL_FLAG_INLINED);
+}
+
+void symbol__set_ifunc_alias(struct symbol *sym, bool ifunc_alias)
+{
+	if (ifunc_alias)
+		atomic_fetch_or(&sym->flags, SYMBOL_FLAG_IFUNC_ALIAS);
+	else
+		atomic_fetch_and(&sym->flags, ~SYMBOL_FLAG_IFUNC_ALIAS);
+}
+
+static void symbol__set_idle(struct symbol *sym, bool idle)
+{
+	uint16_t old_flags = atomic_load(&sym->flags);
+	uint16_t new_flags;
+	uint16_t idle_val = idle ? SYMBOL_IDLE__IDLE : SYMBOL_IDLE__NOT_IDLE;
+
+	do {
+		new_flags = old_flags & ~SYMBOL_FLAG_IDLE_MASK;
+		new_flags |= (idle_val << SYMBOL_FLAG_IDLE_SHIFT);
+	} while (!atomic_compare_exchange_weak(&sym->flags, &old_flags, new_flags));
+}
 void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
@@ -375,7 +418,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 		 */
 		if (name[0] == '.')
 			name++;
-		sym->idle = symbol__is_idle(name);
+		symbol__set_idle(sym, symbol__compute_is_idle(name));
 	}
 
 	while (*p != NULL) {
@@ -717,11 +760,19 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
+bool symbol__is_idle(const struct symbol *sym)
+{
+	uint16_t flags = atomic_load(&sym->flags);
+	uint16_t idle_val = (flags & SYMBOL_FLAG_IDLE_MASK) >> SYMBOL_FLAG_IDLE_SHIFT;
+
+	return idle_val == SYMBOL_IDLE__IDLE;
+}
+
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__is_idle(const char *name)
+static bool symbol__compute_is_idle(const char *name)
 {
 	const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
@@ -2492,6 +2543,7 @@ void symbol__exit(void)
 {
 	if (!symbol_conf.initialized)
 		return;
+
 	strlist__delete(symbol_conf.bt_stop_list);
 	strlist__delete(symbol_conf.sym_list);
 	strlist__delete(symbol_conf.dso_list);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index bd6eb90c8668..a199646f21f7 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -6,6 +6,7 @@
 #include <linux/refcount.h>
 #include <stdbool.h>
 #include <stdint.h>
+#include <stdatomic.h>
 #include <linux/list.h>
 #include <linux/rbtree.h>
 #include <stdio.h>
@@ -43,6 +44,23 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 			     GElf_Shdr *shp, const char *name, size_t *idx);
 #endif
 
+enum symbol_idle_kind {
+	SYMBOL_IDLE__UNKNOWN = 0,
+	SYMBOL_IDLE__NOT_IDLE = 1,
+	SYMBOL_IDLE__IDLE = 2,
+};
+
+#define SYMBOL_FLAG_TYPE_SHIFT      0
+#define SYMBOL_FLAG_TYPE_MASK       (0xF << SYMBOL_FLAG_TYPE_SHIFT)
+#define SYMBOL_FLAG_BINDING_SHIFT   4
+#define SYMBOL_FLAG_BINDING_MASK    (0xF << SYMBOL_FLAG_BINDING_SHIFT)
+#define SYMBOL_FLAG_IDLE_SHIFT      8
+#define SYMBOL_FLAG_IDLE_MASK       (0x3 << SYMBOL_FLAG_IDLE_SHIFT)
+#define SYMBOL_FLAG_IGNORE          (1 << 10)
+#define SYMBOL_FLAG_INLINED         (1 << 11)
+#define SYMBOL_FLAG_ANNOTATE2       (1 << 12)
+#define SYMBOL_FLAG_IFUNC_ALIAS     (1 << 13)
+
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -54,20 +72,7 @@ struct symbol {
 	u64		end;
 	/** Length of the string name. */
 	u16		namelen;
-	/** ELF symbol type as defined for st_info. E.g STT_OBJECT or STT_FUNC. */
-	u8		type:4;
-	/** ELF binding type as defined for st_info. E.g. STB_WEAK or STB_GLOBAL. */
-	u8		binding:4;
-	/** Set true for kernel symbols of idle routines. */
-	u8		idle:1;
-	/** Resolvable but tools ignore it (e.g. idle routines). */
-	u8		ignore:1;
-	/** Symbol for an inlined function. */
-	u8		inlined:1;
-	/** Has symbol__annotate2 been performed. */
-	u8		annotate2:1;
-	/** Symbol is an alias of an STT_GNU_IFUNC */
-	u8		ifunc_alias:1;
+	_Atomic uint16_t flags;
 	/** Architecture specific. Unused except on PPC where it holds st_other. */
 	u8		arch_sym;
 	/** The name of length namelen associated with the symbol. */
@@ -77,6 +82,43 @@ struct symbol {
 void symbol__delete(struct symbol *sym);
 void symbols__delete(struct rb_root_cached *symbols);
 
+static inline u8 symbol__type(const struct symbol *sym)
+{
+	return (atomic_load(&sym->flags) & SYMBOL_FLAG_TYPE_MASK) >> SYMBOL_FLAG_TYPE_SHIFT;
+}
+
+static inline u8 symbol__binding(const struct symbol *sym)
+{
+	return (atomic_load(&sym->flags) & SYMBOL_FLAG_BINDING_MASK) >> SYMBOL_FLAG_BINDING_SHIFT;
+}
+
+static inline bool symbol__ignore(const struct symbol *sym)
+{
+	return (atomic_load(&sym->flags) & SYMBOL_FLAG_IGNORE) != 0;
+}
+
+static inline bool symbol__inlined(const struct symbol *sym)
+{
+	return (atomic_load(&sym->flags) & SYMBOL_FLAG_INLINED) != 0;
+}
+
+static inline bool symbol__is_annotate2(const struct symbol *sym)
+{
+	return (atomic_load(&sym->flags) & SYMBOL_FLAG_ANNOTATE2) != 0;
+}
+
+static inline bool symbol__ifunc_alias(const struct symbol *sym)
+{
+	return (atomic_load(&sym->flags) & SYMBOL_FLAG_IFUNC_ALIAS) != 0;
+}
+
+bool symbol__is_idle(const struct symbol *sym);
+
+void symbol__set_ignore(struct symbol *sym, bool ignore);
+void symbol__set_annotate2(struct symbol *sym, bool annotate2);
+void symbol__set_inlined(struct symbol *sym, bool inlined);
+void symbol__set_ifunc_alias(struct symbol *sym, bool ifunc_alias);
+
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
  * @symbols: the rb_root of symbols
diff --git a/tools/perf/util/symbol_fprintf.c b/tools/perf/util/symbol_fprintf.c
index 53e1af4ed9ac..4dc8d5761f52 100644
--- a/tools/perf/util/symbol_fprintf.c
+++ b/tools/perf/util/symbol_fprintf.c
@@ -11,8 +11,8 @@ size_t symbol__fprintf(struct symbol *sym, FILE *fp)
 {
 	return fprintf(fp, " %" PRIx64 "-%" PRIx64 " %c %s\n",
 		       sym->start, sym->end,
-		       sym->binding == STB_GLOBAL ? 'g' :
-		       sym->binding == STB_LOCAL  ? 'l' : 'w',
+		       symbol__binding(sym) == STB_GLOBAL ? 'g' :
+		       symbol__binding(sym) == STB_LOCAL  ? 'l' : 'w',
 		       sym->name);
 }
 
-- 
2.54.0.545.g6539524ca2-goog


* [PATCH v9 18/18] perf symbol: Lazily compute idle
  2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
                                               ` (16 preceding siblings ...)
  2026-05-03  0:22                             ` [PATCH v9 17/18] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-03  0:22                             ` Ian Rogers
  2026-05-03  1:11                               ` sashiko-bot
  17 siblings, 1 reply; 106+ messages in thread
From: Ian Rogers @ 2026-05-03  0:22 UTC (permalink / raw)
  To: irogers, acme, namhyung, tmricht
  Cc: agordeev, gor, hca, jameshongleiwang, japo, linux-kernel,
	linux-perf-users, linux-s390, sumanthk

Switch from an idle boolean to a symbol__is_idle() helper function. In
the function, lazily compute whether a symbol is an idle function,
taking into consideration the kernel version and architecture of the
machine. As __symbols__insert no longer needs to know if a symbol is
for the kernel, remove the argument.

This change is inspired by mailing list discussion, particularly from
Thomas Richter <tmricht@linux.ibm.com> and Heiko Carstens
<hca@linux.ibm.com>:
https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/

The change switches x86 matches to use strstarts, which means
intel_idle_irq is now matched by strstarts(name, "intel_idle"), a
change suggested by Honglei Wang <jameshongleiwang@126.com> in:
https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-top.c     |   3 +-
 tools/perf/util/symbol-elf.c |   2 +-
 tools/perf/util/symbol.c     | 114 +++++++++++++++++++++--------------
 tools/perf/util/symbol.h     |   8 +--
 4 files changed, 74 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 9a0c388a7ec3..efb4b1172190 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -830,7 +830,8 @@ static void perf_event__process_sample(const struct perf_tool *tool,
 		}
 	}
 
-	if (al.sym == NULL || !symbol__is_idle(al.sym)) {
+	if (al.sym == NULL ||
+	    !symbol__is_idle(al.sym, al.map ? map__dso(al.map) : NULL, machine->env)) {
 		struct hists *hists = evsel__hists(evsel);
 		struct hist_entry_iter iter = {
 			.evsel		= evsel,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index a9045d6fcb95..69484abc07b6 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1728,7 +1728,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
+		__symbols__insert(dso__symbols(curr_dso), f);
 		nr++;
 	}
 	dso__put(curr_dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 4702b8989354..2caa6b8b8609 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -50,7 +50,6 @@
 
 static int dso__load_kernel_sym(struct dso *dso, struct map *map);
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map);
-static bool symbol__compute_is_idle(const char *name);
 
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
@@ -401,8 +400,7 @@ void symbols__delete(struct rb_root_cached *symbols)
 	}
 }
 
-void __symbols__insert(struct rb_root_cached *symbols,
-		       struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
@@ -410,17 +408,6 @@ void __symbols__insert(struct rb_root_cached *symbols,
 	struct symbol *s;
 	bool leftmost = true;
 
-	if (kernel) {
-		const char *name = sym->name;
-		/*
-		 * ppc64 uses function descriptors and appends a '.' to the
-		 * start of every instruction address. Remove it.
-		 */
-		if (name[0] == '.')
-			name++;
-		symbol__set_idle(sym, symbol__compute_is_idle(name));
-	}
-
 	while (*p != NULL) {
 		parent = *p;
 		s = rb_entry(parent, struct symbol, rb_node);
@@ -437,7 +424,7 @@ void __symbols__insert(struct rb_root_cached *symbols,
 
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	__symbols__insert(symbols, sym, false);
+	__symbols__insert(symbols, sym);
 }
 
 static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
@@ -598,7 +585,7 @@ void dso__reset_find_symbol_cache(struct dso *dso)
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
+	__symbols__insert(dso__symbols(dso), sym);
 
 	/* update the symbol cache if necessary */
 	if (dso__last_find_result_addr(dso) >= sym->start &&
@@ -760,55 +747,90 @@ int modules__parse(const char *filename, void *arg,
 	return err;
 }
 
-bool symbol__is_idle(const struct symbol *sym)
-{
-	uint16_t flags = atomic_load(&sym->flags);
-	uint16_t idle_val = (flags & SYMBOL_FLAG_IDLE_MASK) >> SYMBOL_FLAG_IDLE_SHIFT;
-
-	return idle_val == SYMBOL_IDLE__IDLE;
-}
-
 /*
  * These are symbols in the kernel image, so make sure that
  * sym is from a kernel DSO.
  */
-static bool symbol__compute_is_idle(const char *name)
+static int sym_name_cmp(const void *a, const void *b)
+{
+	const char *name = a;
+	const char *const *sym = b;
+
+	return strcmp(name, *sym);
+}
+
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
 {
-	const char * const idle_symbols[] = {
+	static const char * const idle_symbols[] = {
 		"acpi_idle_do_entry",
 		"acpi_processor_ffh_cstate_enter",
 		"arch_cpu_idle",
 		"cpu_idle",
 		"cpu_startup_entry",
-		"idle_cpu",
-		"intel_idle",
-		"intel_idle_ibrs",
 		"default_idle",
-		"native_safe_halt",
 		"enter_idle",
 		"exit_idle",
-		"mwait_idle",
-		"mwait_idle_with_hints",
-		"mwait_idle_with_hints.constprop.0",
+		"idle_cpu",
+		"native_safe_halt",
 		"poll_idle",
-		"ppc64_runlatch_off",
 		"pseries_dedicated_idle_sleep",
-		"psw_idle",
-		"psw_idle_exit",
-		NULL
 	};
-	int i;
-	static struct strlist *idle_symbols_list;
+	const char *name = sym->name;
+	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
+
+	{
+		uint16_t flags = atomic_load(&sym->flags);
+		uint16_t idle_val = (flags & SYMBOL_FLAG_IDLE_MASK) >> SYMBOL_FLAG_IDLE_SHIFT;
+
+		if (idle_val != SYMBOL_IDLE__UNKNOWN)
+			return idle_val == SYMBOL_IDLE__IDLE;
+	}
+
+	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
+		symbol__set_idle(sym, /*idle=*/false);
+		return false;
+	}
+
+	/*
+	 * ppc64 uses function descriptors and appends a '.' to the
+	 * start of every instruction address. Remove it.
+	 */
+	if (name[0] == '.')
+		name++;
+
+	if (bsearch(name, idle_symbols, ARRAY_SIZE(idle_symbols),
+		    sizeof(idle_symbols[0]), sym_name_cmp)) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
 
-	if (idle_symbols_list)
-		return strlist__has_entry(idle_symbols_list, name);
+	if (e_machine == EM_386 || e_machine == EM_X86_64) {
+		if (strstarts(name, "mwait_idle") ||
+		    strstarts(name, "intel_idle")) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
 
-	idle_symbols_list = strlist__new(NULL, NULL);
+	if (e_machine == EM_PPC64 && !strcmp(name, "ppc64_runlatch_off")) {
+		symbol__set_idle(sym, /*idle=*/true);
+		return true;
+	}
 
-	for (i = 0; idle_symbols[i]; i++)
-		strlist__add(idle_symbols_list, idle_symbols[i]);
+	if (e_machine == EM_S390 && strstarts(name, "psw_idle")) {
+		int major = 0, minor = 0;
+		const char *release = perf_env__os_release(env);
 
-	return strlist__has_entry(idle_symbols_list, name);
+		/* Before v6.10, s390 used psw_idle. */
+		if (release && sscanf(release, "%d.%d", &major, &minor) == 2 &&
+		    (major < 6 || (major == 6 && minor < 10))) {
+			symbol__set_idle(sym, /*idle=*/true);
+			return true;
+		}
+	}
+
+	symbol__set_idle(sym, /*idle=*/false);
+	return false;
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
@@ -837,7 +859,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 	 * We will pass the symbols to the filter later, in
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
-	__symbols__insert(root, sym, !strchr(name, '['));
+	__symbols__insert(root, sym);
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index a199646f21f7..422e98a4ea2f 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -28,6 +28,7 @@ struct map;
 struct maps;
 struct option;
 struct build_id;
+struct perf_env;
 
 /*
  * libelf 0.8.x and earlier do not support ELF_C_READ_MMAP;
@@ -60,7 +61,6 @@ enum symbol_idle_kind {
 #define SYMBOL_FLAG_INLINED         (1 << 11)
 #define SYMBOL_FLAG_ANNOTATE2       (1 << 12)
 #define SYMBOL_FLAG_IFUNC_ALIAS     (1 << 13)
-
 /**
  * A symtab entry. When allocated this may be preceded by an annotation (see
  * symbol__annotation) and/or a browser_index (see symbol__browser_index).
@@ -112,7 +112,7 @@ static inline bool symbol__ifunc_alias(const struct symbol *sym)
 	return (atomic_load(&sym->flags) & SYMBOL_FLAG_IFUNC_ALIAS) != 0;
 }
 
-bool symbol__is_idle(const struct symbol *sym);
+bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env);
 
 void symbol__set_ignore(struct symbol *sym, bool ignore);
 void symbol__set_annotate2(struct symbol *sym, bool annotate2);
@@ -196,7 +196,6 @@ int filename__read_debuglink(const char *filename, char *debuglink,
 			     size_t size);
 bool filename__has_section(const char *filename, const char *sec);
 
-struct perf_env;
 int symbol__init(struct perf_env *env);
 void symbol__exit(void);
 void symbol__elf_init(void);
@@ -236,8 +235,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
-		       bool kernel);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
-- 
2.54.0.545.g6539524ca2-goog


* Re: [PATCH v9 10/18] perf arch common: Use perf_env e_machine rather than arch
  2026-05-03  0:22                             ` [PATCH v9 10/18] perf arch common: " Ian Rogers
@ 2026-05-03  0:38                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  0:38 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Native objdump fallback breaks for cross-bitness profiling (e.g., analyzing a 32-bit profile on a 64-bit host).
- [Medium] Unintended address space handling change for 32-bit SPARC.
--

commit ba683ab91ed0ab22877903f737b0057971f914b1
Author: Ian Rogers <irogers@google.com>

perf arch common: Use perf_env e_machine rather than arch

This commit updates the perf architecture matching logic to use e_machine
values instead of architecture strings.

> diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
> index 21836f70f231..1d8aff9b32d6 100644
> --- a/tools/perf/arch/common.c
> +++ b/tools/perf/arch/common.c
[ ... ]
> @@ -145,7 +150,8 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
[ ... ]
>  	/*
>  	 * We don't need to try to find objdump path for native system.
>  	 * Just use default binutils path (e.g.: "objdump").
>  	 */
> -	if (!strcmp(perf_env__arch(NULL), arch))
> +	if (e_machine == EM_HOST)
>  		goto out;

Does this change break cross-bitness profiling on the same architecture
family?

Previously, perf_env__arch() normalized both 32-bit (EM_386) and 64-bit
(EM_X86_64) ELF machine types into a single family string like x86. This
allowed a 64-bit host to fall back to the native objdump when analyzing a
32-bit profile.

With the new check, EM_386 is not equal to EM_X86_64 for a 32-bit profile
on a 64-bit host, causing the check to fail. The code then attempts a triplet
lookup, such as i686-pc-linux-gnu-objdump.

If these specific prefixed symlinks are not installed, will this trigger a
fatal ui__error and abort perf annotate, even though the native objdump
inherently supports multilib and 32-bit binaries?
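
One way to keep the old family-level behaviour could be to compare
architecture families rather than exact ELF machine values. The sketch
below is only an illustration: e_machine_family() and
e_machine_same_family() are hypothetical names, not existing perf
helpers, and the mapping table is an assumption that would need
checking per architecture.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: collapse 32-bit variants onto their 64-bit sibling. */
static uint16_t e_machine_family(uint16_t e_machine)
{
	switch (e_machine) {
	case EM_386:
		return EM_X86_64;
	case EM_SPARC:
	case EM_SPARC32PLUS:
		return EM_SPARCV9;
	case EM_PPC:
		return EM_PPC64;
	default:
		/* Anything else is treated as its own family. */
		return e_machine;
	}
}

/* Hypothetical helper: true when both values belong to the same family. */
static bool e_machine_same_family(uint16_t a, uint16_t b)
{
	return e_machine_family(a) == e_machine_family(b);
}
```

With such a helper, the native-objdump shortcut could test
e_machine_same_family(e_machine, EM_HOST) instead of strict equality.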

[ ... ]
> @@ -237,5 +255,5 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
>   */
>  bool perf_env__single_address_space(struct perf_env *env)
>  {
> -	return strcmp(perf_env__arch(env), "sparc");
> +	return perf_env__e_machine(env, /*e_flags=*/NULL) != EM_SPARCV9;
>  }

Is this an intended logic change for 32-bit SPARC?

Before this patch, perf_env__arch() mapped both 32-bit (EM_SPARC) and
64-bit (EM_SPARCV9) to sparc, so the strcmp returned 0 (false) for both
variants.

Now, the function evaluates to false for 64-bit SPARC (EM_SPARCV9) but
evaluates to true for 32-bit SPARC (EM_SPARC).

Could this unexpectedly alter address space separation for 32-bit SPARC
profiles?
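
If the previous behaviour is the desired one, a minimal sketch of a
check that covers every SPARC ELF machine type might look like the
following. The function name is hypothetical, and whether
EM_SPARC32PLUS belongs in the list is an assumption.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical variant: report a split (non-single) address space for
 * all SPARC ELF machine types, as the old "sparc" string match did.
 */
static bool e_machine_single_address_space(uint16_t e_machine)
{
	switch (e_machine) {
	case EM_SPARC:
	case EM_SPARC32PLUS:
	case EM_SPARCV9:
		return false;
	default:
		return true;
	}
}
```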

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=10

* Re: [PATCH v9 03/18] perf env, dso, thread: Add _endian variants for e_machine helpers
  2026-05-03  0:22                             ` [PATCH v9 03/18] perf env, dso, thread: Add _endian variants for e_machine helpers Ian Rogers
@ 2026-05-03  0:39                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  0:39 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] thread__e_machine_endian() fast-path returns successfully without initializing the is_big_endian output parameter.
- [High] dso__e_machine_endian() bypasses lazy evaluation of env->arch, silently falling back to the host's endianness.
- [High] perf_arch_is_big_endian() falsely categorizes standard big-endian architectures (e.g., MIPS, m68k) as little-endian.
- [High] read_proc_e_machine_for_pid() uses a blocking open() on user-controlled /proc paths, risking an indefinite hang.
--

commit 34bedc0fb8b78f200595dbcc97e6be1af183988e
Author: Ian Rogers <irogers@google.com>

perf env, dso, thread: Add _endian variants for e_machine helpers

Add perf_arch_is_big_endian(), dso__read_e_machine_endian(),
dso__e_machine_endian(), and thread__e_machine_endian() to support
bi-endianness and cross-architecture analysis without breaking the
existing API.

> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index b791e1b6b2cf..6439b2a3c898 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
[ ... ]
> @@ -1308,9 +1313,11 @@ uint16_t dso__e_machine(struct dso *dso, struct machine *machine, uint32_t *e_fl
>  	case DSO_BINARY_TYPE__BPF_IMAGE:
>  	case DSO_BINARY_TYPE__OOL:
>  	case DSO_BINARY_TYPE__JAVA_JIT:
> -		if (e_flags)
> -			*e_flags = EF_HOST;
> -		return EM_HOST;
> +		if (is_big_endian) {
> +			*is_big_endian = perf_arch_is_big_endian(
> +				machine && machine->env ? machine->env->arch : NULL);

Could this direct access to machine->env->arch read a NULL pointer if the
architecture hasn't been lazily computed yet?

The arch string inside struct perf_env is lazily populated via
perf_env__arch(env). If NULL is passed to perf_arch_is_big_endian() here, it
will default to the host's byte order.

> +		}
> +		return perf_env__e_machine(machine ? machine->env : NULL, e_flags);
>  	case DSO_BINARY_TYPE__DEBUGLINK:
[ ... ]
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 0edc67a468ab..1a4db133262b 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
[ ... ]
> @@ -339,6 +339,18 @@ int perf_env__kernel_is_64_bit(struct perf_env *env)
>  	return env->kernel_is_64_bit;
>  }
>  
> +bool perf_arch_is_big_endian(const char *arch)
> +{
> +	if (!arch)
> +		return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
> +
> +	if (str_ends_with(arch, "_be") || !strcmp(arch, "sparc") || !strcmp(arch, "sparc64") ||
> +	    !strcmp(arch, "s390") || !strcmp(arch, "s390x") || !strcmp(arch, "powerpc"))
> +		return true;
> +
> +	return false;
> +}

Does this check omit standard big-endian architectures like mips, m68k, and
parisc?

Because they aren't in the allowlist, cross-architecture analysis on a MIPS
kernel might return false, causing downstream consumers to be incorrectly
configured in little-endian mode.
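
An architecture string alone may not determine byte order for bi-endian
architectures, so where an ELF header is available the EI_DATA byte is
a more direct source. The sketch below assumes the caller already has
the e_ident bytes; elf_ident_is_big_endian() is a hypothetical name and
not part of the patch.

```c
#include <elf.h>
#include <string.h>

/*
 * Hypothetical helper: classify byte order from an ELF identification
 * array. Returns 1 for big-endian, 0 for little-endian and -1 when the
 * bytes are not a recognisable ELF header.
 */
static int elf_ident_is_big_endian(const unsigned char e_ident[EI_NIDENT])
{
	if (memcmp(e_ident, ELFMAG, SELFMAG) != 0)
		return -1;

	switch (e_ident[EI_DATA]) {
	case ELFDATA2MSB:
		return 1;
	case ELFDATA2LSB:
		return 0;
	default:
		return -1;
	}
}
```

The -1 result leaves room for a caller to fall back to a string-based
guess only when no ELF data could be read.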

> +
>  int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
>  {
[ ... ]
> diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
> index 22be77225bb0..8611293deca9 100644
> --- a/tools/perf/util/thread.c
> +++ b/tools/perf/util/thread.c
[ ... ]
> @@ -449,7 +449,7 @@ void thread__find_cpumode_addr_location(struct thread *thread, u64 addr,
>  	}
>  }
>  
> -static uint16_t read_proc_e_machine_for_pid(pid_t pid, uint32_t *e_flags)
> +static uint16_t read_proc_e_machine_for_pid(pid_t pid, uint32_t *e_flags, bool *is_big_endian)
>  {
>  	char path[6 /* "/proc/" */ + 11 /* max length of pid */ + 5 /* "/exe\0" */];
>  	int fd;
> @@ -458,7 +458,8 @@ static uint16_t read_proc_e_machine_for_pid(pid_t pid, uint32_t *e_flags)
>  	snprintf(path, sizeof(path), "/proc/%d/exe", pid);
>  	fd = open(path, O_RDONLY);

Is it possible for this open() to hang indefinitely if the /proc/%d/exe path
points to a blocking FUSE filesystem or an unresponsive network mount?

Opening it without O_NONBLOCK risks hanging the perf daemon.

>  	if (fd >= 0) {
> -		e_machine = dso__read_e_machine(/*optional_dso=*/NULL, fd, e_flags);
> +		e_machine = dso__read_e_machine_endian(/*optional_dso=*/NULL, fd, e_flags,
> +						       is_big_endian);
>  		close(fd);
>  	}
[ ... ]
> @@ -491,6 +495,7 @@ uint16_t thread__e_machine(struct thread *thread, struct machine *machine, uint3
>  		.machine = machine,
>  		.e_flags = 0,
>  		.e_machine = EM_NONE,
> +		.is_big_endian = false,
>  	};
>  
>  	if (e_machine != EM_NONE) {

Does this fast-path return successfully without initializing the is_big_endian
output parameter?

Because struct thread does not cache the endianness, any caller relying on
is_big_endian might read uninitialized memory or a default value from their
stack.
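
A minimal sketch of a fast path that fills every out-parameter follows.
It assumes the endianness is cached next to e_machine; struct
arch_cache and arch_cache__get() are hypothetical stand-ins and do not
reflect the real struct thread layout.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread cache. */
struct arch_cache {
	uint16_t e_machine;	/* EM_NONE until computed */
	uint32_t e_flags;
	bool is_big_endian;
};

/*
 * Returns the cached e_machine, or EM_NONE when nothing is cached and
 * the caller must take the slow path. On a hit, every non-NULL
 * out-parameter is written before returning.
 */
static uint16_t arch_cache__get(const struct arch_cache *cache,
				uint32_t *e_flags, bool *is_big_endian)
{
	if (cache->e_machine == EM_NONE)
		return EM_NONE;

	if (e_flags)
		*e_flags = cache->e_flags;
	if (is_big_endian)
		*is_big_endian = cache->is_big_endian;
	return cache->e_machine;
}
```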

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=3

* Re: [PATCH v9 04/18] perf capstone: Determine architecture from e_machine
  2026-05-03  0:22                             ` [PATCH v9 04/18] perf capstone: Determine architecture from e_machine Ian Rogers
@ 2026-05-03  0:50                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  0:50 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Unconditional application of CS_MODE_64 and CS_MODE_32 breaks Capstone initialization for architectures like ARM, ARM64, and RISC-V.
- [High] Uninitialized `is_big_endian` on thread cache hits causes big-endian architectures to be disassembled as little-endian.
- [Medium] Lockless multi-variable data race when caching thread `e_machine` and `e_flags`.
--

commit 438f1b068410120794eb1080db889c82f269a06e
Author: Ian Rogers <irogers@google.com>

perf capstone: Determine architecture from e_machine

This commit updates Capstone initialization to use e_machine instead of arch
strings to determine the architecture. It also adds e_machine translations
for MIPS and RISCV, and removes unnecessary maybe_unused annotations.

[Note: The report format has been altered to provide a free-form summary of
the findings instead of inline diff quoting due to recitation restrictions.]

Regarding the implementation of e_machine_to_capstone(), is it intended to
unconditionally set CS_MODE_64 or CS_MODE_32 in the mode parameter for all
architectures?

Capstone defines CS_MODE_32 as 4 and CS_MODE_64 as 8. While these mode flags
are valid for architectures like x86 and MIPS, they might not be supported
for others. For instance, RISC-V uses CS_MODE_RISCV32 or CS_MODE_RISCV64,
and ARM expects CS_MODE_ARM.

If CS_MODE_64 or CS_MODE_32 is combined with the architecture-specific flags
for ARM, ARM64, or RISC-V, could it pass an invalid bitmask to Capstone and
cause cs_open() to fail with CS_ERR_MODE?

In functions like capstone__fprintf_insn_asm(), the is_big_endian variable
is initialized to false and passed by reference to thread__e_machine_endian().
If thread__e_machine_endian() finds that the architecture is already cached,
it takes an early return path.

Does this early return path bypass updating the is_big_endian out-parameter?
Since the thread cache only stores e_machine and e_flags, could this cause
big-endian architectures to be disassembled as little-endian on cache hits
because is_big_endian remains false?

Additionally, when thread__e_machine_endian() caches the architecture
details, it appears to update e_machine and e_flags sequentially without a
lock or memory barriers.

Could a concurrent thread observe the newly updated e_machine, take the early
return path, and read a stale or uninitialized e_flags value before the first
thread has completed its update?
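
One possible way to avoid a torn pair, sketched here under the
assumption that both values fit in one 64-bit word, is to publish
e_machine and e_flags with a single atomic store. struct arch_word and
its helpers are hypothetical names used only for illustration.

```c
#include <elf.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical cache word: e_flags in the high 32 bits, e_machine in
 * the low 16 bits. A low half of EM_NONE (0) means "not computed".
 */
struct arch_word {
	_Atomic uint64_t packed;
};

static void arch_word__store(struct arch_word *w, uint16_t e_machine,
			     uint32_t e_flags)
{
	uint64_t v = ((uint64_t)e_flags << 32) | e_machine;

	/* One store publishes both fields together. */
	atomic_store_explicit(&w->packed, v, memory_order_release);
}

static uint16_t arch_word__load(struct arch_word *w, uint32_t *e_flags)
{
	uint64_t v = atomic_load_explicit(&w->packed, memory_order_acquire);

	if (e_flags)
		*e_flags = (uint32_t)(v >> 32);
	return (uint16_t)v;
}
```

A reader that sees a non-EM_NONE e_machine through arch_word__load()
gets the e_flags value that was stored with it, since both come from
the same load.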

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=4

* Re: [PATCH v9 06/18] perf symbol: Avoid use of machine__is
  2026-05-03  0:22                             ` [PATCH v9 06/18] perf symbol: Avoid use of machine__is Ian Rogers
@ 2026-05-03  0:51                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  0:51 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch completely fails to read the ELF machine architecture from the DSO for kernel objects, introducing a cross-architecture analysis regression.
--

commit 01fba2f5fba1a6782bc236fbef6a297cb4c04daf
Author: Ian Rogers <irogers@google.com>

perf symbol: Avoid use of machine__is

This commit switches to using the ELF machine from the dso or running
machine rather than the machine perf_env arch to avoid falling back on
EM_HOST and imprecise string comparisons.

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index fcaeeddbbb6b..a4b1f837a5a5 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c

[ ... ]

> @@ -1428,7 +1444,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
>  		free(new_node);
>  	}
>  
> -	if (machine__is(machine, "x86_64")) {
> +	if (machine_or_dso_e_machine(machine, dso) == EM_X86_64) {

Does this actually avoid falling back to EM_HOST for kernel objects?

Looking at dso__e_machine_endian(), kernel DSOs (like DSO_BINARY_TYPE__VMLINUX,
KALLSYMS, and KCORE) skip reading the ELF header entirely and immediately
return perf_env__e_machine(machine->env).

Additionally, at this point in dso__load_kcore(), the DSO binary type has not
been set yet. It is set later in the function via dso__set_binary_type().
Because of this, it returns EM_NONE, causing machine_or_dso_e_machine() to
also fall back to perf_env__e_machine().

Since perf_env__e_machine_nocache() defaults to EM_HOST if the architecture is
missing, analyzing a non-x86 perf.data file missing an architecture string on
an x86_64 host will evaluate to EM_X86_64.

Previously, machine__is() used perf_env__raw_arch(), which safely returned
"unknown" if the architecture was missing, preventing incorrect matches.

Could this change incorrectly trigger x86_64-specific trampoline mappings for
non-x86 kernel objects?

>  		u64 addr;
>  
>  		/*
>  		 * If one of the corresponding symbols is there, assume the

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=6

* Re: [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-03  0:22                             ` [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
@ 2026-05-03  0:52                               ` sashiko-bot
  2026-05-04  1:35                               ` Namhyung Kim
  1 sibling, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  0:52 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Backwards and forwards compatibility breakage via architecture normalization.
- [High] Architecture misclassification and data loss due to guaranteed EM_HOST fallback.
- [High] Premature caching of the fallback architecture disables thread-based evaluation.
- [High] Flawed 64-bit kernel detection for MIPS, RISC-V, and legacy endianness suffixes.
--

commit 745bee191a4ba7e972076983225158e66679b6ae
Author: Ian Rogers <irogers@google.com>

perf env: Add perf_env__e_machine helper and use in perf_env__arch

This commit introduces a helper to lazily compute the e_machine from the
perf environment and uses the session's e_machine to derive the architecture
string. It updates the data file header writing to fetch the e_machine and
arch from the session if available.

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 1e54e2c86360..0edc67a468ab 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
[ ... ]
> @@ -309,12 +311,21 @@ void perf_env__init(struct perf_env *env)
>  
>  static void perf_env__init_kernel_mode(struct perf_env *env)
>  {
> -	const char *arch = perf_env__raw_arch(env);
> +	uint16_t e_machine = env->e_machine;
>  
> -	if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
> -	    !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
> -	    !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
> -	    !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
> +	if (env->arch && (e_machine == EM_NONE || e_machine == EM_MIPS || e_machine == EM_RISCV)) {
> +		if (str_ends_with(env->arch, "64") || !strncmp(env->arch, "s390x", 5))
> +			env->kernel_is_64_bit = 1;
> +		else
> +			env->kernel_is_64_bit = 0;
> +		return;
> +	}

e_machine_to_perf_arch() returns base strings like "mips" or "riscv" without
a "64" suffix. Does this cause new 64-bit MIPS and RISC-V traces to be
misclassified as 32-bit since str_ends_with() will return false?

Additionally, older perf.data files may contain endianness suffixes like
"mips64el" which correctly matched the previous strncmp() logic. Will these
legacy files fail the new str_ends_with("64") check and corrupt 64-bit
traces?
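
The difference between the suffix and prefix tests can be shown with a
small standalone sketch. ends_with() is a local stand-in for perf's
str_ends_with(), and arch_name_is_64_bit() is a hypothetical heuristic,
not a proposed replacement.

```c
#include <stdbool.h>
#include <string.h>

/* Local stand-in for a suffix test. */
static bool ends_with(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && !strcmp(s + n - m, suffix);
}

/*
 * Hypothetical heuristic: accept "64" anywhere in the name so that
 * endianness suffixes such as "mips64el" still count as 64-bit.
 */
static bool arch_name_is_64_bit(const char *arch)
{
	return strstr(arch, "64") != NULL || !strncmp(arch, "s390x", 5);
}
```

Here ends_with("mips64el", "64") is false while
arch_name_is_64_bit("mips64el") is true. A bare "mips" is still
reported as 32-bit by either test, so the normalized strings would
need a separate source for the bitness.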

[ ... ]
> @@ -588,51 +599,187 @@ void cpu_cache_level__free(struct cpu_cache_level *cache)
[ ... ]
> -static const char *normalize_arch(char *arch)
> -{
> -	if (!strcmp(arch, "x86_64"))
> -		return "x86";
> -	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
> -		return "x86";
> -	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
> -		return "sparc";
> -	if (!strncmp(arch, "aarch64", 7) || !strncmp(arch, "arm64", 5))
> -		return "arm64";
> -	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
> -		return "arm";
> -	if (!strncmp(arch, "s390", 4))
> -		return "s390";
> -	if (!strncmp(arch, "parisc", 6))
> -		return "parisc";
> -	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
> -		return "powerpc";
> -	if (!strncmp(arch, "mips", 4))
> -		return "mips";
> -	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
> -		return "sh";
> -	if (!strncmp(arch, "loongarch", 9))
> -		return "loongarch";
> -
> -	return arch;
[ ... ]
>  const char *perf_env__arch(struct perf_env *env)
>  {
> -	char *arch_name;
> +	uint16_t e_machine;
> +	const char *arch;
>  
> -	if (!env || !env->arch) { /* Assume local operation */
> -		static struct utsname uts = { .machine[0] = '\0', };
> -		if (uts.machine[0] == '\0' && uname(&uts) < 0)
> -			return NULL;
> -		arch_name = uts.machine;
> -	} else
> -		arch_name = env->arch;
> +	if (!env)
> +		return e_machine_to_perf_arch(EM_HOST);
> +
> +	if (env->arch)
> +		return env->arch;
>  
> -	return normalize_arch(arch_name);
> +	/*
> +	 * Lazily compute/allocate arch. The e_machine may have been
> +	 * read from a data file and so may not be EM_HOST.
> +	 */
> +	e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);
> +	arch = e_machine_to_perf_arch(e_machine);
> +	env->arch = strdup(arch);
> +	/*
> +	 * Avoid potential crashes on the arch string if memory allocation in
> +	 * strdup fails and NULL were to be returned.
> +	 */
> +	return env->arch ?: arch;
>  }

The removal of normalize_arch() appears to break both backward and forward
compatibility.

When reading older perf.data files, env->arch is populated with the raw
string from the file (e.g., "x86_64"). Without normalize_arch(),
perf_env__arch() will return "x86_64", potentially breaking internal logic
that expects the normalized "x86".

Conversely, for new files, e_machine_to_perf_arch() generates normalized
names like "x86". Older versions of the perf tool expect the raw string for
64-bit detection:

perf_env__init_kernel_mode() {
    const char *arch = perf_env__raw_arch(env);

    if (!strncmp(arch, "x86_64", 6) || ...
}

Does this mean older perf binaries will incorrectly parse new 64-bit
perf.data files as 32-bit traces?

> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index f30e48eb3fc3..f1ae61392cce 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
[ ... ]
> -static int write_arch(struct feat_fd *ff,
> -		      struct evlist *evlist __maybe_unused)
> +static int write_arch(struct feat_fd *ff, struct evlist *evlist)
>  {
>  	struct utsname uts;
> -	int ret;
> +	const char *arch = NULL;
>  
> -	ret = uname(&uts);
> -	if (ret < 0)
> -		return -1;
> +	if (evlist->session) {
> +		/* Force the computation in the perf_env of the e_machine of the threads. */
> +		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
> +		arch = perf_env__arch(perf_session__env(evlist->session));
> +	}

Is it intended that perf_env__arch() permanently caches the fallback
architecture here?

perf_session__e_machine() avoids caching the fallback EM_HOST when no
threads are present, noting: "Don't cache in the perf_env in case later
threads indicate a better ELF machine type."

However, immediately afterward, perf_env__arch() calls perf_env__e_machine(),
which populates env->e_machine = EM_HOST. Because it gets cached, will all
subsequent evaluations for the session short-circuit and return EM_HOST,
ignoring the true thread architecture once threads are added?
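
The ordering problem can be illustrated with a small self-contained model.
Everything below (cache_env, the CACHE_EM_* constants, and the three
functions) is a hypothetical stand-in for the perf structures, under the
assumption that the session lookup prefers a cached value, then the
threads, then an uncached host fallback.

```c
#include <stdint.h>

/* Illustrative constants; the values mirror the ELF e_machine numbers. */
enum { CACHE_EM_NONE = 0, CACHE_EM_X86_64 = 62, CACHE_EM_AARCH64 = 183 };
#define CACHE_EM_HOST CACHE_EM_X86_64	/* pretend the tool is an x86_64 build */

struct cache_env {
	uint16_t e_machine;	/* CACHE_EM_NONE means "not computed yet" */
};

/* Caching lookup: stores the host fallback in the env. */
static uint16_t cache_env_e_machine(struct cache_env *env)
{
	if (env->e_machine == CACHE_EM_NONE)
		env->e_machine = CACHE_EM_HOST;
	return env->e_machine;
}

/* Non-caching lookup: reports the fallback but leaves the env untouched. */
static uint16_t cache_env_e_machine_nocache(const struct cache_env *env)
{
	return env->e_machine != CACHE_EM_NONE ? env->e_machine : CACHE_EM_HOST;
}

/*
 * Session lookup: trust a cached value, otherwise use the thread's
 * machine type (CACHE_EM_NONE when no thread exists yet), otherwise
 * fall back without caching.
 */
static uint16_t cache_session_e_machine(struct cache_env *env,
					uint16_t thread_e_machine)
{
	if (env->e_machine != CACHE_EM_NONE)
		return env->e_machine;
	if (thread_e_machine != CACHE_EM_NONE) {
		env->e_machine = thread_e_machine;
		return thread_e_machine;
	}
	return cache_env_e_machine_nocache(env);
}
```

In this model, calling cache_env_e_machine() once before any thread exists
pins the env to the host value, and a later
cache_session_e_machine(env, CACHE_EM_AARCH64) keeps returning the host
value. Skipping the caching call leaves the later thread's value visible.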

>  
> -	return do_write_string(ff, uts.machine);
> +	if (!arch) {
> +		int ret = uname(&uts);
> +
> +		if (ret < 0)
> +			return -1;
> +		arch = uts.machine;
> +	}
> +	return do_write_string(ff, arch);
>  }

Can the if (!arch) block ever be reached?

perf_env__arch() is guaranteed to return a valid string (or "unknown"),
meaning arch will never be NULL. This skips the uname(&uts) fallback
entirely.

As a result, if it falls back to EM_HOST, won't a 32-bit perf binary
running on a 64-bit kernel incorrectly record a 32-bit system architecture?

Additionally, for system architectures missing from the prefix_to_e_machine
table (e.g., hexagon, openrisc), won't it permanently record "unknown" in
the header instead of the correct uname information?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=1


* Re: [PATCH v9 17/18] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues
  2026-05-03  0:22                             ` [PATCH v9 17/18] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
@ 2026-05-03  0:59                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  0:59 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Performance regression due to `atomic_load` enforcing strict sequential consistency in frequently called `struct symbol` getters.
--

commit f3d51ff8771487f327675b2cba0003cf43b91f8a
Author: Ian Rogers <irogers@google.com>

perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues

This commit addresses concurrent update issues by replacing bitfields in
struct symbol with a 16-bit atomic flags variable. It introduces atomic
accessors to read and modify these flags safely.

> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
[ ... ]
> @@ -77,6 +82,43 @@ struct symbol {
>  void symbol__delete(struct symbol *sym);
>  void symbols__delete(struct rb_root_cached *symbols);
>  
> +static inline u8 symbol__type(const struct symbol *sym)
> +{
> +	return (atomic_load(&sym->flags) & SYMBOL_FLAG_TYPE_MASK) >> SYMBOL_FLAG_TYPE_SHIFT;
> +}
> +
> +static inline u8 symbol__binding(const struct symbol *sym)
> +{
> +	return (atomic_load(&sym->flags) & SYMBOL_FLAG_BINDING_MASK) >> SYMBOL_FLAG_BINDING_SHIFT;
> +}
[ ... ]
> +static inline bool symbol__inlined(const struct symbol *sym)
> +{
> +	return (atomic_load(&sym->flags) & SYMBOL_FLAG_INLINED) != 0;
> +}

Could the use of atomic_load() here cause a performance regression on
weakly-ordered architectures?

By default, C11 atomic_load() implies memory_order_seq_cst. On architectures
like ARM64 or PowerPC, this generates memory barriers or load-acquire
instructions.

These getters are evaluated millions of times in extremely hot paths, such as
the qsort comparison function _sort__sym_cmp() in tools/perf/util/sort.c:

int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
{
        ...
        if (symbol__inlined(sym_l) || symbol__inlined(sym_r)) {
                int ret = strcmp(sym_l->name, sym_r->name);
        ...
}

Emitting strict memory ordering instructions inside sorting loops prevents CPU
pipeline instruction reordering, which might significantly slow down sample
processing during perf report and perf top.

Since the read operations only require atomicity to prevent tearing, would it
be better to use atomic_load_explicit(&sym->flags, memory_order_relaxed)
instead?
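
For reference, a minimal sketch of what relaxed accessors could look like.
The struct, flag layout, and function names (demo_symbol, DEMO_FLAG_*) are
hypothetical and do not match the patch. Whether relaxed ordering is
actually sufficient depends on whether any reader relies on the flags to
publish other data, which this sketch does not model.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout, chosen only for this example. */
#define DEMO_FLAG_INLINED	(1u << 0)
#define DEMO_FLAG_TYPE_SHIFT	4
#define DEMO_FLAG_TYPE_MASK	(0xfu << DEMO_FLAG_TYPE_SHIFT)

struct demo_symbol {
	_Atomic uint16_t flags;
};

/* Relaxed load: atomic (no tearing) but imposes no ordering. */
static inline bool demo_symbol_inlined(const struct demo_symbol *sym)
{
	uint16_t flags = atomic_load_explicit(&sym->flags, memory_order_relaxed);

	return (flags & DEMO_FLAG_INLINED) != 0;
}

static inline uint8_t demo_symbol_type(const struct demo_symbol *sym)
{
	uint16_t flags = atomic_load_explicit(&sym->flags, memory_order_relaxed);

	return (flags & DEMO_FLAG_TYPE_MASK) >> DEMO_FLAG_TYPE_SHIFT;
}

/* Setting a single bit is one atomic read-modify-write. */
static inline void demo_symbol_set_inlined(struct demo_symbol *sym)
{
	atomic_fetch_or_explicit(&sym->flags, DEMO_FLAG_INLINED,
				 memory_order_relaxed);
}

/* Replacing a multi-bit field needs a compare-and-swap loop. */
static inline void demo_symbol_set_type(struct demo_symbol *sym, uint8_t type)
{
	uint16_t old = atomic_load_explicit(&sym->flags, memory_order_relaxed);
	uint16_t val;

	do {
		val = (old & ~DEMO_FLAG_TYPE_MASK) |
		      ((type << DEMO_FLAG_TYPE_SHIFT) & DEMO_FLAG_TYPE_MASK);
	} while (!atomic_compare_exchange_weak_explicit(&sym->flags, &old, val,
							memory_order_relaxed,
							memory_order_relaxed));
}
```

The getters stay a single plain load on most targets, while concurrent
setters of different fields in the same word cannot lose each other's
updates.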

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=17


* Re: [PATCH v9 07/18] perf machine: Use perf_env e_machine rather than arch
  2026-05-03  0:22                             ` [PATCH v9 07/18] perf machine: Use perf_env e_machine rather than arch Ian Rogers
@ 2026-05-03  1:00                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  1:00 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] machine__is() incorrectly falls back to the compile-time host architecture when the machine's environment is NULL, breaking non-x86_64 guest kernel memory map handling.
- [High] get_leaf_frame_caller() delegates to thread__e_machine(), which unconditionally falls back to the host architecture if maps fail, breaking offline cross-architecture callchain analysis.
--

commit e3337f1222c33fd5dc5c1fda6cc5b1ac0b73b28e
Author: Ian Rogers <irogers@google.com>

perf machine: Use perf_env e_machine rather than arch

This commit modifies perf's machine parsing to use ELF machine constants
instead of architecture strings. This is intended to provide more precise
comparisons by using the thread's e_machine and avoids issues with normalized
architecture strings.

> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index e76f8c86e62a..6d32d3cb5cb7 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1611,10 +1611,15 @@ static bool machine__uses_kcore(struct machine *machine)
>  	return dsos__for_each_dso(&machine->dsos, machine__uses_kcore_cb, NULL) != 0 ? true : false;
>  }
>  
> +static bool machine__is(struct machine *machine, uint16_t e_machine)
> +{
> +	return machine && perf_env__e_machine(machine->env, NULL) == e_machine;
> +}

Could this fallback to the host architecture when dealing with guest machines
that don't have their environment populated?

Guest machines initialized via machines__add do not have their machine->env
populated (machine->env == NULL). Previously, machine__is(..., "x86_64") used
perf_env__raw_arch(NULL), which returned "unknown", safely evaluating to
false.

The new implementation uses perf_env__e_machine(NULL, NULL), which evaluates
to perf_env__e_machine_nocache() and unconditionally returns EM_HOST (the
architecture the perf binary was compiled for).

When analyzing non-x86_64 guests on an x86_64 host, machine__is(guest,
EM_X86_64) evaluates to true.
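
A small model of the difference, with hypothetical names throughout
(guest_env, guest_machine, the GUEST_EM_* constants): one variant falls
back to the host when the environment is missing, the other treats a
missing environment as "no match". Which behaviour is right for perf is
for the patch author to decide; the sketch only shows where the two
diverge.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; the values mirror the ELF e_machine numbers. */
enum { GUEST_EM_NONE = 0, GUEST_EM_X86_64 = 62, GUEST_EM_AARCH64 = 183 };
#define GUEST_EM_HOST GUEST_EM_X86_64	/* pretend the tool is an x86_64 build */

struct guest_env {
	uint16_t e_machine;
};

struct guest_machine {
	struct guest_env *env;	/* may be NULL, as described for guests */
};

/* Lookup that substitutes the host when nothing better is known. */
static uint16_t guest_env_e_machine(const struct guest_env *env)
{
	if (!env || env->e_machine == GUEST_EM_NONE)
		return GUEST_EM_HOST;
	return env->e_machine;
}

/* Variant with the host fallback: a NULL env compares equal to the host. */
static bool guest_machine_is_fallback(const struct guest_machine *machine,
				      uint16_t e_machine)
{
	return machine && guest_env_e_machine(machine->env) == e_machine;
}

/* Stricter variant: an unknown environment never matches anything. */
static bool guest_machine_is_strict(const struct guest_machine *machine,
				    uint16_t e_machine)
{
	if (!machine || !machine->env ||
	    machine->env->e_machine == GUEST_EM_NONE)
		return false;
	return machine->env->e_machine == e_machine;
}
```

For a guest with env == NULL, the fallback variant reports a match for
GUEST_EM_X86_64 on an x86_64 build and the strict variant does not.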

[ ... ]

> @@ -2770,7 +2775,7 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>  static u64 get_leaf_frame_caller(struct perf_sample *sample,
>  		struct thread *thread, int usr_idx)
>  {
> -	if (machine__normalized_is(maps__machine(thread__maps(thread)), "arm64"))
> +	if (thread__e_machine(thread, /*machine=*/NULL, /*e_flags=*/NULL) == EM_AARCH64)
>  		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
>  	else
>  		return 0;

Will this break offline cross-architecture callchain analysis for AArch64?

Previously, get_leaf_frame_caller() used machine__normalized_is(), which
correctly checked the session's recorded perf_env (derived from the perf.data
header) to determine the architecture.

The new code relies on thread__e_machine() returning EM_AARCH64. If
thread__e_machine() cannot determine the architecture from the thread's
DSOs (for example, due to stripped binaries, kernel threads, or JIT code),
it falls back to read_proc_e_machine_for_pid().

Since offline perf.data files are not part of a live session, this bypasses
/proc reads and unconditionally defaults to EM_HOST (the compile-time
architecture of the perf binary). As a result, analyzing an AArch64 perf.data
file on an x86_64 host will silently fall back to EM_X86_64, bypassing
get_leaf_frame_caller_aarch64() and preventing leaf frame unwinding.

[ ... ]

> @@ -3181,7 +3172,7 @@ int machine__get_kernel_start(struct machine *machine)
>  		 * start of kernel text, but still above 2^63. So leave
>  		 * kernel_start = 1ULL << 63 for x86_64.
>  		 */
> -		if (!err && !machine__is(machine, "x86_64"))
> +		if (!err && !machine__is(machine, EM_X86_64))
>  			machine->kernel_start = map__start(map);
>  	}
>  	return err;

If machine__is() erroneously returns true for non-x86_64 guests (as noted
above), could this leave machine->kernel_start at 1ULL << 63 and corrupt
kernel address resolution for cross-architecture guest analysis?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=7


* Re: [PATCH v9 16/18] perf env: Add helper to lazily compute the os_release
  2026-05-03  0:22                             ` [PATCH v9 16/18] perf env: Add helper to lazily compute the os_release Ian Rogers
@ 2026-05-03  1:00                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  1:00 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Lazy initialization in `perf_env__os_release` assumes a NULL `os_release` implies 'live mode', permanently polluting offline trace environments with the analysis host's kernel version.
--

commit d62418e881e036f2b1905f5ed81dd67b105e5fb8
Author: Ian Rogers <irogers@google.com>

perf env: Add helper to lazily compute the os_release

This commit introduces a lazy initialization helper for os_release. It
assumes that if os_release is uninitialized, the tool is running in live
mode, and populates it using uname().

[ ... ]

> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 29d5fe37528b..45dde40042b5 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -351,6 +351,27 @@ bool perf_arch_is_big_endian(const char *arch)
>  	return false;
>  }
>  
> +const char *perf_env__os_release(struct perf_env *env)
> +{
> +	struct utsname uts;
> +	int ret;
> +
> +	if (!env)
> +		return perf_version_string;
> +
> +	if (env->os_release)
> +		return env->os_release;
> +
> +	/*
> +	 * The os_release is being accessed but wasn't initialized from a data
> +	 * file, assume this is 'live' mode and use the release from uname. If
> +	 * uname or strdup fails then use the current perf tool version.
> +	 */
> +	ret = uname(&uts);
> +	env->os_release = strdup(ret < 0 ? perf_version_string : uts.release);

If an offline perf.data file was recorded without the HEADER_OSRELEASE feature
(for example, using --no-feature osrelease), won't env->os_release be NULL?

If so, does this permanently overwrite the target's environment metadata with
the analysis host's kernel version?

If this occurs, it seems vmlinux_path__init() will search for the analysis
host's kernel path instead of the target's. If the offline trace also lacks
build-ids, could perf silently load the host's kernel symbol table and apply
it to the target's profile, corrupting symbol resolution?

Similarly, could tools like perf data convert silently export the host's OS
release into the new trace data as if it belonged to the target system?
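
One way to picture an alternative, purely as a sketch: record whether the
environment came from a data file and refuse to guess in that case. The
rel_env struct and its from_data_file flag are hypothetical; nothing in
the patch suggests such a field exists, and returning NULL for "unknown"
is just one possible convention.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

struct rel_env {
	char *os_release;	/* NULL until known */
	bool from_data_file;	/* hypothetical: set when parsed from a file */
};

/*
 * Return the OS release, computing it lazily from uname() only for a
 * live environment. For an environment read from a data file that lacks
 * the release, return NULL instead of substituting the local kernel's.
 */
static const char *rel_env_os_release(struct rel_env *env)
{
	struct utsname uts;
	size_t len;

	if (!env)
		return NULL;
	if (env->os_release)
		return env->os_release;
	if (env->from_data_file)
		return NULL;
	if (uname(&uts) < 0)
		return NULL;

	/* Copy by hand so the example needs only ISO C string functions. */
	len = strlen(uts.release) + 1;
	env->os_release = malloc(len);
	if (env->os_release)
		memcpy(env->os_release, uts.release, len);
	return env->os_release;
}
```

Callers would then have to handle a NULL release explicitly, which makes
the "unknown target kernel" case visible rather than silently replaced.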

> +	return env->os_release ?: perf_version_string;
> +}
> +
>  int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[])
>  {
>  	int i;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=16


* Re: [PATCH v9 14/18] perf env: Refactor perf_env__arch_strerrno
  2026-05-03  0:22                             ` [PATCH v9 14/18] perf env: Refactor perf_env__arch_strerrno Ian Rogers
@ 2026-05-03  1:11                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  1:11 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Cross-architecture analysis regression in errno string decoding for global system summaries.
--

commit db5c373c1c19f2552fd69fd048980560d992a842
Author: Ian Rogers <irogers@google.com>

perf env: Refactor perf_env__arch_strerrno

This commit modifies perf_env__arch_strerrno to take an e_machine constant
instead of an architecture string and updates callers in builtin-trace.c.
It also adds a helper script to map Linux directory names to ELF machine
constants.

> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index e58c49d047a2..d278af18542f 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
[ ... ]
> @@ -4890,7 +4889,9 @@ static size_t syscall__dump_stats(struct trace *trace, int e_machine, FILE *fp,
>  
>  				for (e = 0; e < stats->max_errno; ++e) {
>  					if (stats->errnos[e] != 0)
> -						fprintf(fp, "\t\t\t\t%s: %d\n", perf_env__arch_strerrno(trace->host->env, e + 1), stats->errnos[e]);
> +						fprintf(fp, "\t\t\t\t%s: %d\n",
> +							perf_env__arch_strerrno(e_machine, e + 1),
> +							stats->errnos[e]);
>  				}
>  			}
>  			lines++;

How does this impact cross-architecture analysis for global system summaries?

Prior to this patch, syscall__dump_stats() used the target environment 
(trace->host->env) to decode errno strings.

By switching to the e_machine argument, if the caller for global system
summaries passes a hardcoded EM_HOST (or an equivalent host-specific
default), will cross-architecture global summaries now incorrectly decode
errnos using the host architecture's mappings instead of the target's?
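
As a reminder of why the machine type matters here, errno numbering is
not uniform across Linux architectures. The sketch below hard-codes a
single value, 35, which is EDEADLK in the generic numbering used by
x86_64 and ENOMSG on MIPS. The function errno_demo_name() and its
one-entry tables are hypothetical and only illustrate the lookup shape,
not the generated tables perf uses.

```c
#include <stdint.h>

/* Illustrative constants; the values mirror the ELF e_machine numbers. */
enum { ERRNO_DEMO_EM_MIPS = 8, ERRNO_DEMO_EM_X86_64 = 62 };

/* Decode one errno value according to the machine type of the trace. */
static const char *errno_demo_name(uint16_t e_machine, int err)
{
	if (err != 35)
		return "(not in this demo table)";

	switch (e_machine) {
	case ERRNO_DEMO_EM_X86_64:
		return "EDEADLK";
	case ERRNO_DEMO_EM_MIPS:
		return "ENOMSG";
	default:
		return "(not in this demo table)";
	}
}
```

Decoding a MIPS trace with the x86_64 constant would print EDEADLK where
the target actually reported ENOMSG.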

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=14


* Re: [PATCH v9 18/18] perf symbol: Lazily compute idle
  2026-05-03  0:22                             ` [PATCH v9 18/18] perf symbol: Lazily compute idle Ian Rogers
@ 2026-05-03  1:11                               ` sashiko-bot
  0 siblings, 0 replies; 106+ messages in thread
From: sashiko-bot @ 2026-05-03  1:11 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Unnecessary out-of-line function call `perf_env__e_machine` before the fast-path cache check in `symbol__is_idle` introduces measurable overhead on the `perf top` hot path.
--

commit 786c2d40f855657d57b68d17dcb788ab5ff93929
Author: Ian Rogers <irogers@google.com>

perf symbol: Lazily compute idle

This commit switches from an idle boolean to a helper function,
symbol__is_idle, to lazily compute whether a symbol is an idle function.
The computation now takes the machine's architecture and kernel version into
consideration.

Link: https://lore.kernel.org/lkml/20260219113850.354271-1-tmricht@linux.ibm.com/
Link: https://lore.kernel.org/lkml/20260323085255.98173-1-jameshongleiwang@126.com/

[ ... ]

> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 4702b8989354..2caa6b8b8609 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c

[ ... ]

> @@ -760,55 +747,90 @@ int modules__parse(const char *filename, void *arg,
>  	return err;
>  }
>  
> -bool symbol__is_idle(const struct symbol *sym)
> -{
> -	uint16_t flags = atomic_load(&sym->flags);
> -	uint16_t idle_val = (flags & SYMBOL_FLAG_IDLE_MASK) >> SYMBOL_FLAG_IDLE_SHIFT;
> -
> -	return idle_val == SYMBOL_IDLE__IDLE;
> -}
> -
>  /*
>   * These are symbols in the kernel image, so make sure that
>   * sym is from a kernel DSO.
>   */
> -static bool symbol__compute_is_idle(const char *name)
> +static int sym_name_cmp(const void *a, const void *b)
> +{
> +	const char *name = a;
> +	const char *const *sym = b;
> +
> +	return strcmp(name, *sym);
> +}
> +
> +bool symbol__is_idle(struct symbol *sym, const struct dso *dso, struct perf_env *env)
>  {

[ ... ]

> +	const char *name = sym->name;
> +	uint16_t e_machine = perf_env__e_machine(env, /*e_flags=*/NULL);

Does calling perf_env__e_machine() before checking the cached idle_val
introduce unnecessary overhead on the perf top hot path?

Delaying the e_machine initialization until after the cache check below
might help avoid this out-of-line function call when the idle state is
already known.
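
A sketch of the suggested ordering, using hypothetical names (idle_sym,
idle_lookup_e_machine(), IDLE_*): the cached state is checked first and
the more expensive lookup only runs on the first call per symbol. The
counter exists only so the effect can be observed; the idle rule itself
is a placeholder and not perf's table.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { IDLE_UNKNOWN = 0, IDLE_NO = 1, IDLE_YES = 2 };

struct idle_sym {
	_Atomic uint16_t idle;	/* one of the IDLE_* states */
	const char *name;
};

static unsigned int idle_slow_calls;	/* counts slow-path lookups */

/* Stand-in for an out-of-line machine type lookup. */
static uint16_t idle_lookup_e_machine(void)
{
	idle_slow_calls++;
	return 62;	/* EM_X86_64 */
}

static bool idle_sym_is_idle(struct idle_sym *sym)
{
	uint16_t idle = atomic_load_explicit(&sym->idle, memory_order_relaxed);
	uint16_t e_machine;
	bool is_idle;

	/* Fast path: answer from the cache without any other work. */
	if (idle != IDLE_UNKNOWN)
		return idle == IDLE_YES;

	/* Slow path: only now pay for the machine type lookup. */
	e_machine = idle_lookup_e_machine();
	is_idle = e_machine == 62 && !strcmp(sym->name, "poll_idle");
	atomic_store_explicit(&sym->idle, is_idle ? IDLE_YES : IDLE_NO,
			      memory_order_relaxed);
	return is_idle;
}
```

Calling idle_sym_is_idle() twice on the same symbol increments
idle_slow_calls only once.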

> +
> +	{
> +		uint16_t flags = atomic_load(&sym->flags);
> +		uint16_t idle_val = (flags & SYMBOL_FLAG_IDLE_MASK) >> SYMBOL_FLAG_IDLE_SHIFT;
> +
> +		if (idle_val != SYMBOL_IDLE__UNKNOWN)
> +			return idle_val == SYMBOL_IDLE__IDLE;
> +	}
> +
> +	if (!dso || dso__kernel(dso) == DSO_SPACE__USER) {
> +		symbol__set_idle(sym, /*idle=*/false);
> +		return false;
> +	}

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260503002248.1040791-1-irogers@google.com?part=18


* Re: [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch
  2026-05-03  0:22                             ` [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
  2026-05-03  0:52                               ` sashiko-bot
@ 2026-05-04  1:35                               ` Namhyung Kim
  1 sibling, 0 replies; 106+ messages in thread
From: Namhyung Kim @ 2026-05-04  1:35 UTC (permalink / raw)
  To: Ian Rogers
  Cc: acme, tmricht, agordeev, gor, hca, jameshongleiwang, japo,
	linux-kernel, linux-perf-users, linux-s390, sumanthk

On Sat, May 02, 2026 at 05:22:31PM -0700, Ian Rogers wrote:
> Add a helper that lazily computes the e_machine and falls back to
> EM_HOST. Use the perf_env's arch to compute the e_machine if
> available. Use a binary search for some efficiency in this, but handle
> somewhat complex duplicate rules. Switch perf_env__arch to be derived
> from the e_machine for consistency. This switches arch from being uname
> derived to matching that of the perf binary (via EM_HOST). Update
> session to use the helper, which may mean using EM_HOST when no
> threads are available. This also updates the perf data file header
> that gets the e_machine/e_flags from the session.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
[SNIP]
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index f30e48eb3fc3..f1ae61392cce 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -379,21 +379,28 @@ static int write_osrelease(struct feat_fd *ff,
>  	return do_write_string(ff, uts.release);
>  }
>  
> -static int write_arch(struct feat_fd *ff,
> -		      struct evlist *evlist __maybe_unused)
> +static int write_arch(struct feat_fd *ff, struct evlist *evlist)
>  {
>  	struct utsname uts;
> -	int ret;
> +	const char *arch = NULL;
>  
> -	ret = uname(&uts);
> -	if (ret < 0)
> -		return -1;
> +	if (evlist->session) {
> +		/* Force the computation in the perf_env of the e_machine of the threads. */
> +		perf_session__e_machine(evlist->session, /*e_flags=*/NULL);
> +		arch = perf_env__arch(perf_session__env(evlist->session));
> +	}
>  
> -	return do_write_string(ff, uts.machine);
> +	if (!arch) {
> +		int ret = uname(&uts);
> +
> +		if (ret < 0)
> +			return -1;
> +		arch = uts.machine;
> +	}
> +	return do_write_string(ff, arch);
>  }
>  
> -static int write_e_machine(struct feat_fd *ff,
> -			   struct evlist *evlist __maybe_unused)
> +static int write_e_machine(struct feat_fd *ff, struct evlist *evlist)
>  {
>  	/* e_machine expanded from 16 to 32-bits for alignment. */
>  	uint32_t e_flags;
> @@ -2684,10 +2691,18 @@ static int process_##__feat(struct feat_fd *ff, void *data __maybe_unused) \
>  FEAT_PROCESS_STR_FUN(hostname, hostname);
>  FEAT_PROCESS_STR_FUN(osrelease, os_release);
>  FEAT_PROCESS_STR_FUN(version, version);
> -FEAT_PROCESS_STR_FUN(arch, arch);
>  FEAT_PROCESS_STR_FUN(cpudesc, cpu_desc);
>  FEAT_PROCESS_STR_FUN(cpuid, cpuid);
>  
> +static int process_arch(struct feat_fd *ff, void *data __maybe_unused)
> +{
> +	free(ff->ph->env.arch);
> +	ff->ph->env.arch = do_read_string(ff);
> +	if (!ff->ph->env.arch)
> +		return -ENOMEM;
> +	return 0;
> +}

Isn't this the same as FEAT_PROCESS_STR_FUN()?


> +
>  static int process_e_machine(struct feat_fd *ff, void *data __maybe_unused)
>  {
>  	int ret;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index fe0de2a0277f..3e64db2d27c2 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -3023,14 +3023,19 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
>  		return EM_HOST;
>  	}
>  
> +	/*
> +	 * Is the env caching an e_machine? If not we want to compute from the
> +	 * more accurate threads.
> +	 */
>  	env = perf_session__env(session);
> -	if (env && env->e_machine != EM_NONE) {
> -		if (e_flags)
> -			*e_flags = env->e_flags;
> -
> -		return env->e_machine;
> -	}
> +	if (env && env->e_machine != EM_NONE)
> +		return perf_env__e_machine(env, e_flags);
>  
> +	/*
> +	 * Compute from threads, note this is more accurate than
> +	 * perf_env__e_machine that falls back on EM_HOST and doesn't consider
> +	 * mixed 32-bit and 64-bit threads.
> +	 */

I'm curious whether it's always better.  If EM_HOST is 64-bit and the first
thread in a session happens to be 32-bit, then the resulting e_machine
would be 32-bit, right?  Is that what we want?

Thanks,
Namhyung


>  	machines__for_each_thread(&session->machines,
>  				  perf_session__e_machine_cb,
>  				  &args);
> @@ -3048,10 +3053,9 @@ uint16_t perf_session__e_machine(struct perf_session *session, uint32_t *e_flags
>  
>  	/*
>  	 * Couldn't determine from the perf_env or current set of
> -	 * threads. Default to the host.
> +	 * threads. Potentially use logic that uses the arch string otherwise
> +	 * default to the host. Don't cache in the perf_env in case later
> +	 * threads indicate a better ELF machine type.
>  	 */
> -	if (e_flags)
> -		*e_flags = EF_HOST;
> -
> -	return EM_HOST;
> +	return perf_env__e_machine_nocache(env, e_flags);
>  }
> -- 
> 2.54.0.545.g6539524ca2-goog
> 


end of thread, other threads:[~2026-05-04  1:35 UTC | newest]

Thread overview: 106+ messages
2026-02-19 11:38 [PATCH v2] perf symbol: Remove psw_idle() from list of idle symbols Thomas Richter
2026-02-19 11:55 ` Jan Polensky
2026-02-23 21:46 ` Namhyung Kim
2026-02-23 23:14   ` Arnaldo Melo
2026-03-02 18:43   ` Arnaldo Carvalho de Melo
2026-03-02 19:44     ` Ian Rogers
2026-03-04 14:34       ` Arnaldo Carvalho de Melo
2026-03-02 23:43 ` [PATCH v1] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-03-24 17:14   ` Ian Rogers
2026-03-25  6:58     ` Namhyung Kim
2026-03-25 15:58       ` Ian Rogers
2026-03-25 16:18   ` [PATCH v2] " Ian Rogers
2026-03-26  7:20     ` Honglei Wang
2026-03-26 15:11       ` Ian Rogers
2026-03-26 17:45         ` [PATCH v3 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-03-26 17:45           ` [PATCH v3 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-03-26 17:45           ` [PATCH v3 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-03-27  6:56             ` Honglei Wang
2026-03-27  4:50           ` [PATCH v4 0/2] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-03-27  4:50             ` [PATCH v4 1/2] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-04-06  5:05               ` Namhyung Kim
2026-04-06 15:36                 ` Ian Rogers
2026-03-27  4:50             ` [PATCH v4 2/2] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-04-06  5:10               ` Namhyung Kim
2026-04-06 16:11                 ` Ian Rogers
2026-04-06 17:09                   ` [PATCH v5 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-04-06 17:09                     ` [PATCH v5 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-04-06 17:09                     ` [PATCH v5 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-04-06 17:09                     ` [PATCH v5 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-04-09 23:06                     ` [PATCH v6 0/3] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-04-09 23:06                       ` [PATCH v6 1/3] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-04-09 23:37                         ` sashiko-bot
2026-05-01 18:20                         ` [PATCH v7 0/4] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-05-01 18:20                           ` [PATCH v7 1/4] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-05-01 18:56                             ` sashiko-bot
2026-05-01 18:20                           ` [PATCH v7 2/4] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-05-01 19:20                             ` sashiko-bot
2026-05-01 18:20                           ` [PATCH v7 3/4] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
2026-05-01 19:42                             ` sashiko-bot
2026-05-01 18:20                           ` [PATCH v7 4/4] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
2026-05-01 20:13                             ` sashiko-bot
2026-05-02  6:59                         ` [PATCH v8 00/17] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 01/17] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-05-02  7:56                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 02/17] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 03/17] perf capstone: Determine architecture from e_machine Ian Rogers
2026-05-02  7:58                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 04/17] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
2026-05-02  7:55                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 05/17] perf machine: Use perf_env e_machine rather than arch Ian Rogers
2026-05-02  7:11                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 06/17] perf sample-raw: " Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 07/17] perf sort: " Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 08/17] perf symbol: Avoid use of machine__is Ian Rogers
2026-05-02  7:17                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 09/17] perf arch common: Use perf_env e_machine rather than arch Ian Rogers
2026-05-02  7:59                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 10/17] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 11/17] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
2026-05-02  7:44                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 12/17] perf lock-contention: " Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 13/17] perf env: Refactor perf_env__arch_strerrno Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 14/17] perf env: Remove unused perf_env__raw_arch Ian Rogers
2026-05-02  6:59                           ` [PATCH v8 15/17] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-05-02  7:53                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 16/17] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
2026-05-02  7:55                             ` sashiko-bot
2026-05-02  6:59                           ` [PATCH v8 17/17] perf symbol: Lazily compute idle and use a global lock for updates Ian Rogers
2026-05-03  0:22                           ` [PATCH v9 00/18] perf symbol/env: ELF machine clean up and lazy idle computation Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 01/18] perf env: Add perf_env__e_machine helper and use in perf_env__arch Ian Rogers
2026-05-03  0:52                               ` sashiko-bot
2026-05-04  1:35                               ` Namhyung Kim
2026-05-03  0:22                             ` [PATCH v9 02/18] perf tests topology: Switch env->arch use to env->e_machine Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 03/18] perf env, dso, thread: Add _endian variants for e_machine helpers Ian Rogers
2026-05-03  0:39                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 04/18] perf capstone: Determine architecture from e_machine Ian Rogers
2026-05-03  0:50                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 05/18] perf print_insn: Use e_machine for fallback IP length check Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 06/18] perf symbol: Avoid use of machine__is Ian Rogers
2026-05-03  0:51                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 07/18] perf machine: Use perf_env e_machine rather than arch Ian Rogers
2026-05-03  1:00                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 08/18] perf sample-raw: " Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 09/18] perf sort: " Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 10/18] perf arch common: " Ian Rogers
2026-05-03  0:38                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 11/18] perf header: In print_pmu_caps use perf_env e_machine Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 12/18] perf c2c: Use perf_env e_machine rather than arch Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 13/18] perf lock-contention: " Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 14/18] perf env: Refactor perf_env__arch_strerrno Ian Rogers
2026-05-03  1:11                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 15/18] perf env: Remove unused perf_env__raw_arch Ian Rogers
2026-05-03  0:22                             ` [PATCH v9 16/18] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-05-03  1:00                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 17/18] perf symbol: Add setters for bitfields sharing a byte to avoid concurrent update issues Ian Rogers
2026-05-03  0:59                               ` sashiko-bot
2026-05-03  0:22                             ` [PATCH v9 18/18] perf symbol: Lazily compute idle Ian Rogers
2026-05-03  1:11                               ` sashiko-bot
2026-04-09 23:06                       ` [PATCH v6 2/3] perf env: Add helper to lazily compute the os_release Ian Rogers
2026-04-09 23:50                         ` sashiko-bot
2026-04-09 23:06                       ` [PATCH v6 3/3] perf symbol: Lazily compute idle and use the perf_env Ian Rogers
2026-04-10  0:11                         ` sashiko-bot
2026-03-27  6:00           ` [PATCH v2] perf tests task-analyzer: Write test files to tmpdir Ian Rogers
2026-03-31  7:22             ` Namhyung Kim
2026-03-31 17:58               ` Ian Rogers
2026-04-01  3:41                 ` Namhyung Kim
