[PATCH v2 0/3] Generic weight struct, use env for sort key and header

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/3] Generic weight struct, use env for sort key and header
@ 2025-05-21 13:54 Ian Rogers
  2025-05-21 13:54 ` [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 13:54 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users

The arch directory is a barrier to cross-platform development as files
and behaviors within it are inherently platform specific. Sample
parsing should be generic but the PERF_SAMPLE_WEIGHT_STRUCT handling
was only present if building for x86 or powerpc. The sort key and
headers should be specific to the session that is being executed and
not to the machine perf is being run upon. These patches clean this
and associated code up.

v2: Avoid changes to include/perf/perf_dlfilter.h as suggested by
    Adrian Hunter.

Ian Rogers (3):
  perf sample: Remove arch notion of sample parsing
  perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test
  perf sort: Use perf_env to set arch sort keys and header

 tools/perf/arch/powerpc/util/Build         |   1 -
 tools/perf/arch/powerpc/util/event.c       |  60 ----------
 tools/perf/arch/x86/include/arch-tests.h   |   1 -
 tools/perf/arch/x86/tests/Build            |   1 -
 tools/perf/arch/x86/tests/arch-tests.c     |   2 -
 tools/perf/arch/x86/tests/sample-parsing.c | 125 ---------------------
 tools/perf/arch/x86/util/event.c           |  46 --------
 tools/perf/builtin-annotate.c              |   2 +-
 tools/perf/builtin-c2c.c                   |  53 +++++----
 tools/perf/builtin-diff.c                  |   2 +-
 tools/perf/builtin-report.c                |   2 +-
 tools/perf/builtin-script.c                |   2 +-
 tools/perf/builtin-top.c                   |  16 +--
 tools/perf/tests/hists_cumulate.c          |   8 +-
 tools/perf/tests/hists_filter.c            |   8 +-
 tools/perf/tests/hists_link.c              |   8 +-
 tools/perf/tests/hists_output.c            |  10 +-
 tools/perf/tests/sample-parsing.c          |  14 +++
 tools/perf/util/dlfilter.c                 |   2 +-
 tools/perf/util/event.h                    |   5 -
 tools/perf/util/evsel.c                    |  17 ++-
 tools/perf/util/hist.c                     |   4 +-
 tools/perf/util/hist.h                     |   2 +-
 tools/perf/util/intel-tpebs.c              |   4 +-
 tools/perf/util/sample.h                   |   5 +-
 tools/perf/util/session.c                  |   2 +-
 tools/perf/util/sort.c                     |  67 +++++++----
 tools/perf/util/sort.h                     |   5 +-
 tools/perf/util/synthetic-events.c         |  10 +-
 29 files changed, 150 insertions(+), 334 deletions(-)
 delete mode 100644 tools/perf/arch/powerpc/util/event.c
 delete mode 100644 tools/perf/arch/x86/tests/sample-parsing.c

-- 
2.49.0.1112.g889b7c5bd8-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 13:54 [PATCH v2 0/3] Generic weight struct, use env for sort key and header Ian Rogers
@ 2025-05-21 13:54 ` Ian Rogers
  2025-05-21 15:42   ` Liang, Kan
  2025-05-21 13:54 ` [PATCH v2 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test Ian Rogers
  2025-05-21 13:54 ` [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 13:54 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users

By definition arch sample parsing and synthesis will inhibit certain
kinds of cross-platform record then analysis (report, script,
etc.). Remove arch_perf_parse_sample_weight and
arch_perf_synthesize_sample_weight replacing with a common
implementation. Combine perf_sample p_stage_cyc and retire_lat to
capture the differing uses regardless of compiled for architecture.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/powerpc/util/event.c       | 26 ---------------------
 tools/perf/arch/x86/tests/sample-parsing.c |  4 ++--
 tools/perf/arch/x86/util/event.c           | 27 ----------------------
 tools/perf/builtin-script.c                |  2 +-
 tools/perf/util/dlfilter.c                 |  2 +-
 tools/perf/util/event.h                    |  2 --
 tools/perf/util/evsel.c                    | 17 ++++++++++----
 tools/perf/util/hist.c                     |  4 ++--
 tools/perf/util/hist.h                     |  2 +-
 tools/perf/util/intel-tpebs.c              |  4 ++--
 tools/perf/util/sample.h                   |  5 +---
 tools/perf/util/session.c                  |  2 +-
 tools/perf/util/sort.c                     |  6 ++---
 tools/perf/util/synthetic-events.c         | 10 ++++++--
 14 files changed, 34 insertions(+), 79 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
index 77d8cc2b5691..024ac8b54c33 100644
--- a/tools/perf/arch/powerpc/util/event.c
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -11,32 +11,6 @@
 #include "../../../util/debug.h"
 #include "../../../util/sample.h"
 
-void arch_perf_parse_sample_weight(struct perf_sample *data,
-				   const __u64 *array, u64 type)
-{
-	union perf_sample_weight weight;
-
-	weight.full = *array;
-	if (type & PERF_SAMPLE_WEIGHT)
-		data->weight = weight.full;
-	else {
-		data->weight = weight.var1_dw;
-		data->ins_lat = weight.var2_w;
-		data->p_stage_cyc = weight.var3_w;
-	}
-}
-
-void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
-					__u64 *array, u64 type)
-{
-	*array = data->weight;
-
-	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		*array &= 0xffffffff;
-		*array |= ((u64)data->ins_lat << 32);
-	}
-}
-
 const char *arch_perf_header_entry(const char *se_header)
 {
 	if (!strcmp(se_header, "Local INSTR Latency"))
diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
index a061e8619267..95d8f7f1d2fb 100644
--- a/tools/perf/arch/x86/tests/sample-parsing.c
+++ b/tools/perf/arch/x86/tests/sample-parsing.c
@@ -29,7 +29,7 @@ static bool samples_same(const struct perf_sample *s1,
 {
 	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
 		COMP(ins_lat);
-		COMP(retire_lat);
+		COMP(p_stage_cyc_or_retire_lat);
 	}
 
 	return true;
@@ -50,7 +50,7 @@ static int do_test(u64 sample_type)
 	struct perf_sample sample = {
 		.weight		= 101,
 		.ins_lat        = 102,
-		.retire_lat     = 103,
+		.p_stage_cyc_or_retire_lat = 103,
 	};
 	struct perf_sample sample_out;
 	size_t i, sz, bufsz;
diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index a0400707180c..576c1c36046c 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -92,33 +92,6 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
 
 #endif
 
-void arch_perf_parse_sample_weight(struct perf_sample *data,
-				   const __u64 *array, u64 type)
-{
-	union perf_sample_weight weight;
-
-	weight.full = *array;
-	if (type & PERF_SAMPLE_WEIGHT)
-		data->weight = weight.full;
-	else {
-		data->weight = weight.var1_dw;
-		data->ins_lat = weight.var2_w;
-		data->retire_lat = weight.var3_w;
-	}
-}
-
-void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
-					__u64 *array, u64 type)
-{
-	*array = data->weight;
-
-	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		*array &= 0xffffffff;
-		*array |= ((u64)data->ins_lat << 32);
-		*array |= ((u64)data->retire_lat << 48);
-	}
-}
-
 const char *arch_perf_header_entry(const char *se_header)
 {
 	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6c3bf74dd78c..c02c435e0f0b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2251,7 +2251,7 @@ static void process_event(struct perf_script *script,
 		fprintf(fp, "%16" PRIu16, sample->ins_lat);
 
 	if (PRINT_FIELD(RETIRE_LAT))
-		fprintf(fp, "%16" PRIu16, sample->retire_lat);
+		fprintf(fp, "%16" PRIu16, sample->p_stage_cyc_or_retire_lat);
 
 	if (PRINT_FIELD(CGROUP)) {
 		const char *cgrp_name;
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index ddacef881af2..7e61ddfa66b8 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -526,7 +526,7 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 	ASSIGN(period);
 	ASSIGN(weight);
 	ASSIGN(ins_lat);
-	ASSIGN(p_stage_cyc);
+	d_sample.p_stage_cyc = sample->p_stage_cyc_or_retire_lat;
 	ASSIGN(transaction);
 	ASSIGN(insn_cnt);
 	ASSIGN(cyc_cnt);
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 664bf39567ce..119bce37f4fd 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -390,8 +390,6 @@ extern unsigned int proc_map_timeout;
 #define PAGE_SIZE_NAME_LEN	32
 char *get_page_size_name(u64 size, char *str);
 
-void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
-void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
 const char *arch_perf_header_entry(const char *se_header);
 int arch_support_sort_key(const char *sort_key);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d55482f094bf..097ab98bb81a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2846,11 +2846,18 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
 	return 0;
 }
 
-void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
-					  const __u64 *array,
-					  u64 type __maybe_unused)
+static void perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type)
 {
-	data->weight = *array;
+	union perf_sample_weight weight;
+
+	weight.full = *array;
+	if (type & PERF_SAMPLE_WEIGHT) {
+		data->weight = weight.full;
+	} else {
+		data->weight = weight.var1_dw;
+		data->ins_lat = weight.var2_w;
+		data->p_stage_cyc_or_retire_lat = weight.var3_w;
+	}
 }
 
 u64 evsel__bitfield_swap_branch_flags(u64 value)
@@ -3236,7 +3243,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
 		OVERFLOW_CHECK_u64(array);
-		arch_perf_parse_sample_weight(data, array, type);
+		perf_parse_sample_weight(data, array, type);
 		array++;
 	}
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index afc6855327ab..ae9803dca0b1 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -829,7 +829,7 @@ __hists__add_entry(struct hists *hists,
 			.period	= sample->period,
 			.weight1 = sample->weight,
 			.weight2 = sample->ins_lat,
-			.weight3 = sample->p_stage_cyc,
+			.weight3 = sample->p_stage_cyc_or_retire_lat,
 			.latency = al->latency,
 		},
 		.parent = sym_parent,
@@ -846,7 +846,7 @@ __hists__add_entry(struct hists *hists,
 		.time = hist_time(sample->time),
 		.weight = sample->weight,
 		.ins_lat = sample->ins_lat,
-		.p_stage_cyc = sample->p_stage_cyc,
+		.p_stage_cyc_or_retire_lat = sample->p_stage_cyc_or_retire_lat,
 		.simd_flags = sample->simd_flags,
 	}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index c64254088fc7..67033bdabcf4 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -255,7 +255,7 @@ struct hist_entry {
 	u64			code_page_size;
 	u64			weight;
 	u64			ins_lat;
-	u64			p_stage_cyc;
+	u64			p_stage_cyc_or_retire_lat;
 	s32			socket;
 	s32			cpu;
 	int			parallelism;
diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
index 4ad4bc118ea5..ec2f3ecf1e1c 100644
--- a/tools/perf/util/intel-tpebs.c
+++ b/tools/perf/util/intel-tpebs.c
@@ -202,8 +202,8 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
 	 * latency value will be used. Save the number of samples and the sum of
 	 * retire latency value for each event.
 	 */
-	t->last = sample->retire_lat;
-	update_stats(&t->stats, sample->retire_lat);
+	t->last = sample->p_stage_cyc_or_retire_lat;
+	update_stats(&t->stats, sample->p_stage_cyc_or_retire_lat);
 	mutex_unlock(tpebs_mtx_get());
 	return 0;
 }
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 0e96240052e9..3330d18fb5fd 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -104,10 +104,7 @@ struct perf_sample {
 	u8  cpumode;
 	u16 misc;
 	u16 ins_lat;
-	union {
-		u16 p_stage_cyc;
-		u16 retire_lat;
-	};
+	u16 p_stage_cyc_or_retire_lat;
 	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
 	char insn[MAX_INSN];
 	void *raw_data;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a320672c264e..451bc24ccfba 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1094,7 +1094,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 		printf("... weight: %" PRIu64 "", sample->weight);
 			if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
 				printf(",0x%"PRIx16"", sample->ins_lat);
-				printf(",0x%"PRIx16"", sample->p_stage_cyc);
+				printf(",0x%"PRIx16"", sample->p_stage_cyc_or_retire_lat);
 			}
 		printf("\n");
 	}
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 45e654653960..dda4ef0b5a73 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1879,21 +1879,21 @@ struct sort_entry sort_global_ins_lat = {
 static int64_t
 sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
 {
-	return left->p_stage_cyc - right->p_stage_cyc;
+	return left->p_stage_cyc_or_retire_lat - right->p_stage_cyc_or_retire_lat;
 }
 
 static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
 					size_t size, unsigned int width)
 {
 	return repsep_snprintf(bf, size, "%-*u", width,
-			he->p_stage_cyc * he->stat.nr_events);
+			he->p_stage_cyc_or_retire_lat * he->stat.nr_events);
 }
 
 
 static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
 					size_t size, unsigned int width)
 {
-	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
+	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc_or_retire_lat);
 }
 
 struct sort_entry sort_local_p_stage_cyc = {
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 2fc4d0537840..449a41900fc4 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1567,10 +1567,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	return result;
 }
 
-void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
+static void perf_synthesize_sample_weight(const struct perf_sample *data,
 					       __u64 *array, u64 type __maybe_unused)
 {
 	*array = data->weight;
+
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
+		*array &= 0xffffffff;
+		*array |= ((u64)data->ins_lat << 32);
+		*array |= ((u64)data->p_stage_cyc_or_retire_lat << 48);
+	}
 }
 
 static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
@@ -1730,7 +1736,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 	}
 
 	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
-		arch_perf_synthesize_sample_weight(sample, array, type);
+		perf_synthesize_sample_weight(sample, array, type);
 		array++;
 	}
 
-- 
2.49.0.1112.g889b7c5bd8-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test
  2025-05-21 13:54 [PATCH v2 0/3] Generic weight struct, use env for sort key and header Ian Rogers
  2025-05-21 13:54 ` [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
@ 2025-05-21 13:54 ` Ian Rogers
  2025-05-21 13:54 ` [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
  2 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 13:54 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users

test__x86_sample_parsing is identical to test__sample_parsing except
it explicitly tested PERF_SAMPLE_WEIGHT_STRUCT. Now the parsing code
is common move the PERF_SAMPLE_WEIGHT_STRUCT to the common sample
parsing test and remove the x86 version.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/x86/include/arch-tests.h   |   1 -
 tools/perf/arch/x86/tests/Build            |   1 -
 tools/perf/arch/x86/tests/arch-tests.c     |   2 -
 tools/perf/arch/x86/tests/sample-parsing.c | 125 ---------------------
 tools/perf/tests/sample-parsing.c          |  14 +++
 5 files changed, 14 insertions(+), 129 deletions(-)
 delete mode 100644 tools/perf/arch/x86/tests/sample-parsing.c

diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
index 4fd425157d7d..957934417b26 100644
--- a/tools/perf/arch/x86/include/arch-tests.h
+++ b/tools/perf/arch/x86/include/arch-tests.h
@@ -12,7 +12,6 @@ int test__insn_x86(struct test_suite *test, int subtest);
 int test__intel_pt_pkt_decoder(struct test_suite *test, int subtest);
 int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
 int test__bp_modify(struct test_suite *test, int subtest);
-int test__x86_sample_parsing(struct test_suite *test, int subtest);
 int test__amd_ibs_via_core_pmu(struct test_suite *test, int subtest);
 int test__amd_ibs_period(struct test_suite *test, int subtest);
 int test__hybrid(struct test_suite *test, int subtest);
diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
index 5e00cbfd2d56..4c27b85b960a 100644
--- a/tools/perf/arch/x86/tests/Build
+++ b/tools/perf/arch/x86/tests/Build
@@ -2,7 +2,6 @@ perf-test-$(CONFIG_DWARF_UNWIND) += regs_load.o
 perf-test-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
 
 perf-test-y += arch-tests.o
-perf-test-y += sample-parsing.o
 perf-test-y += hybrid.o
 perf-test-$(CONFIG_AUXTRACE) += intel-pt-test.o
 ifeq ($(CONFIG_EXTRA_TESTS),y)
diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
index bfee2432515b..1023c83684ce 100644
--- a/tools/perf/arch/x86/tests/arch-tests.c
+++ b/tools/perf/arch/x86/tests/arch-tests.c
@@ -23,7 +23,6 @@ struct test_suite suite__intel_pt = {
 #if defined(__x86_64__)
 DEFINE_SUITE("x86 bp modify", bp_modify);
 #endif
-DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
 DEFINE_SUITE("AMD IBS via core pmu", amd_ibs_via_core_pmu);
 DEFINE_SUITE_EXCLUSIVE("AMD IBS sample period", amd_ibs_period);
 static struct test_case hybrid_tests[] = {
@@ -49,7 +48,6 @@ struct test_suite *arch_tests[] = {
 #if defined(__x86_64__)
 	&suite__bp_modify,
 #endif
-	&suite__x86_sample_parsing,
 	&suite__amd_ibs_via_core_pmu,
 	&suite__amd_ibs_period,
 	&suite__hybrid,
diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
deleted file mode 100644
index 95d8f7f1d2fb..000000000000
--- a/tools/perf/arch/x86/tests/sample-parsing.c
+++ /dev/null
@@ -1,125 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-#include <stdbool.h>
-#include <inttypes.h>
-#include <stdlib.h>
-#include <string.h>
-#include <linux/bitops.h>
-#include <linux/kernel.h>
-#include <linux/types.h>
-
-#include "event.h"
-#include "evsel.h"
-#include "debug.h"
-#include "util/sample.h"
-#include "util/synthetic-events.h"
-
-#include "tests/tests.h"
-#include "arch-tests.h"
-
-#define COMP(m) do {					\
-	if (s1->m != s2->m) {				\
-		pr_debug("Samples differ at '"#m"'\n");	\
-		return false;				\
-	}						\
-} while (0)
-
-static bool samples_same(const struct perf_sample *s1,
-			 const struct perf_sample *s2,
-			 u64 type)
-{
-	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		COMP(ins_lat);
-		COMP(p_stage_cyc_or_retire_lat);
-	}
-
-	return true;
-}
-
-static int do_test(u64 sample_type)
-{
-	struct evsel evsel = {
-		.needs_swap = false,
-		.core = {
-			. attr = {
-				.sample_type = sample_type,
-				.read_format = 0,
-			},
-		},
-	};
-	union perf_event *event;
-	struct perf_sample sample = {
-		.weight		= 101,
-		.ins_lat        = 102,
-		.p_stage_cyc_or_retire_lat = 103,
-	};
-	struct perf_sample sample_out;
-	size_t i, sz, bufsz;
-	int err, ret = -1;
-
-	sz = perf_event__sample_event_size(&sample, sample_type, 0);
-	bufsz = sz + 4096; /* Add a bit for overrun checking */
-	event = malloc(bufsz);
-	if (!event) {
-		pr_debug("malloc failed\n");
-		return -1;
-	}
-
-	memset(event, 0xff, bufsz);
-	event->header.type = PERF_RECORD_SAMPLE;
-	event->header.misc = 0;
-	event->header.size = sz;
-
-	err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
-	if (err) {
-		pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
-			 "perf_event__synthesize_sample", sample_type, err);
-		goto out_free;
-	}
-
-	/* The data does not contain 0xff so we use that to check the size */
-	for (i = bufsz; i > 0; i--) {
-		if (*(i - 1 + (u8 *)event) != 0xff)
-			break;
-	}
-	if (i != sz) {
-		pr_debug("Event size mismatch: actual %zu vs expected %zu\n",
-			 i, sz);
-		goto out_free;
-	}
-
-	evsel.sample_size = __evsel__sample_size(sample_type);
-
-	err = evsel__parse_sample(&evsel, event, &sample_out);
-	if (err) {
-		pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
-			 "evsel__parse_sample", sample_type, err);
-		goto out_free;
-	}
-
-	if (!samples_same(&sample, &sample_out, sample_type)) {
-		pr_debug("parsing failed for sample_type %#"PRIx64"\n",
-			 sample_type);
-		goto out_free;
-	}
-
-	ret = 0;
-out_free:
-	free(event);
-
-	return ret;
-}
-
-/**
- * test__x86_sample_parsing - test X86 specific sample parsing
- *
- * This function implements a test that synthesizes a sample event, parses it
- * and then checks that the parsed sample matches the original sample. If the
- * test passes %0 is returned, otherwise %-1 is returned.
- *
- * For now, the PERF_SAMPLE_WEIGHT_STRUCT is the only X86 specific sample type.
- * The test only checks the PERF_SAMPLE_WEIGHT_STRUCT type.
- */
-int test__x86_sample_parsing(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
-{
-	return do_test(PERF_SAMPLE_WEIGHT_STRUCT);
-}
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 72411580f869..ad57e34bda19 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -152,6 +152,12 @@ static bool samples_same(struct perf_sample *s1,
 	if (type & PERF_SAMPLE_WEIGHT)
 		COMP(weight);
 
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
+		COMP(weight);
+		COMP(ins_lat);
+		COMP(p_stage_cyc_or_retire_lat);
+	}
+
 	if (type & PERF_SAMPLE_DATA_SRC)
 		COMP(data_src);
 
@@ -269,6 +275,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		.cgroup		= 114,
 		.data_page_size = 115,
 		.code_page_size = 116,
+		.ins_lat	= 117,
+		.p_stage_cyc_or_retire_lat = 118,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -439,6 +447,12 @@ static int test__sample_parsing(struct test_suite *test __maybe_unused, int subt
 		if (err)
 			return err;
 	}
+	sample_type = (PERF_SAMPLE_MAX - 1) & ~PERF_SAMPLE_WEIGHT_STRUCT;
+	for (i = 0; i < ARRAY_SIZE(rf); i++) {
+		err = do_test(sample_type, sample_regs, rf[i]);
+		if (err)
+			return err;
+	}
 
 	return 0;
 }
-- 
2.49.0.1112.g889b7c5bd8-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 13:54 [PATCH v2 0/3] Generic weight struct, use env for sort key and header Ian Rogers
  2025-05-21 13:54 ` [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
  2025-05-21 13:54 ` [PATCH v2 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test Ian Rogers
@ 2025-05-21 13:54 ` Ian Rogers
  2025-05-21 15:50   ` Liang, Kan
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 13:54 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users

Previously arch_support_sort_key and arch_perf_header_entry used a
weak symbol to compile as appropriate for x86 and powerpc. A
limitation to this is that the handling of a data file could vary in
cross-platform development. Change to using the perf_env of the
current session to determine the architecture kind and set the sort
key and header entries as appropriate.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/powerpc/util/Build   |  1 -
 tools/perf/arch/powerpc/util/event.c | 34 ----------------
 tools/perf/arch/x86/util/event.c     | 19 ---------
 tools/perf/builtin-annotate.c        |  2 +-
 tools/perf/builtin-c2c.c             | 53 +++++++++++++-----------
 tools/perf/builtin-diff.c            |  2 +-
 tools/perf/builtin-report.c          |  2 +-
 tools/perf/builtin-top.c             | 16 ++++----
 tools/perf/tests/hists_cumulate.c    |  8 ++--
 tools/perf/tests/hists_filter.c      |  8 ++--
 tools/perf/tests/hists_link.c        |  8 ++--
 tools/perf/tests/hists_output.c      | 10 ++---
 tools/perf/util/event.h              |  3 --
 tools/perf/util/sort.c               | 61 ++++++++++++++++++++--------
 tools/perf/util/sort.h               |  5 ++-
 15 files changed, 104 insertions(+), 128 deletions(-)
 delete mode 100644 tools/perf/arch/powerpc/util/event.c

diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index ed82715080f9..fdd6a77a3432 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -5,7 +5,6 @@ perf-util-y += mem-events.o
 perf-util-y += pmu.o
 perf-util-y += sym-handling.o
 perf-util-y += evsel.o
-perf-util-y += event.o
 
 perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
 
diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
deleted file mode 100644
index 024ac8b54c33..000000000000
--- a/tools/perf/arch/powerpc/util/event.c
+++ /dev/null
@@ -1,34 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/zalloc.h>
-
-#include "../../../util/event.h"
-#include "../../../util/synthetic-events.h"
-#include "../../../util/machine.h"
-#include "../../../util/tool.h"
-#include "../../../util/map.h"
-#include "../../../util/debug.h"
-#include "../../../util/sample.h"
-
-const char *arch_perf_header_entry(const char *se_header)
-{
-	if (!strcmp(se_header, "Local INSTR Latency"))
-		return "Finish Cyc";
-	else if (!strcmp(se_header, "INSTR Latency"))
-		return "Global Finish_cyc";
-	else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
-		return "Dispatch Cyc";
-	else if (!strcmp(se_header, "Pipeline Stage Cycle"))
-		return "Global Dispatch_cyc";
-	return se_header;
-}
-
-int arch_support_sort_key(const char *sort_key)
-{
-	if (!strcmp(sort_key, "p_stage_cyc"))
-		return 1;
-	if (!strcmp(sort_key, "local_p_stage_cyc"))
-		return 1;
-	return 0;
-}
diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index 576c1c36046c..3cd384317739 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -91,22 +91,3 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
 }
 
 #endif
-
-const char *arch_perf_header_entry(const char *se_header)
-{
-	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
-		return "Local Retire Latency";
-	else if (!strcmp(se_header, "Pipeline Stage Cycle"))
-		return "Retire Latency";
-
-	return se_header;
-}
-
-int arch_support_sort_key(const char *sort_key)
-{
-	if (!strcmp(sort_key, "p_stage_cyc"))
-		return 1;
-	if (!strcmp(sort_key, "local_p_stage_cyc"))
-		return 1;
-	return 0;
-}
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 9833c2c82a2f..a2d41614ef5e 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -947,7 +947,7 @@ int cmd_annotate(int argc, const char **argv)
 			annotate_opts.show_br_cntr = true;
 	}
 
-	if (setup_sorting(NULL) < 0)
+	if (setup_sorting(/*evlist=*/NULL, &annotate.session->header.env) < 0)
 		usage_with_options(annotate_usage, options);
 
 	ret = __cmd_annotate(&annotate);
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index e2e257bcc461..324368aabfa2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -195,12 +195,14 @@ static struct hist_entry_ops c2c_entry_ops = {
 
 static int c2c_hists__init(struct c2c_hists *hists,
 			   const char *sort,
-			   int nr_header_lines);
+			   int nr_header_lines,
+			   struct perf_env *env);
 
 static struct c2c_hists*
 he__get_c2c_hists(struct hist_entry *he,
 		  const char *sort,
-		  int nr_header_lines)
+		  int nr_header_lines,
+		  struct perf_env *env)
 {
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *hists;
@@ -214,7 +216,7 @@ he__get_c2c_hists(struct hist_entry *he,
 	if (!hists)
 		return NULL;
 
-	ret = c2c_hists__init(hists, sort, nr_header_lines);
+	ret = c2c_hists__init(hists, sort, nr_header_lines, env);
 	if (ret) {
 		free(hists);
 		return NULL;
@@ -350,7 +352,7 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
 
 		mi = mi_dup;
 
-		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2);
+		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2, machine->env);
 		if (!c2c_hists)
 			goto free_mi;
 
@@ -1966,7 +1968,8 @@ static struct c2c_fmt *get_format(const char *name)
 	return c2c_fmt;
 }
 
-static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
+static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name,
+				  struct perf_env *env __maybe_unused)
 {
 	struct c2c_fmt *c2c_fmt = get_format(name);
 	int level = 0;
@@ -1980,14 +1983,14 @@ static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
 	return 0;
 }
 
-static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
+static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name, struct perf_env *env)
 {
 	struct c2c_fmt *c2c_fmt = get_format(name);
 	struct c2c_dimension *dim;
 
 	if (!c2c_fmt) {
 		reset_dimensions();
-		return sort_dimension__add(hpp_list, name, NULL, 0);
+		return sort_dimension__add(hpp_list, name, /*evlist=*/NULL, env, /*level=*/0);
 	}
 
 	dim = c2c_fmt->dim;
@@ -2008,7 +2011,7 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
 										\
 		for (tok = strtok_r((char *)_list, ", ", &tmp);			\
 				tok; tok = strtok_r(NULL, ", ", &tmp)) {	\
-			ret = _fn(hpp_list, tok);				\
+			ret = _fn(hpp_list, tok, env);				\
 			if (ret == -EINVAL) {					\
 				pr_err("Invalid --fields key: `%s'", tok);	\
 				break;						\
@@ -2021,7 +2024,8 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
 
 static int hpp_list__parse(struct perf_hpp_list *hpp_list,
 			   const char *output_,
-			   const char *sort_)
+			   const char *sort_,
+			   struct perf_env *env)
 {
 	char *output = output_ ? strdup(output_) : NULL;
 	char *sort   = sort_   ? strdup(sort_) : NULL;
@@ -2052,7 +2056,8 @@ static int hpp_list__parse(struct perf_hpp_list *hpp_list,
 
 static int c2c_hists__init(struct c2c_hists *hists,
 			   const char *sort,
-			   int nr_header_lines)
+			   int nr_header_lines,
+			   struct perf_env *env)
 {
 	__hists__init(&hists->hists, &hists->list);
 
@@ -2066,15 +2071,16 @@ static int c2c_hists__init(struct c2c_hists *hists,
 	/* Overload number of header lines.*/
 	hists->list.nr_header_lines = nr_header_lines;
 
-	return hpp_list__parse(&hists->list, NULL, sort);
+	return hpp_list__parse(&hists->list, /*output=*/NULL, sort, env);
 }
 
 static int c2c_hists__reinit(struct c2c_hists *c2c_hists,
 			     const char *output,
-			     const char *sort)
+			     const char *sort,
+			     struct perf_env *env)
 {
 	perf_hpp__reset_output_field(&c2c_hists->list);
-	return hpp_list__parse(&c2c_hists->list, output, sort);
+	return hpp_list__parse(&c2c_hists->list, output, sort, env);
 }
 
 #define DISPLAY_LINE_LIMIT  0.001
@@ -2207,8 +2213,9 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
 	return 0;
 }
 
-static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_cl_cb(struct hist_entry *he, void *arg)
 {
+	struct perf_env *env = arg;
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *c2c_hists;
 	bool display = he__display(he, &c2c.shared_clines_stats);
@@ -2222,7 +2229,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 		c2c_he->cacheline_idx = idx++;
 		calc_width(c2c_he);
 
-		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort);
+		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort, env);
 
 		hists__collapse_resort(&c2c_hists->hists, NULL);
 		hists__output_resort_cb(&c2c_hists->hists, NULL, filter_cb);
@@ -2333,7 +2340,7 @@ static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 	return 0;
 }
 
-static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
+static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb, void *arg)
 {
 	struct rb_node *next = rb_first_cached(&hists->entries);
 	int ret = 0;
@@ -2342,7 +2349,7 @@ static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
 		struct hist_entry *he;
 
 		he = rb_entry(next, struct hist_entry, rb_node);
-		ret = cb(he, NULL);
+		ret = cb(he, arg);
 		if (ret)
 			break;
 		next = rb_next(&he->rb_node);
@@ -2448,7 +2455,7 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
 	hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
 }
 
-static void print_pareto(FILE *out)
+static void print_pareto(FILE *out, struct perf_env *env)
 {
 	struct perf_hpp_list hpp_list;
 	struct rb_node *nd;
@@ -2473,7 +2480,7 @@ static void print_pareto(FILE *out)
 			    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
-	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
+	ret = hpp_list__parse(&hpp_list, cl_output, /*evlist=*/NULL, env);
 
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
@@ -2538,7 +2545,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
 	fprintf(out, "=================================================\n");
 	fprintf(out, "#\n");
 
-	print_pareto(out);
+	print_pareto(out, &session->header.env);
 }
 
 #ifdef HAVE_SLANG_SUPPORT
@@ -3095,7 +3102,7 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out_session;
 	}
 
-	err = c2c_hists__init(&c2c.hists, "dcacheline", 2);
+	err = c2c_hists__init(&c2c.hists, "dcacheline", 2, &session->header.env);
 	if (err) {
 		pr_debug("Failed to initialize hists\n");
 		goto out_session;
@@ -3179,13 +3186,13 @@ static int perf_c2c__report(int argc, const char **argv)
 	else if (c2c.display == DISPLAY_SNP_PEER)
 		sort_str = "tot_peer";
 
-	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
+	c2c_hists__reinit(&c2c.hists, output_str, sort_str, &session->header.env);
 
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
 	hists__collapse_resort(&c2c.hists.hists, NULL);
 	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
-	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
+	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb, &session->header.env);
 
 	ui_progress__finish();
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index ae490d58af92..f6735cf02329 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -2003,7 +2003,7 @@ int cmd_diff(int argc, const char **argv)
 		sort__mode = SORT_MODE__DIFF;
 	}
 
-	if (setup_sorting(NULL) < 0)
+	if (setup_sorting(/*evlist=*/NULL, &data__files[0].session->header.env) < 0)
 		usage_with_options(diff_usage, options);
 
 	setup_pager();
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f0299c7ee025..8be6e7c3bd22 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1776,7 +1776,7 @@ int cmd_report(int argc, const char **argv)
 	}
 
 	if ((last_key != K_SWITCH_INPUT_DATA && last_key != K_RELOAD) &&
-	    (setup_sorting(session->evlist) < 0)) {
+	    (setup_sorting(session->evlist, &session->header.env) < 0)) {
 		if (sort_order)
 			parse_options_usage(report_usage, options, "s", 1);
 		if (field_order)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 7b6cde87d2af..13ef0d188a96 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
 
 	setup_browser(false);
 
-	if (setup_sorting(top.evlist) < 0) {
+	top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
+	if (IS_ERR(top.session)) {
+		status = PTR_ERR(top.session);
+		top.session = NULL;
+		goto out_delete_evlist;
+	}
+
+	if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
 		if (sort_order)
 			parse_options_usage(top_usage, options, "s", 1);
 		if (field_order)
@@ -1820,13 +1827,6 @@ int cmd_top(int argc, const char **argv)
 		signal(SIGWINCH, winch_sig);
 	}
 
-	top.session = perf_session__new(NULL, NULL);
-	if (IS_ERR(top.session)) {
-		status = PTR_ERR(top.session);
-		top.session = NULL;
-		goto out_delete_evlist;
-	}
-
 	if (!evlist__needs_bpf_sb_event(top.evlist))
 		top.record_opts.no_bpf_event = true;
 
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 1e0f5a310fd5..3eb9ef8d7ec6 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -295,7 +295,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = false;
 	evsel__reset_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 	callchain_register_param(&callchain_param);
 
 	err = add_hist_entries(hists, machine);
@@ -442,7 +442,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = false;
 	evsel__set_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 	callchain_register_param(&callchain_param);
 
 	err = add_hist_entries(hists, machine);
@@ -500,7 +500,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = true;
 	evsel__reset_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 	callchain_register_param(&callchain_param);
 
 	err = add_hist_entries(hists, machine);
@@ -684,7 +684,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = true;
 	evsel__set_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	callchain_param = callchain_param_default;
 	callchain_register_param(&callchain_param);
diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
index 4b2e4f2fbe48..1cebd20cc91c 100644
--- a/tools/perf/tests/hists_filter.c
+++ b/tools/perf/tests/hists_filter.c
@@ -131,10 +131,6 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
 		goto out;
 	err = TEST_FAIL;
 
-	/* default sort order (comm,dso,sym) will be used */
-	if (setup_sorting(NULL) < 0)
-		goto out;
-
 	machines__init(&machines);
 
 	/* setup threads/dso/map/symbols also */
@@ -145,6 +141,10 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
 	if (verbose > 1)
 		machine__fprintf(machine, stderr);
 
+	/* default sort order (comm,dso,sym) will be used */
+	if (setup_sorting(evlist, machine->env) < 0)
+		goto out;
+
 	/* process sample events */
 	err = add_hist_entries(evlist, machine);
 	if (err < 0)
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 5b6f1e883466..996f5f0b3bd1 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -303,10 +303,6 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
 		goto out;
 
 	err = TEST_FAIL;
-	/* default sort order (comm,dso,sym) will be used */
-	if (setup_sorting(NULL) < 0)
-		goto out;
-
 	machines__init(&machines);
 
 	/* setup threads/dso/map/symbols also */
@@ -317,6 +313,10 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
 	if (verbose > 1)
 		machine__fprintf(machine, stderr);
 
+	/* default sort order (comm,dso,sym) will be used */
+	if (setup_sorting(evlist, machine->env) < 0)
+		goto out;
+
 	/* process sample events */
 	err = add_hist_entries(evlist, machine);
 	if (err < 0)
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index 33b5cc8352a7..ee5ec8bda60e 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -146,7 +146,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
 	field_order = NULL;
 	sort_order = NULL; /* equivalent to sort_order = "comm,dso,sym" */
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -248,7 +248,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
 	field_order = "overhead,cpu";
 	sort_order = "pid";
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -304,7 +304,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
 	field_order = "comm,overhead,dso";
 	sort_order = NULL;
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -378,7 +378,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
 	field_order = "dso,sym,comm,overhead,dso";
 	sort_order = "sym";
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -480,7 +480,7 @@ static int test5(struct evsel *evsel, struct machine *machine)
 	field_order = "cpu,pid,comm,dso,sym";
 	sort_order = "dso,pid";
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 119bce37f4fd..3b97a31736c5 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -390,9 +390,6 @@ extern unsigned int proc_map_timeout;
 #define PAGE_SIZE_NAME_LEN	32
 char *get_page_size_name(u64 size, char *str);
 
-const char *arch_perf_header_entry(const char *se_header);
-int arch_support_sort_key(const char *sort_key);
-
 static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
 {
 	return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index dda4ef0b5a73..ccab10cd24a5 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2526,19 +2526,44 @@ struct sort_dimension {
 	int			taken;
 };
 
-int __weak arch_support_sort_key(const char *sort_key __maybe_unused)
+static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 {
+	const char *arch = perf_env__arch(env);
+
+	if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
+		if (!strcmp(sort_key, "p_stage_cyc"))
+			return 1;
+		if (!strcmp(sort_key, "local_p_stage_cyc"))
+			return 1;
+	}
 	return 0;
 }
 
-const char * __weak arch_perf_header_entry(const char *se_header)
-{
+static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
+{
+	const char *arch = perf_env__arch(env);
+
+	if (!strcmp("x86", arch)) {
+		if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
+			return "Local Retire Latency";
+		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
+			return "Retire Latency";
+	} else if (!strcmp("powerpc", arch)) {
+		if (!strcmp(se_header, "Local INSTR Latency"))
+			return "Finish Cyc";
+		else if (!strcmp(se_header, "INSTR Latency"))
+			return "Global Finish_cyc";
+		else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
+			return "Dispatch Cyc";
+		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
+			return "Global Dispatch_cyc";
+	}
 	return se_header;
 }
 
-static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
+static void sort_dimension_add_dynamic_header(struct sort_dimension *sd, struct perf_env *env)
 {
-	sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header);
+	sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header, env);
 }
 
 #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
@@ -3590,7 +3615,7 @@ int hpp_dimension__add_output(unsigned col, bool implicit)
 }
 
 int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
-			struct evlist *evlist,
+			struct evlist *evlist, struct perf_env *env,
 			int level)
 {
 	unsigned int i, j;
@@ -3603,7 +3628,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 	 */
 	for (j = 0; j < ARRAY_SIZE(arch_specific_sort_keys); j++) {
 		if (!strcmp(arch_specific_sort_keys[j], tok) &&
-				!arch_support_sort_key(tok)) {
+		    !arch_support_sort_key(tok, env)) {
 			return 0;
 		}
 	}
@@ -3616,7 +3641,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 
 		for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
 			if (sd->name && !strcmp(dynamic_headers[j], sd->name))
-				sort_dimension_add_dynamic_header(sd);
+				sort_dimension_add_dynamic_header(sd, env);
 		}
 
 		if (sd->entry == &sort_parent && parent_pattern) {
@@ -3712,13 +3737,13 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 }
 
 /* This should match with sort_dimension__add() above */
-static bool is_hpp_sort_key(const char *key)
+static bool is_hpp_sort_key(const char *key, struct perf_env *env)
 {
 	unsigned i;
 
 	for (i = 0; i < ARRAY_SIZE(arch_specific_sort_keys); i++) {
 		if (!strcmp(arch_specific_sort_keys[i], key) &&
-		    !arch_support_sort_key(key)) {
+		    !arch_support_sort_key(key, env)) {
 			return false;
 		}
 	}
@@ -3740,7 +3765,7 @@ static bool is_hpp_sort_key(const char *key)
 }
 
 static int setup_sort_list(struct perf_hpp_list *list, char *str,
-			   struct evlist *evlist)
+			   struct evlist *evlist, struct perf_env *env)
 {
 	char *tmp, *tok;
 	int ret = 0;
@@ -3769,7 +3794,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
 		}
 
 		if (*tok) {
-			if (is_hpp_sort_key(tok)) {
+			if (is_hpp_sort_key(tok, env)) {
 				/* keep output (hpp) sort keys in the same level */
 				if (prev_was_hpp) {
 					bool next_same = (level == next_level);
@@ -3782,7 +3807,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
 				prev_was_hpp = false;
 			}
 
-			ret = sort_dimension__add(list, tok, evlist, level);
+			ret = sort_dimension__add(list, tok, evlist, env, level);
 			if (ret == -EINVAL) {
 				if (!cacheline_size() && !strncasecmp(tok, "dcacheline", strlen(tok)))
 					ui__error("The \"dcacheline\" --sort key needs to know the cacheline size and it couldn't be determined on this system");
@@ -3911,7 +3936,7 @@ static char *setup_overhead(char *keys)
 	return keys;
 }
 
-static int __setup_sorting(struct evlist *evlist)
+static int __setup_sorting(struct evlist *evlist, struct perf_env *env)
 {
 	char *str;
 	const char *sort_keys;
@@ -3951,7 +3976,7 @@ static int __setup_sorting(struct evlist *evlist)
 		}
 	}
 
-	ret = setup_sort_list(&perf_hpp_list, str, evlist);
+	ret = setup_sort_list(&perf_hpp_list, str, evlist, env);
 
 	free(str);
 	return ret;
@@ -4187,16 +4212,16 @@ static int __setup_output_field(void)
 	return ret;
 }
 
-int setup_sorting(struct evlist *evlist)
+int setup_sorting(struct evlist *evlist, struct perf_env *env)
 {
 	int err;
 
-	err = __setup_sorting(evlist);
+	err = __setup_sorting(evlist, env);
 	if (err < 0)
 		return err;
 
 	if (parent_pattern != default_parent_pattern) {
-		err = sort_dimension__add(&perf_hpp_list, "parent", evlist, -1);
+		err = sort_dimension__add(&perf_hpp_list, "parent", evlist, env, -1);
 		if (err < 0)
 			return err;
 	}
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index a742ab7f3c67..d7787958e06b 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -6,6 +6,7 @@
 #include "hist.h"
 
 struct option;
+struct perf_env;
 
 extern regex_t parent_regex;
 extern const char *sort_order;
@@ -130,7 +131,7 @@ extern struct sort_entry sort_thread;
 
 struct evlist;
 struct tep_handle;
-int setup_sorting(struct evlist *evlist);
+int setup_sorting(struct evlist *evlist, struct perf_env *env);
 int setup_output_field(void);
 void reset_output_field(void);
 void sort__setup_elide(FILE *fp);
@@ -145,7 +146,7 @@ bool is_strict_order(const char *order);
 int hpp_dimension__add_output(unsigned col, bool implicit);
 void reset_dimensions(void);
 int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
-			struct evlist *evlist,
+			struct evlist *evlist, struct perf_env *env,
 			int level);
 int output_field_add(struct perf_hpp_list *list, const char *tok, int *level);
 int64_t
-- 
2.49.0.1112.g889b7c5bd8-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 13:54 ` [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
@ 2025-05-21 15:42   ` Liang, Kan
  2025-05-21 15:57     ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2025-05-21 15:42 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users



On 2025-05-21 9:54 a.m., Ian Rogers wrote:
> By definition arch sample parsing and synthesis will inhibit certain
> kinds of cross-platform record then analysis (report, script,
> etc.). Remove arch_perf_parse_sample_weight and
> arch_perf_synthesize_sample_weight replacing with a common
> implementation. Combine perf_sample p_stage_cyc and retire_lat to
> capture the differing uses regardless of compiled for architecture.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/arch/powerpc/util/event.c       | 26 ---------------------
>  tools/perf/arch/x86/tests/sample-parsing.c |  4 ++--
>  tools/perf/arch/x86/util/event.c           | 27 ----------------------
>  tools/perf/builtin-script.c                |  2 +-
>  tools/perf/util/dlfilter.c                 |  2 +-
>  tools/perf/util/event.h                    |  2 --
>  tools/perf/util/evsel.c                    | 17 ++++++++++----
>  tools/perf/util/hist.c                     |  4 ++--
>  tools/perf/util/hist.h                     |  2 +-
>  tools/perf/util/intel-tpebs.c              |  4 ++--
>  tools/perf/util/sample.h                   |  5 +---
>  tools/perf/util/session.c                  |  2 +-
>  tools/perf/util/sort.c                     |  6 ++---
>  tools/perf/util/synthetic-events.c         | 10 ++++++--
>  14 files changed, 34 insertions(+), 79 deletions(-)
> 
> diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
> index 77d8cc2b5691..024ac8b54c33 100644
> --- a/tools/perf/arch/powerpc/util/event.c
> +++ b/tools/perf/arch/powerpc/util/event.c
> @@ -11,32 +11,6 @@
>  #include "../../../util/debug.h"
>  #include "../../../util/sample.h"
>  
> -void arch_perf_parse_sample_weight(struct perf_sample *data,
> -				   const __u64 *array, u64 type)
> -{
> -	union perf_sample_weight weight;
> -
> -	weight.full = *array;
> -	if (type & PERF_SAMPLE_WEIGHT)
> -		data->weight = weight.full;
> -	else {
> -		data->weight = weight.var1_dw;
> -		data->ins_lat = weight.var2_w;
> -		data->p_stage_cyc = weight.var3_w;
> -	}
> -}
> -
> -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> -					__u64 *array, u64 type)
> -{
> -	*array = data->weight;
> -
> -	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> -		*array &= 0xffffffff;
> -		*array |= ((u64)data->ins_lat << 32);
> -	}
> -}
> -
>  const char *arch_perf_header_entry(const char *se_header)
>  {
>  	if (!strcmp(se_header, "Local INSTR Latency"))
> diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
> index a061e8619267..95d8f7f1d2fb 100644
> --- a/tools/perf/arch/x86/tests/sample-parsing.c
> +++ b/tools/perf/arch/x86/tests/sample-parsing.c
> @@ -29,7 +29,7 @@ static bool samples_same(const struct perf_sample *s1,
>  {
>  	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
>  		COMP(ins_lat);
> -		COMP(retire_lat);
> +		COMP(p_stage_cyc_or_retire_lat);
>  	}
>  
>  	return true;
> @@ -50,7 +50,7 @@ static int do_test(u64 sample_type)
>  	struct perf_sample sample = {
>  		.weight		= 101,
>  		.ins_lat        = 102,
> -		.retire_lat     = 103,
> +		.p_stage_cyc_or_retire_lat = 103,
>  	};
>  	struct perf_sample sample_out;
>  	size_t i, sz, bufsz;
> diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
> index a0400707180c..576c1c36046c 100644
> --- a/tools/perf/arch/x86/util/event.c
> +++ b/tools/perf/arch/x86/util/event.c
> @@ -92,33 +92,6 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
>  
>  #endif
>  
> -void arch_perf_parse_sample_weight(struct perf_sample *data,
> -				   const __u64 *array, u64 type)
> -{
> -	union perf_sample_weight weight;
> -
> -	weight.full = *array;
> -	if (type & PERF_SAMPLE_WEIGHT)
> -		data->weight = weight.full;
> -	else {
> -		data->weight = weight.var1_dw;
> -		data->ins_lat = weight.var2_w;
> -		data->retire_lat = weight.var3_w;
> -	}
> -}
> -
> -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> -					__u64 *array, u64 type)
> -{
> -	*array = data->weight;
> -
> -	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> -		*array &= 0xffffffff;
> -		*array |= ((u64)data->ins_lat << 32);
> -		*array |= ((u64)data->retire_lat << 48);
> -	}
> -}
> -
>  const char *arch_perf_header_entry(const char *se_header)
>  {
>  	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 6c3bf74dd78c..c02c435e0f0b 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2251,7 +2251,7 @@ static void process_event(struct perf_script *script,
>  		fprintf(fp, "%16" PRIu16, sample->ins_lat);
>  
>  	if (PRINT_FIELD(RETIRE_LAT))
> -		fprintf(fp, "%16" PRIu16, sample->retire_lat);
> +		fprintf(fp, "%16" PRIu16, sample->p_stage_cyc_or_retire_lat);
>  
>  	if (PRINT_FIELD(CGROUP)) {
>  		const char *cgrp_name;
> diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
> index ddacef881af2..7e61ddfa66b8 100644
> --- a/tools/perf/util/dlfilter.c
> +++ b/tools/perf/util/dlfilter.c
> @@ -526,7 +526,7 @@ int dlfilter__do_filter_event(struct dlfilter *d,
>  	ASSIGN(period);
>  	ASSIGN(weight);
>  	ASSIGN(ins_lat);
> -	ASSIGN(p_stage_cyc);
> +	d_sample.p_stage_cyc = sample->p_stage_cyc_or_retire_lat;

Can you please move it out of the ASSIGN() area? Maybe right after
d_sample.size  = sizeof(d_sample);.
It looks strange to insert a non-macro assignment here.


>  	ASSIGN(transaction);
>  	ASSIGN(insn_cnt);
>  	ASSIGN(cyc_cnt);
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 664bf39567ce..119bce37f4fd 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -390,8 +390,6 @@ extern unsigned int proc_map_timeout;
>  #define PAGE_SIZE_NAME_LEN	32
>  char *get_page_size_name(u64 size, char *str);
>  
> -void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
> -void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
>  const char *arch_perf_header_entry(const char *se_header);
>  int arch_support_sort_key(const char *sort_key);
>  
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index d55482f094bf..097ab98bb81a 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2846,11 +2846,18 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
>  	return 0;
>  }
>  
> -void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
> -					  const __u64 *array,
> -					  u64 type __maybe_unused)
> +static void perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type)
>  {
> -	data->weight = *array;
> +	union perf_sample_weight weight;
> +
> +	weight.full = *array;
> +	if (type & PERF_SAMPLE_WEIGHT) {
> +		data->weight = weight.full;
> +	} else {
> +		data->weight = weight.var1_dw;
> +		data->ins_lat = weight.var2_w;
> +		data->p_stage_cyc_or_retire_lat = weight.var3_w;
> +	}
>  }
>  

It works for X86, but I'm not sure other ARCHs.
Since the PERF_SAMPLE_WEIGHT_STRUCT is newly added type, should it be
safer to do the below?
	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
		data->weight = weight.var1_dw;
		data->ins_lat = weight.var2_w;
		data->p_stage_cyc_or_retire_lat = weight.var3_w;
	} else
		data->weight = weight.full;

The default behavior is always data->weight = weight.full; unless an
ARCH apply the new type PERF_SAMPLE_WEIGHT_STRUCT.

>  u64 evsel__bitfield_swap_branch_flags(u64 value)
> @@ -3236,7 +3243,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  
>  	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
>  		OVERFLOW_CHECK_u64(array);
> -		arch_perf_parse_sample_weight(data, array, type);
> +		perf_parse_sample_weight(data, array, type);
>  		array++;
>  	}
>  
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index afc6855327ab..ae9803dca0b1 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -829,7 +829,7 @@ __hists__add_entry(struct hists *hists,
>  			.period	= sample->period,
>  			.weight1 = sample->weight,
>  			.weight2 = sample->ins_lat,
> -			.weight3 = sample->p_stage_cyc,
> +			.weight3 = sample->p_stage_cyc_or_retire_lat,
>  			.latency = al->latency,
>  		},
>  		.parent = sym_parent,
> @@ -846,7 +846,7 @@ __hists__add_entry(struct hists *hists,
>  		.time = hist_time(sample->time),
>  		.weight = sample->weight,
>  		.ins_lat = sample->ins_lat,
> -		.p_stage_cyc = sample->p_stage_cyc,
> +		.p_stage_cyc_or_retire_lat = sample->p_stage_cyc_or_retire_lat,
>  		.simd_flags = sample->simd_flags,
>  	}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
>  
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index c64254088fc7..67033bdabcf4 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -255,7 +255,7 @@ struct hist_entry {
>  	u64			code_page_size;
>  	u64			weight;
>  	u64			ins_lat;
> -	u64			p_stage_cyc;
> +	u64			p_stage_cyc_or_retire_lat;
>  	s32			socket;
>  	s32			cpu;
>  	int			parallelism;
> diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> index 4ad4bc118ea5..ec2f3ecf1e1c 100644
> --- a/tools/perf/util/intel-tpebs.c
> +++ b/tools/perf/util/intel-tpebs.c
> @@ -202,8 +202,8 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
>  	 * latency value will be used. Save the number of samples and the sum of
>  	 * retire latency value for each event.
>  	 */
> -	t->last = sample->retire_lat;
> -	update_stats(&t->stats, sample->retire_lat);
> +	t->last = sample->p_stage_cyc_or_retire_lat;
> +	update_stats(&t->stats, sample->p_stage_cyc_or_retire_lat);
>  	mutex_unlock(tpebs_mtx_get());
>  	return 0;
>  }
> diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
> index 0e96240052e9..3330d18fb5fd 100644
> --- a/tools/perf/util/sample.h
> +++ b/tools/perf/util/sample.h
> @@ -104,10 +104,7 @@ struct perf_sample {
>  	u8  cpumode;
>  	u16 misc;
>  	u16 ins_lat;
> -	union {
> -		u16 p_stage_cyc;
> -		u16 retire_lat;
> -	};
> +	u16 p_stage_cyc_or_retire_lat;
>  	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
>  	char insn[MAX_INSN];
>  	void *raw_data;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index a320672c264e..451bc24ccfba 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1094,7 +1094,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
>  		printf("... weight: %" PRIu64 "", sample->weight);
>  			if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
>  				printf(",0x%"PRIx16"", sample->ins_lat);
> -				printf(",0x%"PRIx16"", sample->p_stage_cyc);
> +				printf(",0x%"PRIx16"", sample->p_stage_cyc_or_retire_lat);
>  			}
>  		printf("\n");
>  	}
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 45e654653960..dda4ef0b5a73 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -1879,21 +1879,21 @@ struct sort_entry sort_global_ins_lat = {
>  static int64_t
>  sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
>  {
> -	return left->p_stage_cyc - right->p_stage_cyc;
> +	return left->p_stage_cyc_or_retire_lat - right->p_stage_cyc_or_retire_lat;
>  }
>  
>  static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
>  					size_t size, unsigned int width)
>  {
>  	return repsep_snprintf(bf, size, "%-*u", width,
> -			he->p_stage_cyc * he->stat.nr_events);
> +			he->p_stage_cyc_or_retire_lat * he->stat.nr_events);
>  }
>  
>  
>  static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
>  					size_t size, unsigned int width)
>  {
> -	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
> +	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc_or_retire_lat);
>  }
>  
>  struct sort_entry sort_local_p_stage_cyc = {
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 2fc4d0537840..449a41900fc4 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1567,10 +1567,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>  	return result;
>  }
>  
> -void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> +static void perf_synthesize_sample_weight(const struct perf_sample *data,
>  					       __u64 *array, u64 type __maybe_unused)
>  {
>  	*array = data->weight;
> +
> +	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> +		*array &= 0xffffffff;
> +		*array |= ((u64)data->ins_lat << 32);
> +		*array |= ((u64)data->p_stage_cyc_or_retire_lat << 48);
> +	}

It works for X86, but powerpc seems have a different behavior. The
p_stage_cyc is missed here. Not sure if it intends.

Thanks,
Kan

>  }
>  
>  static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
> @@ -1730,7 +1736,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>  	}
>  
>  	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
> -		arch_perf_synthesize_sample_weight(sample, array, type);
> +		perf_synthesize_sample_weight(sample, array, type);
>  		array++;
>  	}
>  


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 13:54 ` [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
@ 2025-05-21 15:50   ` Liang, Kan
  2025-05-21 16:16     ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2025-05-21 15:50 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users



On 2025-05-21 9:54 a.m., Ian Rogers wrote:
> Previously arch_support_sort_key and arch_perf_header_entry used a
> weak symbol to compile as appropriate for x86 and powerpc. A
> limitation to this is that the handling of a data file could vary in
> cross-platform development. Change to using the perf_env of the
> current session to determine the architecture kind and set the sort
> key and header entries as appropriate.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/arch/powerpc/util/Build   |  1 -
>  tools/perf/arch/powerpc/util/event.c | 34 ----------------
>  tools/perf/arch/x86/util/event.c     | 19 ---------
>  tools/perf/builtin-annotate.c        |  2 +-
>  tools/perf/builtin-c2c.c             | 53 +++++++++++++-----------
>  tools/perf/builtin-diff.c            |  2 +-
>  tools/perf/builtin-report.c          |  2 +-
>  tools/perf/builtin-top.c             | 16 ++++----
>  tools/perf/tests/hists_cumulate.c    |  8 ++--
>  tools/perf/tests/hists_filter.c      |  8 ++--
>  tools/perf/tests/hists_link.c        |  8 ++--
>  tools/perf/tests/hists_output.c      | 10 ++---
>  tools/perf/util/event.h              |  3 --
>  tools/perf/util/sort.c               | 61 ++++++++++++++++++++--------
>  tools/perf/util/sort.h               |  5 ++-
>  15 files changed, 104 insertions(+), 128 deletions(-)
>  delete mode 100644 tools/perf/arch/powerpc/util/event.c
> 
> diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
> index ed82715080f9..fdd6a77a3432 100644
> --- a/tools/perf/arch/powerpc/util/Build
> +++ b/tools/perf/arch/powerpc/util/Build
> @@ -5,7 +5,6 @@ perf-util-y += mem-events.o
>  perf-util-y += pmu.o
>  perf-util-y += sym-handling.o
>  perf-util-y += evsel.o
> -perf-util-y += event.o
>  
>  perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
>  
> diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
> deleted file mode 100644
> index 024ac8b54c33..000000000000
> --- a/tools/perf/arch/powerpc/util/event.c
> +++ /dev/null
> @@ -1,34 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -#include <linux/types.h>
> -#include <linux/string.h>
> -#include <linux/zalloc.h>
> -
> -#include "../../../util/event.h"
> -#include "../../../util/synthetic-events.h"
> -#include "../../../util/machine.h"
> -#include "../../../util/tool.h"
> -#include "../../../util/map.h"
> -#include "../../../util/debug.h"
> -#include "../../../util/sample.h"
> -
> -const char *arch_perf_header_entry(const char *se_header)
> -{
> -	if (!strcmp(se_header, "Local INSTR Latency"))
> -		return "Finish Cyc";
> -	else if (!strcmp(se_header, "INSTR Latency"))
> -		return "Global Finish_cyc";
> -	else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> -		return "Dispatch Cyc";
> -	else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> -		return "Global Dispatch_cyc";
> -	return se_header;
> -}
> -
> -int arch_support_sort_key(const char *sort_key)
> -{
> -	if (!strcmp(sort_key, "p_stage_cyc"))
> -		return 1;
> -	if (!strcmp(sort_key, "local_p_stage_cyc"))
> -		return 1;
> -	return 0;
> -}
> diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
> index 576c1c36046c..3cd384317739 100644
> --- a/tools/perf/arch/x86/util/event.c
> +++ b/tools/perf/arch/x86/util/event.c
> @@ -91,22 +91,3 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
>  }
>  
>  #endif
> -
> -const char *arch_perf_header_entry(const char *se_header)
> -{
> -	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> -		return "Local Retire Latency";
> -	else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> -		return "Retire Latency";
> -
> -	return se_header;
> -}
> -
> -int arch_support_sort_key(const char *sort_key)
> -{
> -	if (!strcmp(sort_key, "p_stage_cyc"))
> -		return 1;
> -	if (!strcmp(sort_key, "local_p_stage_cyc"))
> -		return 1;
> -	return 0;
> -}
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index 9833c2c82a2f..a2d41614ef5e 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -947,7 +947,7 @@ int cmd_annotate(int argc, const char **argv)
>  			annotate_opts.show_br_cntr = true;
>  	}
>  
> -	if (setup_sorting(NULL) < 0)
> +	if (setup_sorting(/*evlist=*/NULL, &annotate.session->header.env) < 0)
>  		usage_with_options(annotate_usage, options);
>  
>  	ret = __cmd_annotate(&annotate);
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index e2e257bcc461..324368aabfa2 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -195,12 +195,14 @@ static struct hist_entry_ops c2c_entry_ops = {
>  
>  static int c2c_hists__init(struct c2c_hists *hists,
>  			   const char *sort,
> -			   int nr_header_lines);
> +			   int nr_header_lines,
> +			   struct perf_env *env);
>  
>  static struct c2c_hists*
>  he__get_c2c_hists(struct hist_entry *he,
>  		  const char *sort,
> -		  int nr_header_lines)
> +		  int nr_header_lines,
> +		  struct perf_env *env)
>  {
>  	struct c2c_hist_entry *c2c_he;
>  	struct c2c_hists *hists;
> @@ -214,7 +216,7 @@ he__get_c2c_hists(struct hist_entry *he,
>  	if (!hists)
>  		return NULL;
>  
> -	ret = c2c_hists__init(hists, sort, nr_header_lines);
> +	ret = c2c_hists__init(hists, sort, nr_header_lines, env);
>  	if (ret) {
>  		free(hists);
>  		return NULL;
> @@ -350,7 +352,7 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
>  
>  		mi = mi_dup;
>  
> -		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2);
> +		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2, machine->env);
>  		if (!c2c_hists)
>  			goto free_mi;
>  
> @@ -1966,7 +1968,8 @@ static struct c2c_fmt *get_format(const char *name)
>  	return c2c_fmt;
>  }
>  
> -static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
> +static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name,
> +				  struct perf_env *env __maybe_unused)
>  {
>  	struct c2c_fmt *c2c_fmt = get_format(name);
>  	int level = 0;
> @@ -1980,14 +1983,14 @@ static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
>  	return 0;
>  }
>  
> -static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
> +static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name, struct perf_env *env)
>  {
>  	struct c2c_fmt *c2c_fmt = get_format(name);
>  	struct c2c_dimension *dim;
>  
>  	if (!c2c_fmt) {
>  		reset_dimensions();
> -		return sort_dimension__add(hpp_list, name, NULL, 0);
> +		return sort_dimension__add(hpp_list, name, /*evlist=*/NULL, env, /*level=*/0);
>  	}
>  
>  	dim = c2c_fmt->dim;
> @@ -2008,7 +2011,7 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
>  										\
>  		for (tok = strtok_r((char *)_list, ", ", &tmp);			\
>  				tok; tok = strtok_r(NULL, ", ", &tmp)) {	\
> -			ret = _fn(hpp_list, tok);				\
> +			ret = _fn(hpp_list, tok, env);				\
>  			if (ret == -EINVAL) {					\
>  				pr_err("Invalid --fields key: `%s'", tok);	\
>  				break;						\
> @@ -2021,7 +2024,8 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
>  
>  static int hpp_list__parse(struct perf_hpp_list *hpp_list,
>  			   const char *output_,
> -			   const char *sort_)
> +			   const char *sort_,
> +			   struct perf_env *env)
>  {
>  	char *output = output_ ? strdup(output_) : NULL;
>  	char *sort   = sort_   ? strdup(sort_) : NULL;
> @@ -2052,7 +2056,8 @@ static int hpp_list__parse(struct perf_hpp_list *hpp_list,
>  
>  static int c2c_hists__init(struct c2c_hists *hists,
>  			   const char *sort,
> -			   int nr_header_lines)
> +			   int nr_header_lines,
> +			   struct perf_env *env)
>  {
>  	__hists__init(&hists->hists, &hists->list);
>  
> @@ -2066,15 +2071,16 @@ static int c2c_hists__init(struct c2c_hists *hists,
>  	/* Overload number of header lines.*/
>  	hists->list.nr_header_lines = nr_header_lines;
>  
> -	return hpp_list__parse(&hists->list, NULL, sort);
> +	return hpp_list__parse(&hists->list, /*output=*/NULL, sort, env);
>  }
>  
>  static int c2c_hists__reinit(struct c2c_hists *c2c_hists,
>  			     const char *output,
> -			     const char *sort)
> +			     const char *sort,
> +			     struct perf_env *env)
>  {
>  	perf_hpp__reset_output_field(&c2c_hists->list);
> -	return hpp_list__parse(&c2c_hists->list, output, sort);
> +	return hpp_list__parse(&c2c_hists->list, output, sort, env);
>  }
>  
>  #define DISPLAY_LINE_LIMIT  0.001
> @@ -2207,8 +2213,9 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
>  	return 0;
>  }
>  
> -static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
> +static int resort_cl_cb(struct hist_entry *he, void *arg)
>  {
> +	struct perf_env *env = arg;
>  	struct c2c_hist_entry *c2c_he;
>  	struct c2c_hists *c2c_hists;
>  	bool display = he__display(he, &c2c.shared_clines_stats);
> @@ -2222,7 +2229,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
>  		c2c_he->cacheline_idx = idx++;
>  		calc_width(c2c_he);
>  
> -		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort);
> +		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort, env);
>  
>  		hists__collapse_resort(&c2c_hists->hists, NULL);
>  		hists__output_resort_cb(&c2c_hists->hists, NULL, filter_cb);
> @@ -2333,7 +2340,7 @@ static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
>  	return 0;
>  }
>  
> -static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
> +static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb, void *arg)
>  {
>  	struct rb_node *next = rb_first_cached(&hists->entries);
>  	int ret = 0;
> @@ -2342,7 +2349,7 @@ static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
>  		struct hist_entry *he;
>  
>  		he = rb_entry(next, struct hist_entry, rb_node);
> -		ret = cb(he, NULL);
> +		ret = cb(he, arg);
>  		if (ret)
>  			break;
>  		next = rb_next(&he->rb_node);
> @@ -2448,7 +2455,7 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
>  	hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
>  }
>  
> -static void print_pareto(FILE *out)
> +static void print_pareto(FILE *out, struct perf_env *env)
>  {
>  	struct perf_hpp_list hpp_list;
>  	struct rb_node *nd;
> @@ -2473,7 +2480,7 @@ static void print_pareto(FILE *out)
>  			    "dcacheline";
>  
>  	perf_hpp_list__init(&hpp_list);
> -	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> +	ret = hpp_list__parse(&hpp_list, cl_output, /*evlist=*/NULL, env);
>  
>  	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
>  		return;
> @@ -2538,7 +2545,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
>  	fprintf(out, "=================================================\n");
>  	fprintf(out, "#\n");
>  
> -	print_pareto(out);
> +	print_pareto(out, &session->header.env);
>  }
>  
>  #ifdef HAVE_SLANG_SUPPORT
> @@ -3095,7 +3102,7 @@ static int perf_c2c__report(int argc, const char **argv)
>  		goto out_session;
>  	}
>  
> -	err = c2c_hists__init(&c2c.hists, "dcacheline", 2);
> +	err = c2c_hists__init(&c2c.hists, "dcacheline", 2, &session->header.env);
>  	if (err) {
>  		pr_debug("Failed to initialize hists\n");
>  		goto out_session;
> @@ -3179,13 +3186,13 @@ static int perf_c2c__report(int argc, const char **argv)
>  	else if (c2c.display == DISPLAY_SNP_PEER)
>  		sort_str = "tot_peer";
>  
> -	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
> +	c2c_hists__reinit(&c2c.hists, output_str, sort_str, &session->header.env);
>  
>  	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
>  
>  	hists__collapse_resort(&c2c.hists.hists, NULL);
>  	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
> -	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
> +	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb, &session->header.env);
>  
>  	ui_progress__finish();
>  
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index ae490d58af92..f6735cf02329 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -2003,7 +2003,7 @@ int cmd_diff(int argc, const char **argv)
>  		sort__mode = SORT_MODE__DIFF;
>  	}
>  
> -	if (setup_sorting(NULL) < 0)
> +	if (setup_sorting(/*evlist=*/NULL, &data__files[0].session->header.env) < 0)
>  		usage_with_options(diff_usage, options);
>  
>  	setup_pager();
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index f0299c7ee025..8be6e7c3bd22 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1776,7 +1776,7 @@ int cmd_report(int argc, const char **argv)
>  	}
>  
>  	if ((last_key != K_SWITCH_INPUT_DATA && last_key != K_RELOAD) &&
> -	    (setup_sorting(session->evlist) < 0)) {
> +	    (setup_sorting(session->evlist, &session->header.env) < 0)) {
>  		if (sort_order)
>  			parse_options_usage(report_usage, options, "s", 1);
>  		if (field_order)
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 7b6cde87d2af..13ef0d188a96 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
>  
>  	setup_browser(false);
>  
> -	if (setup_sorting(top.evlist) < 0) {
> +	top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
> +	if (IS_ERR(top.session)) {
> +		status = PTR_ERR(top.session);
> +		top.session = NULL;
> +		goto out_delete_evlist;
> +	}
> +
> +	if (setup_sorting(top.evlist, &top.session->header.env) < 0) {

I doubt a valide env can be got in perf_session__new(), since there is
no perf.data in perf top.
Maybe just need to invoke the perf_env__raw_arch() instead to fill the
env->arch.

Thanks,
Kan

>  		if (sort_order)
>  			parse_options_usage(top_usage, options, "s", 1);
>  		if (field_order)
> @@ -1820,13 +1827,6 @@ int cmd_top(int argc, const char **argv)
>  		signal(SIGWINCH, winch_sig);
>  	}
>  
> -	top.session = perf_session__new(NULL, NULL);
> -	if (IS_ERR(top.session)) {
> -		status = PTR_ERR(top.session);
> -		top.session = NULL;
> -		goto out_delete_evlist;
> -	}
> -
>  	if (!evlist__needs_bpf_sb_event(top.evlist))
>  		top.record_opts.no_bpf_event = true;
>  
> diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
> index 1e0f5a310fd5..3eb9ef8d7ec6 100644
> --- a/tools/perf/tests/hists_cumulate.c
> +++ b/tools/perf/tests/hists_cumulate.c
> @@ -295,7 +295,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
>  	symbol_conf.cumulate_callchain = false;
>  	evsel__reset_sample_bit(evsel, CALLCHAIN);
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  	callchain_register_param(&callchain_param);
>  
>  	err = add_hist_entries(hists, machine);
> @@ -442,7 +442,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
>  	symbol_conf.cumulate_callchain = false;
>  	evsel__set_sample_bit(evsel, CALLCHAIN);
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  	callchain_register_param(&callchain_param);
>  
>  	err = add_hist_entries(hists, machine);
> @@ -500,7 +500,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
>  	symbol_conf.cumulate_callchain = true;
>  	evsel__reset_sample_bit(evsel, CALLCHAIN);
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  	callchain_register_param(&callchain_param);
>  
>  	err = add_hist_entries(hists, machine);
> @@ -684,7 +684,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
>  	symbol_conf.cumulate_callchain = true;
>  	evsel__set_sample_bit(evsel, CALLCHAIN);
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  
>  	callchain_param = callchain_param_default;
>  	callchain_register_param(&callchain_param);
> diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
> index 4b2e4f2fbe48..1cebd20cc91c 100644
> --- a/tools/perf/tests/hists_filter.c
> +++ b/tools/perf/tests/hists_filter.c
> @@ -131,10 +131,6 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
>  		goto out;
>  	err = TEST_FAIL;
>  
> -	/* default sort order (comm,dso,sym) will be used */
> -	if (setup_sorting(NULL) < 0)
> -		goto out;
> -
>  	machines__init(&machines);
>  
>  	/* setup threads/dso/map/symbols also */
> @@ -145,6 +141,10 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
>  	if (verbose > 1)
>  		machine__fprintf(machine, stderr);
>  
> +	/* default sort order (comm,dso,sym) will be used */
> +	if (setup_sorting(evlist, machine->env) < 0)
> +		goto out;
> +
>  	/* process sample events */
>  	err = add_hist_entries(evlist, machine);
>  	if (err < 0)
> diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
> index 5b6f1e883466..996f5f0b3bd1 100644
> --- a/tools/perf/tests/hists_link.c
> +++ b/tools/perf/tests/hists_link.c
> @@ -303,10 +303,6 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
>  		goto out;
>  
>  	err = TEST_FAIL;
> -	/* default sort order (comm,dso,sym) will be used */
> -	if (setup_sorting(NULL) < 0)
> -		goto out;
> -
>  	machines__init(&machines);
>  
>  	/* setup threads/dso/map/symbols also */
> @@ -317,6 +313,10 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
>  	if (verbose > 1)
>  		machine__fprintf(machine, stderr);
>  
> +	/* default sort order (comm,dso,sym) will be used */
> +	if (setup_sorting(evlist, machine->env) < 0)
> +		goto out;
> +
>  	/* process sample events */
>  	err = add_hist_entries(evlist, machine);
>  	if (err < 0)
> diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
> index 33b5cc8352a7..ee5ec8bda60e 100644
> --- a/tools/perf/tests/hists_output.c
> +++ b/tools/perf/tests/hists_output.c
> @@ -146,7 +146,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
>  	field_order = NULL;
>  	sort_order = NULL; /* equivalent to sort_order = "comm,dso,sym" */
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  
>  	/*
>  	 * expected output:
> @@ -248,7 +248,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
>  	field_order = "overhead,cpu";
>  	sort_order = "pid";
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  
>  	/*
>  	 * expected output:
> @@ -304,7 +304,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
>  	field_order = "comm,overhead,dso";
>  	sort_order = NULL;
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  
>  	/*
>  	 * expected output:
> @@ -378,7 +378,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
>  	field_order = "dso,sym,comm,overhead,dso";
>  	sort_order = "sym";
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  
>  	/*
>  	 * expected output:
> @@ -480,7 +480,7 @@ static int test5(struct evsel *evsel, struct machine *machine)
>  	field_order = "cpu,pid,comm,dso,sym";
>  	sort_order = "dso,pid";
>  
> -	setup_sorting(NULL);
> +	setup_sorting(/*evlist=*/NULL, machine->env);
>  
>  	/*
>  	 * expected output:
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 119bce37f4fd..3b97a31736c5 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -390,9 +390,6 @@ extern unsigned int proc_map_timeout;
>  #define PAGE_SIZE_NAME_LEN	32
>  char *get_page_size_name(u64 size, char *str);
>  
> -const char *arch_perf_header_entry(const char *se_header);
> -int arch_support_sort_key(const char *sort_key);
> -
>  static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
>  {
>  	return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index dda4ef0b5a73..ccab10cd24a5 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -2526,19 +2526,44 @@ struct sort_dimension {
>  	int			taken;
>  };
>  
> -int __weak arch_support_sort_key(const char *sort_key __maybe_unused)
> +static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
>  {
> +	const char *arch = perf_env__arch(env);
> +
> +	if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
> +		if (!strcmp(sort_key, "p_stage_cyc"))
> +			return 1;
> +		if (!strcmp(sort_key, "local_p_stage_cyc"))
> +			return 1;
> +	}
>  	return 0;
>  }
>  
> -const char * __weak arch_perf_header_entry(const char *se_header)
> -{
> +static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
> +{
> +	const char *arch = perf_env__arch(env);
> +
> +	if (!strcmp("x86", arch)) {
> +		if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> +			return "Local Retire Latency";
> +		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> +			return "Retire Latency";
> +	} else if (!strcmp("powerpc", arch)) {
> +		if (!strcmp(se_header, "Local INSTR Latency"))
> +			return "Finish Cyc";
> +		else if (!strcmp(se_header, "INSTR Latency"))
> +			return "Global Finish_cyc";
> +		else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> +			return "Dispatch Cyc";
> +		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> +			return "Global Dispatch_cyc";
> +	}
>  	return se_header;
>  }
>  
> -static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
> +static void sort_dimension_add_dynamic_header(struct sort_dimension *sd, struct perf_env *env)
>  {
> -	sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header);
> +	sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header, env);
>  }
>  
>  #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
> @@ -3590,7 +3615,7 @@ int hpp_dimension__add_output(unsigned col, bool implicit)
>  }
>  
>  int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> -			struct evlist *evlist,
> +			struct evlist *evlist, struct perf_env *env,
>  			int level)
>  {
>  	unsigned int i, j;
> @@ -3603,7 +3628,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
>  	 */
>  	for (j = 0; j < ARRAY_SIZE(arch_specific_sort_keys); j++) {
>  		if (!strcmp(arch_specific_sort_keys[j], tok) &&
> -				!arch_support_sort_key(tok)) {
> +		    !arch_support_sort_key(tok, env)) {
>  			return 0;
>  		}
>  	}
> @@ -3616,7 +3641,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
>  
>  		for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
>  			if (sd->name && !strcmp(dynamic_headers[j], sd->name))
> -				sort_dimension_add_dynamic_header(sd);
> +				sort_dimension_add_dynamic_header(sd, env);
>  		}
>  
>  		if (sd->entry == &sort_parent && parent_pattern) {
> @@ -3712,13 +3737,13 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
>  }
>  
>  /* This should match with sort_dimension__add() above */
> -static bool is_hpp_sort_key(const char *key)
> +static bool is_hpp_sort_key(const char *key, struct perf_env *env)
>  {
>  	unsigned i;
>  
>  	for (i = 0; i < ARRAY_SIZE(arch_specific_sort_keys); i++) {
>  		if (!strcmp(arch_specific_sort_keys[i], key) &&
> -		    !arch_support_sort_key(key)) {
> +		    !arch_support_sort_key(key, env)) {
>  			return false;
>  		}
>  	}
> @@ -3740,7 +3765,7 @@ static bool is_hpp_sort_key(const char *key)
>  }
>  
>  static int setup_sort_list(struct perf_hpp_list *list, char *str,
> -			   struct evlist *evlist)
> +			   struct evlist *evlist, struct perf_env *env)
>  {
>  	char *tmp, *tok;
>  	int ret = 0;
> @@ -3769,7 +3794,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
>  		}
>  
>  		if (*tok) {
> -			if (is_hpp_sort_key(tok)) {
> +			if (is_hpp_sort_key(tok, env)) {
>  				/* keep output (hpp) sort keys in the same level */
>  				if (prev_was_hpp) {
>  					bool next_same = (level == next_level);
> @@ -3782,7 +3807,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
>  				prev_was_hpp = false;
>  			}
>  
> -			ret = sort_dimension__add(list, tok, evlist, level);
> +			ret = sort_dimension__add(list, tok, evlist, env, level);
>  			if (ret == -EINVAL) {
>  				if (!cacheline_size() && !strncasecmp(tok, "dcacheline", strlen(tok)))
>  					ui__error("The \"dcacheline\" --sort key needs to know the cacheline size and it couldn't be determined on this system");
> @@ -3911,7 +3936,7 @@ static char *setup_overhead(char *keys)
>  	return keys;
>  }
>  
> -static int __setup_sorting(struct evlist *evlist)
> +static int __setup_sorting(struct evlist *evlist, struct perf_env *env)
>  {
>  	char *str;
>  	const char *sort_keys;
> @@ -3951,7 +3976,7 @@ static int __setup_sorting(struct evlist *evlist)
>  		}
>  	}
>  
> -	ret = setup_sort_list(&perf_hpp_list, str, evlist);
> +	ret = setup_sort_list(&perf_hpp_list, str, evlist, env);
>  
>  	free(str);
>  	return ret;
> @@ -4187,16 +4212,16 @@ static int __setup_output_field(void)
>  	return ret;
>  }
>  
> -int setup_sorting(struct evlist *evlist)
> +int setup_sorting(struct evlist *evlist, struct perf_env *env)
>  {
>  	int err;
>  
> -	err = __setup_sorting(evlist);
> +	err = __setup_sorting(evlist, env);
>  	if (err < 0)
>  		return err;
>  
>  	if (parent_pattern != default_parent_pattern) {
> -		err = sort_dimension__add(&perf_hpp_list, "parent", evlist, -1);
> +		err = sort_dimension__add(&perf_hpp_list, "parent", evlist, env, -1);
>  		if (err < 0)
>  			return err;
>  	}
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index a742ab7f3c67..d7787958e06b 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -6,6 +6,7 @@
>  #include "hist.h"
>  
>  struct option;
> +struct perf_env;
>  
>  extern regex_t parent_regex;
>  extern const char *sort_order;
> @@ -130,7 +131,7 @@ extern struct sort_entry sort_thread;
>  
>  struct evlist;
>  struct tep_handle;
> -int setup_sorting(struct evlist *evlist);
> +int setup_sorting(struct evlist *evlist, struct perf_env *env);
>  int setup_output_field(void);
>  void reset_output_field(void);
>  void sort__setup_elide(FILE *fp);
> @@ -145,7 +146,7 @@ bool is_strict_order(const char *order);
>  int hpp_dimension__add_output(unsigned col, bool implicit);
>  void reset_dimensions(void);
>  int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> -			struct evlist *evlist,
> +			struct evlist *evlist, struct perf_env *env,
>  			int level);
>  int output_field_add(struct perf_hpp_list *list, const char *tok, int *level);
>  int64_t


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 15:42   ` Liang, Kan
@ 2025-05-21 15:57     ` Ian Rogers
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 15:57 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users

On Wed, May 21, 2025 at 8:42 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-05-21 9:54 a.m., Ian Rogers wrote:
> > By definition arch sample parsing and synthesis will inhibit certain
> > kinds of cross-platform record then analysis (report, script,
> > etc.). Remove arch_perf_parse_sample_weight and
> > arch_perf_synthesize_sample_weight replacing with a common
> > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > capture the differing uses regardless of compiled for architecture.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/arch/powerpc/util/event.c       | 26 ---------------------
> >  tools/perf/arch/x86/tests/sample-parsing.c |  4 ++--
> >  tools/perf/arch/x86/util/event.c           | 27 ----------------------
> >  tools/perf/builtin-script.c                |  2 +-
> >  tools/perf/util/dlfilter.c                 |  2 +-
> >  tools/perf/util/event.h                    |  2 --
> >  tools/perf/util/evsel.c                    | 17 ++++++++++----
> >  tools/perf/util/hist.c                     |  4 ++--
> >  tools/perf/util/hist.h                     |  2 +-
> >  tools/perf/util/intel-tpebs.c              |  4 ++--
> >  tools/perf/util/sample.h                   |  5 +---
> >  tools/perf/util/session.c                  |  2 +-
> >  tools/perf/util/sort.c                     |  6 ++---
> >  tools/perf/util/synthetic-events.c         | 10 ++++++--
> >  14 files changed, 34 insertions(+), 79 deletions(-)
> >
> > diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
> > index 77d8cc2b5691..024ac8b54c33 100644
> > --- a/tools/perf/arch/powerpc/util/event.c
> > +++ b/tools/perf/arch/powerpc/util/event.c
> > @@ -11,32 +11,6 @@
> >  #include "../../../util/debug.h"
> >  #include "../../../util/sample.h"
> >
> > -void arch_perf_parse_sample_weight(struct perf_sample *data,
> > -                                const __u64 *array, u64 type)
> > -{
> > -     union perf_sample_weight weight;
> > -
> > -     weight.full = *array;
> > -     if (type & PERF_SAMPLE_WEIGHT)
> > -             data->weight = weight.full;
> > -     else {
> > -             data->weight = weight.var1_dw;
> > -             data->ins_lat = weight.var2_w;
> > -             data->p_stage_cyc = weight.var3_w;
> > -     }
> > -}
> > -
> > -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> > -                                     __u64 *array, u64 type)
> > -{
> > -     *array = data->weight;
> > -
> > -     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > -             *array &= 0xffffffff;
> > -             *array |= ((u64)data->ins_lat << 32);
> > -     }
> > -}
> > -
> >  const char *arch_perf_header_entry(const char *se_header)
> >  {
> >       if (!strcmp(se_header, "Local INSTR Latency"))
> > diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
> > index a061e8619267..95d8f7f1d2fb 100644
> > --- a/tools/perf/arch/x86/tests/sample-parsing.c
> > +++ b/tools/perf/arch/x86/tests/sample-parsing.c
> > @@ -29,7 +29,7 @@ static bool samples_same(const struct perf_sample *s1,
> >  {
> >       if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> >               COMP(ins_lat);
> > -             COMP(retire_lat);
> > +             COMP(p_stage_cyc_or_retire_lat);
> >       }
> >
> >       return true;
> > @@ -50,7 +50,7 @@ static int do_test(u64 sample_type)
> >       struct perf_sample sample = {
> >               .weight         = 101,
> >               .ins_lat        = 102,
> > -             .retire_lat     = 103,
> > +             .p_stage_cyc_or_retire_lat = 103,
> >       };
> >       struct perf_sample sample_out;
> >       size_t i, sz, bufsz;
> > diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
> > index a0400707180c..576c1c36046c 100644
> > --- a/tools/perf/arch/x86/util/event.c
> > +++ b/tools/perf/arch/x86/util/event.c
> > @@ -92,33 +92,6 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
> >
> >  #endif
> >
> > -void arch_perf_parse_sample_weight(struct perf_sample *data,
> > -                                const __u64 *array, u64 type)
> > -{
> > -     union perf_sample_weight weight;
> > -
> > -     weight.full = *array;
> > -     if (type & PERF_SAMPLE_WEIGHT)
> > -             data->weight = weight.full;
> > -     else {
> > -             data->weight = weight.var1_dw;
> > -             data->ins_lat = weight.var2_w;
> > -             data->retire_lat = weight.var3_w;
> > -     }
> > -}
> > -
> > -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> > -                                     __u64 *array, u64 type)
> > -{
> > -     *array = data->weight;
> > -
> > -     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > -             *array &= 0xffffffff;
> > -             *array |= ((u64)data->ins_lat << 32);
> > -             *array |= ((u64)data->retire_lat << 48);
> > -     }
> > -}
> > -
> >  const char *arch_perf_header_entry(const char *se_header)
> >  {
> >       if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index 6c3bf74dd78c..c02c435e0f0b 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -2251,7 +2251,7 @@ static void process_event(struct perf_script *script,
> >               fprintf(fp, "%16" PRIu16, sample->ins_lat);
> >
> >       if (PRINT_FIELD(RETIRE_LAT))
> > -             fprintf(fp, "%16" PRIu16, sample->retire_lat);
> > +             fprintf(fp, "%16" PRIu16, sample->p_stage_cyc_or_retire_lat);
> >
> >       if (PRINT_FIELD(CGROUP)) {
> >               const char *cgrp_name;
> > diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
> > index ddacef881af2..7e61ddfa66b8 100644
> > --- a/tools/perf/util/dlfilter.c
> > +++ b/tools/perf/util/dlfilter.c
> > @@ -526,7 +526,7 @@ int dlfilter__do_filter_event(struct dlfilter *d,
> >       ASSIGN(period);
> >       ASSIGN(weight);
> >       ASSIGN(ins_lat);
> > -     ASSIGN(p_stage_cyc);
> > +     d_sample.p_stage_cyc = sample->p_stage_cyc_or_retire_lat;
>
> Can you please move it out of the ASSIGN() area? Maybe right after
> d_sample.size  = sizeof(d_sample);.
> It looks strange to insert a non-macro assignment here.

Ok. Will fix in v3.

>
> >       ASSIGN(transaction);
> >       ASSIGN(insn_cnt);
> >       ASSIGN(cyc_cnt);
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index 664bf39567ce..119bce37f4fd 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> > @@ -390,8 +390,6 @@ extern unsigned int proc_map_timeout;
> >  #define PAGE_SIZE_NAME_LEN   32
> >  char *get_page_size_name(u64 size, char *str);
> >
> > -void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
> > -void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
> >  const char *arch_perf_header_entry(const char *se_header);
> >  int arch_support_sort_key(const char *sort_key);
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index d55482f094bf..097ab98bb81a 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -2846,11 +2846,18 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
> >       return 0;
> >  }
> >
> > -void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
> > -                                       const __u64 *array,
> > -                                       u64 type __maybe_unused)
> > +static void perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type)
> >  {
> > -     data->weight = *array;
> > +     union perf_sample_weight weight;
> > +
> > +     weight.full = *array;
> > +     if (type & PERF_SAMPLE_WEIGHT) {
> > +             data->weight = weight.full;
> > +     } else {
> > +             data->weight = weight.var1_dw;
> > +             data->ins_lat = weight.var2_w;
> > +             data->p_stage_cyc_or_retire_lat = weight.var3_w;
> > +     }
> >  }
> >
>
> It works for X86, but I'm not sure other ARCHs.
> Since the PERF_SAMPLE_WEIGHT_STRUCT is newly added type, should it be
> safer to do the below?
>         if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
>                 data->weight = weight.var1_dw;
>                 data->ins_lat = weight.var2_w;
>                 data->p_stage_cyc_or_retire_lat = weight.var3_w;
>         } else
>                 data->weight = weight.full;
>
> The default behavior is always data->weight = weight.full; unless an
> ARCH apply the new type PERF_SAMPLE_WEIGHT_STRUCT.

Sure. I don't think it will change any of the behavior.

> >  u64 evsel__bitfield_swap_branch_flags(u64 value)
> > @@ -3236,7 +3243,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> >
> >       if (type & PERF_SAMPLE_WEIGHT_TYPE) {
> >               OVERFLOW_CHECK_u64(array);
> > -             arch_perf_parse_sample_weight(data, array, type);
> > +             perf_parse_sample_weight(data, array, type);
> >               array++;
> >       }
> >
> > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > index afc6855327ab..ae9803dca0b1 100644
> > --- a/tools/perf/util/hist.c
> > +++ b/tools/perf/util/hist.c
> > @@ -829,7 +829,7 @@ __hists__add_entry(struct hists *hists,
> >                       .period = sample->period,
> >                       .weight1 = sample->weight,
> >                       .weight2 = sample->ins_lat,
> > -                     .weight3 = sample->p_stage_cyc,
> > +                     .weight3 = sample->p_stage_cyc_or_retire_lat,
> >                       .latency = al->latency,
> >               },
> >               .parent = sym_parent,
> > @@ -846,7 +846,7 @@ __hists__add_entry(struct hists *hists,
> >               .time = hist_time(sample->time),
> >               .weight = sample->weight,
> >               .ins_lat = sample->ins_lat,
> > -             .p_stage_cyc = sample->p_stage_cyc,
> > +             .p_stage_cyc_or_retire_lat = sample->p_stage_cyc_or_retire_lat,
> >               .simd_flags = sample->simd_flags,
> >       }, *he = hists__findnew_entry(hists, &entry, al, sample_self);
> >
> > diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> > index c64254088fc7..67033bdabcf4 100644
> > --- a/tools/perf/util/hist.h
> > +++ b/tools/perf/util/hist.h
> > @@ -255,7 +255,7 @@ struct hist_entry {
> >       u64                     code_page_size;
> >       u64                     weight;
> >       u64                     ins_lat;
> > -     u64                     p_stage_cyc;
> > +     u64                     p_stage_cyc_or_retire_lat;
> >       s32                     socket;
> >       s32                     cpu;
> >       int                     parallelism;
> > diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> > index 4ad4bc118ea5..ec2f3ecf1e1c 100644
> > --- a/tools/perf/util/intel-tpebs.c
> > +++ b/tools/perf/util/intel-tpebs.c
> > @@ -202,8 +202,8 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
> >        * latency value will be used. Save the number of samples and the sum of
> >        * retire latency value for each event.
> >        */
> > -     t->last = sample->retire_lat;
> > -     update_stats(&t->stats, sample->retire_lat);
> > +     t->last = sample->p_stage_cyc_or_retire_lat;
> > +     update_stats(&t->stats, sample->p_stage_cyc_or_retire_lat);
> >       mutex_unlock(tpebs_mtx_get());
> >       return 0;
> >  }
> > diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
> > index 0e96240052e9..3330d18fb5fd 100644
> > --- a/tools/perf/util/sample.h
> > +++ b/tools/perf/util/sample.h
> > @@ -104,10 +104,7 @@ struct perf_sample {
> >       u8  cpumode;
> >       u16 misc;
> >       u16 ins_lat;
> > -     union {
> > -             u16 p_stage_cyc;
> > -             u16 retire_lat;
> > -     };
> > +     u16 p_stage_cyc_or_retire_lat;
> >       bool no_hw_idx;         /* No hw_idx collected in branch_stack */
> >       char insn[MAX_INSN];
> >       void *raw_data;
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index a320672c264e..451bc24ccfba 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -1094,7 +1094,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
> >               printf("... weight: %" PRIu64 "", sample->weight);
> >                       if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
> >                               printf(",0x%"PRIx16"", sample->ins_lat);
> > -                             printf(",0x%"PRIx16"", sample->p_stage_cyc);
> > +                             printf(",0x%"PRIx16"", sample->p_stage_cyc_or_retire_lat);
> >                       }
> >               printf("\n");
> >       }
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > index 45e654653960..dda4ef0b5a73 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -1879,21 +1879,21 @@ struct sort_entry sort_global_ins_lat = {
> >  static int64_t
> >  sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
> >  {
> > -     return left->p_stage_cyc - right->p_stage_cyc;
> > +     return left->p_stage_cyc_or_retire_lat - right->p_stage_cyc_or_retire_lat;
> >  }
> >
> >  static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
> >                                       size_t size, unsigned int width)
> >  {
> >       return repsep_snprintf(bf, size, "%-*u", width,
> > -                     he->p_stage_cyc * he->stat.nr_events);
> > +                     he->p_stage_cyc_or_retire_lat * he->stat.nr_events);
> >  }
> >
> >
> >  static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
> >                                       size_t size, unsigned int width)
> >  {
> > -     return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
> > +     return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc_or_retire_lat);
> >  }
> >
> >  struct sort_entry sort_local_p_stage_cyc = {
> > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > index 2fc4d0537840..449a41900fc4 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -1567,10 +1567,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
> >       return result;
> >  }
> >
> > -void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> > +static void perf_synthesize_sample_weight(const struct perf_sample *data,
> >                                              __u64 *array, u64 type __maybe_unused)
> >  {
> >       *array = data->weight;
> > +
> > +     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > +             *array &= 0xffffffff;
> > +             *array |= ((u64)data->ins_lat << 32);
> > +             *array |= ((u64)data->p_stage_cyc_or_retire_lat << 48);
> > +     }
>
> It works for X86, but powerpc seems have a different behavior. The
> p_stage_cyc is missed here. Not sure if it intends.

Yeah, I was assuming it was a bug on powerpc not to synthesize this
part of the sample as it is read. As evsel__parse_sample will set this
to 0 and the bits don't overlap I expect the change to either be a bug
fix or benign on powerpc.

Thanks,
Ian

> Thanks,
> Kan
>
> >  }
> >
> >  static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
> > @@ -1730,7 +1736,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
> >       }
> >
> >       if (type & PERF_SAMPLE_WEIGHT_TYPE) {
> > -             arch_perf_synthesize_sample_weight(sample, array, type);
> > +             perf_synthesize_sample_weight(sample, array, type);
> >               array++;
> >       }
> >
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 15:50   ` Liang, Kan
@ 2025-05-21 16:16     ` Ian Rogers
  2025-05-21 18:13       ` Liang, Kan
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 16:16 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users

On Wed, May 21, 2025 at 8:50 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-05-21 9:54 a.m., Ian Rogers wrote:
> > Previously arch_support_sort_key and arch_perf_header_entry used a
> > weak symbol to compile as appropriate for x86 and powerpc. A
> > limitation to this is that the handling of a data file could vary in
> > cross-platform development. Change to using the perf_env of the
> > current session to determine the architecture kind and set the sort
> > key and header entries as appropriate.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/arch/powerpc/util/Build   |  1 -
> >  tools/perf/arch/powerpc/util/event.c | 34 ----------------
> >  tools/perf/arch/x86/util/event.c     | 19 ---------
> >  tools/perf/builtin-annotate.c        |  2 +-
> >  tools/perf/builtin-c2c.c             | 53 +++++++++++++-----------
> >  tools/perf/builtin-diff.c            |  2 +-
> >  tools/perf/builtin-report.c          |  2 +-
> >  tools/perf/builtin-top.c             | 16 ++++----
> >  tools/perf/tests/hists_cumulate.c    |  8 ++--
> >  tools/perf/tests/hists_filter.c      |  8 ++--
> >  tools/perf/tests/hists_link.c        |  8 ++--
> >  tools/perf/tests/hists_output.c      | 10 ++---
> >  tools/perf/util/event.h              |  3 --
> >  tools/perf/util/sort.c               | 61 ++++++++++++++++++++--------
> >  tools/perf/util/sort.h               |  5 ++-
> >  15 files changed, 104 insertions(+), 128 deletions(-)
> >  delete mode 100644 tools/perf/arch/powerpc/util/event.c
> >
> > diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
> > index ed82715080f9..fdd6a77a3432 100644
> > --- a/tools/perf/arch/powerpc/util/Build
> > +++ b/tools/perf/arch/powerpc/util/Build
> > @@ -5,7 +5,6 @@ perf-util-y += mem-events.o
> >  perf-util-y += pmu.o
> >  perf-util-y += sym-handling.o
> >  perf-util-y += evsel.o
> > -perf-util-y += event.o
> >
> >  perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
> >
> > diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
> > deleted file mode 100644
> > index 024ac8b54c33..000000000000
> > --- a/tools/perf/arch/powerpc/util/event.c
> > +++ /dev/null
> > @@ -1,34 +0,0 @@
> > -// SPDX-License-Identifier: GPL-2.0
> > -#include <linux/types.h>
> > -#include <linux/string.h>
> > -#include <linux/zalloc.h>
> > -
> > -#include "../../../util/event.h"
> > -#include "../../../util/synthetic-events.h"
> > -#include "../../../util/machine.h"
> > -#include "../../../util/tool.h"
> > -#include "../../../util/map.h"
> > -#include "../../../util/debug.h"
> > -#include "../../../util/sample.h"
> > -
> > -const char *arch_perf_header_entry(const char *se_header)
> > -{
> > -     if (!strcmp(se_header, "Local INSTR Latency"))
> > -             return "Finish Cyc";
> > -     else if (!strcmp(se_header, "INSTR Latency"))
> > -             return "Global Finish_cyc";
> > -     else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> > -             return "Dispatch Cyc";
> > -     else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> > -             return "Global Dispatch_cyc";
> > -     return se_header;
> > -}
> > -
> > -int arch_support_sort_key(const char *sort_key)
> > -{
> > -     if (!strcmp(sort_key, "p_stage_cyc"))
> > -             return 1;
> > -     if (!strcmp(sort_key, "local_p_stage_cyc"))
> > -             return 1;
> > -     return 0;
> > -}
> > diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
> > index 576c1c36046c..3cd384317739 100644
> > --- a/tools/perf/arch/x86/util/event.c
> > +++ b/tools/perf/arch/x86/util/event.c
> > @@ -91,22 +91,3 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
> >  }
> >
> >  #endif
> > -
> > -const char *arch_perf_header_entry(const char *se_header)
> > -{
> > -     if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> > -             return "Local Retire Latency";
> > -     else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> > -             return "Retire Latency";
> > -
> > -     return se_header;
> > -}
> > -
> > -int arch_support_sort_key(const char *sort_key)
> > -{
> > -     if (!strcmp(sort_key, "p_stage_cyc"))
> > -             return 1;
> > -     if (!strcmp(sort_key, "local_p_stage_cyc"))
> > -             return 1;
> > -     return 0;
> > -}
> > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> > index 9833c2c82a2f..a2d41614ef5e 100644
> > --- a/tools/perf/builtin-annotate.c
> > +++ b/tools/perf/builtin-annotate.c
> > @@ -947,7 +947,7 @@ int cmd_annotate(int argc, const char **argv)
> >                       annotate_opts.show_br_cntr = true;
> >       }
> >
> > -     if (setup_sorting(NULL) < 0)
> > +     if (setup_sorting(/*evlist=*/NULL, &annotate.session->header.env) < 0)
> >               usage_with_options(annotate_usage, options);
> >
> >       ret = __cmd_annotate(&annotate);
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index e2e257bcc461..324368aabfa2 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -195,12 +195,14 @@ static struct hist_entry_ops c2c_entry_ops = {
> >
> >  static int c2c_hists__init(struct c2c_hists *hists,
> >                          const char *sort,
> > -                        int nr_header_lines);
> > +                        int nr_header_lines,
> > +                        struct perf_env *env);
> >
> >  static struct c2c_hists*
> >  he__get_c2c_hists(struct hist_entry *he,
> >                 const char *sort,
> > -               int nr_header_lines)
> > +               int nr_header_lines,
> > +               struct perf_env *env)
> >  {
> >       struct c2c_hist_entry *c2c_he;
> >       struct c2c_hists *hists;
> > @@ -214,7 +216,7 @@ he__get_c2c_hists(struct hist_entry *he,
> >       if (!hists)
> >               return NULL;
> >
> > -     ret = c2c_hists__init(hists, sort, nr_header_lines);
> > +     ret = c2c_hists__init(hists, sort, nr_header_lines, env);
> >       if (ret) {
> >               free(hists);
> >               return NULL;
> > @@ -350,7 +352,7 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
> >
> >               mi = mi_dup;
> >
> > -             c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2);
> > +             c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2, machine->env);
> >               if (!c2c_hists)
> >                       goto free_mi;
> >
> > @@ -1966,7 +1968,8 @@ static struct c2c_fmt *get_format(const char *name)
> >       return c2c_fmt;
> >  }
> >
> > -static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
> > +static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name,
> > +                               struct perf_env *env __maybe_unused)
> >  {
> >       struct c2c_fmt *c2c_fmt = get_format(name);
> >       int level = 0;
> > @@ -1980,14 +1983,14 @@ static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
> >       return 0;
> >  }
> >
> > -static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
> > +static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name, struct perf_env *env)
> >  {
> >       struct c2c_fmt *c2c_fmt = get_format(name);
> >       struct c2c_dimension *dim;
> >
> >       if (!c2c_fmt) {
> >               reset_dimensions();
> > -             return sort_dimension__add(hpp_list, name, NULL, 0);
> > +             return sort_dimension__add(hpp_list, name, /*evlist=*/NULL, env, /*level=*/0);
> >       }
> >
> >       dim = c2c_fmt->dim;
> > @@ -2008,7 +2011,7 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
> >                                                                               \
> >               for (tok = strtok_r((char *)_list, ", ", &tmp);                 \
> >                               tok; tok = strtok_r(NULL, ", ", &tmp)) {        \
> > -                     ret = _fn(hpp_list, tok);                               \
> > +                     ret = _fn(hpp_list, tok, env);                          \
> >                       if (ret == -EINVAL) {                                   \
> >                               pr_err("Invalid --fields key: `%s'", tok);      \
> >                               break;                                          \
> > @@ -2021,7 +2024,8 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
> >
> >  static int hpp_list__parse(struct perf_hpp_list *hpp_list,
> >                          const char *output_,
> > -                        const char *sort_)
> > +                        const char *sort_,
> > +                        struct perf_env *env)
> >  {
> >       char *output = output_ ? strdup(output_) : NULL;
> >       char *sort   = sort_   ? strdup(sort_) : NULL;
> > @@ -2052,7 +2056,8 @@ static int hpp_list__parse(struct perf_hpp_list *hpp_list,
> >
> >  static int c2c_hists__init(struct c2c_hists *hists,
> >                          const char *sort,
> > -                        int nr_header_lines)
> > +                        int nr_header_lines,
> > +                        struct perf_env *env)
> >  {
> >       __hists__init(&hists->hists, &hists->list);
> >
> > @@ -2066,15 +2071,16 @@ static int c2c_hists__init(struct c2c_hists *hists,
> >       /* Overload number of header lines.*/
> >       hists->list.nr_header_lines = nr_header_lines;
> >
> > -     return hpp_list__parse(&hists->list, NULL, sort);
> > +     return hpp_list__parse(&hists->list, /*output=*/NULL, sort, env);
> >  }
> >
> >  static int c2c_hists__reinit(struct c2c_hists *c2c_hists,
> >                            const char *output,
> > -                          const char *sort)
> > +                          const char *sort,
> > +                          struct perf_env *env)
> >  {
> >       perf_hpp__reset_output_field(&c2c_hists->list);
> > -     return hpp_list__parse(&c2c_hists->list, output, sort);
> > +     return hpp_list__parse(&c2c_hists->list, output, sort, env);
> >  }
> >
> >  #define DISPLAY_LINE_LIMIT  0.001
> > @@ -2207,8 +2213,9 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
> >       return 0;
> >  }
> >
> > -static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
> > +static int resort_cl_cb(struct hist_entry *he, void *arg)
> >  {
> > +     struct perf_env *env = arg;
> >       struct c2c_hist_entry *c2c_he;
> >       struct c2c_hists *c2c_hists;
> >       bool display = he__display(he, &c2c.shared_clines_stats);
> > @@ -2222,7 +2229,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
> >               c2c_he->cacheline_idx = idx++;
> >               calc_width(c2c_he);
> >
> > -             c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort);
> > +             c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort, env);
> >
> >               hists__collapse_resort(&c2c_hists->hists, NULL);
> >               hists__output_resort_cb(&c2c_hists->hists, NULL, filter_cb);
> > @@ -2333,7 +2340,7 @@ static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
> >       return 0;
> >  }
> >
> > -static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
> > +static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb, void *arg)
> >  {
> >       struct rb_node *next = rb_first_cached(&hists->entries);
> >       int ret = 0;
> > @@ -2342,7 +2349,7 @@ static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
> >               struct hist_entry *he;
> >
> >               he = rb_entry(next, struct hist_entry, rb_node);
> > -             ret = cb(he, NULL);
> > +             ret = cb(he, arg);
> >               if (ret)
> >                       break;
> >               next = rb_next(&he->rb_node);
> > @@ -2448,7 +2455,7 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
> >       hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
> >  }
> >
> > -static void print_pareto(FILE *out)
> > +static void print_pareto(FILE *out, struct perf_env *env)
> >  {
> >       struct perf_hpp_list hpp_list;
> >       struct rb_node *nd;
> > @@ -2473,7 +2480,7 @@ static void print_pareto(FILE *out)
> >                           "dcacheline";
> >
> >       perf_hpp_list__init(&hpp_list);
> > -     ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> > +     ret = hpp_list__parse(&hpp_list, cl_output, /*evlist=*/NULL, env);
> >
> >       if (WARN_ONCE(ret, "failed to setup sort entries\n"))
> >               return;
> > @@ -2538,7 +2545,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
> >       fprintf(out, "=================================================\n");
> >       fprintf(out, "#\n");
> >
> > -     print_pareto(out);
> > +     print_pareto(out, &session->header.env);
> >  }
> >
> >  #ifdef HAVE_SLANG_SUPPORT
> > @@ -3095,7 +3102,7 @@ static int perf_c2c__report(int argc, const char **argv)
> >               goto out_session;
> >       }
> >
> > -     err = c2c_hists__init(&c2c.hists, "dcacheline", 2);
> > +     err = c2c_hists__init(&c2c.hists, "dcacheline", 2, &session->header.env);
> >       if (err) {
> >               pr_debug("Failed to initialize hists\n");
> >               goto out_session;
> > @@ -3179,13 +3186,13 @@ static int perf_c2c__report(int argc, const char **argv)
> >       else if (c2c.display == DISPLAY_SNP_PEER)
> >               sort_str = "tot_peer";
> >
> > -     c2c_hists__reinit(&c2c.hists, output_str, sort_str);
> > +     c2c_hists__reinit(&c2c.hists, output_str, sort_str, &session->header.env);
> >
> >       ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
> >
> >       hists__collapse_resort(&c2c.hists.hists, NULL);
> >       hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
> > -     hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
> > +     hists__iterate_cb(&c2c.hists.hists, resort_cl_cb, &session->header.env);
> >
> >       ui_progress__finish();
> >
> > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> > index ae490d58af92..f6735cf02329 100644
> > --- a/tools/perf/builtin-diff.c
> > +++ b/tools/perf/builtin-diff.c
> > @@ -2003,7 +2003,7 @@ int cmd_diff(int argc, const char **argv)
> >               sort__mode = SORT_MODE__DIFF;
> >       }
> >
> > -     if (setup_sorting(NULL) < 0)
> > +     if (setup_sorting(/*evlist=*/NULL, &data__files[0].session->header.env) < 0)
> >               usage_with_options(diff_usage, options);
> >
> >       setup_pager();
> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index f0299c7ee025..8be6e7c3bd22 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -1776,7 +1776,7 @@ int cmd_report(int argc, const char **argv)
> >       }
> >
> >       if ((last_key != K_SWITCH_INPUT_DATA && last_key != K_RELOAD) &&
> > -         (setup_sorting(session->evlist) < 0)) {
> > +         (setup_sorting(session->evlist, &session->header.env) < 0)) {
> >               if (sort_order)
> >                       parse_options_usage(report_usage, options, "s", 1);
> >               if (field_order)
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 7b6cde87d2af..13ef0d188a96 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
> >
> >       setup_browser(false);
> >
> > -     if (setup_sorting(top.evlist) < 0) {
> > +     top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
> > +     if (IS_ERR(top.session)) {
> > +             status = PTR_ERR(top.session);
> > +             top.session = NULL;
> > +             goto out_delete_evlist;
> > +     }
> > +
> > +     if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
>
> I doubt a valide env can be got in perf_session__new(), since there is
> no perf.data in perf top.
> Maybe just need to invoke the perf_env__raw_arch() instead to fill the
> env->arch.

I think the current code is making things harder than it should be, we
should work away from perf_env__arch and strings, instead using EM_
values which we can default to EM_HOST avoiding any runtime costs.
Looking at perf_env__arch:
```
const char *perf_env__arch(struct perf_env *env)
{
        char *arch_name;

        if (!env || !env->arch) { /* Assume local operation */
                static struct utsname uts = { .machine[0] = '\0', };
                if (uts.machine[0] == '\0' && uname(&uts) < 0)
                        return NULL;
                arch_name = uts.machine;
        } else
                arch_name = env->arch;

        return normalize_arch(arch_name);
}
```
in this case env->arch == NULL and so the uname machine will be used.
For perf_env__raw_arch the behavior is similar but it populates the
env:
```
static int perf_env__read_arch(struct perf_env *env)
{
        struct utsname uts;

        if (env->arch)
                return 0;

        if (!uname(&uts))
                env->arch = strdup(uts.machine);

        return env->arch ? 0 : -ENOMEM;
}

const char *perf_env__raw_arch(struct perf_env *env)
{
        return env && !perf_env__read_arch(env) ? env->arch : "unknown";
}
```
Aside from caching the arch, the main difference is that
normalize_arch isn't called. Not having normalize_arch means the code
in arch_support_sort_key and arch_perf_header_entry would need to
handle strings "ppc" as well as "powerpc", "i386" as well as "x86",
etc. As I'd prefer not handle all those cases I think the way the code
is is best given how the env code is currently structured.

Thanks,
Ian

> Thanks,
> Kan
>
> >               if (sort_order)
> >                       parse_options_usage(top_usage, options, "s", 1);
> >               if (field_order)
> > @@ -1820,13 +1827,6 @@ int cmd_top(int argc, const char **argv)
> >               signal(SIGWINCH, winch_sig);
> >       }
> >
> > -     top.session = perf_session__new(NULL, NULL);
> > -     if (IS_ERR(top.session)) {
> > -             status = PTR_ERR(top.session);
> > -             top.session = NULL;
> > -             goto out_delete_evlist;
> > -     }
> > -
> >       if (!evlist__needs_bpf_sb_event(top.evlist))
> >               top.record_opts.no_bpf_event = true;
> >
> > diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
> > index 1e0f5a310fd5..3eb9ef8d7ec6 100644
> > --- a/tools/perf/tests/hists_cumulate.c
> > +++ b/tools/perf/tests/hists_cumulate.c
> > @@ -295,7 +295,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
> >       symbol_conf.cumulate_callchain = false;
> >       evsel__reset_sample_bit(evsel, CALLCHAIN);
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >       callchain_register_param(&callchain_param);
> >
> >       err = add_hist_entries(hists, machine);
> > @@ -442,7 +442,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
> >       symbol_conf.cumulate_callchain = false;
> >       evsel__set_sample_bit(evsel, CALLCHAIN);
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >       callchain_register_param(&callchain_param);
> >
> >       err = add_hist_entries(hists, machine);
> > @@ -500,7 +500,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
> >       symbol_conf.cumulate_callchain = true;
> >       evsel__reset_sample_bit(evsel, CALLCHAIN);
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >       callchain_register_param(&callchain_param);
> >
> >       err = add_hist_entries(hists, machine);
> > @@ -684,7 +684,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
> >       symbol_conf.cumulate_callchain = true;
> >       evsel__set_sample_bit(evsel, CALLCHAIN);
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >
> >       callchain_param = callchain_param_default;
> >       callchain_register_param(&callchain_param);
> > diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
> > index 4b2e4f2fbe48..1cebd20cc91c 100644
> > --- a/tools/perf/tests/hists_filter.c
> > +++ b/tools/perf/tests/hists_filter.c
> > @@ -131,10 +131,6 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
> >               goto out;
> >       err = TEST_FAIL;
> >
> > -     /* default sort order (comm,dso,sym) will be used */
> > -     if (setup_sorting(NULL) < 0)
> > -             goto out;
> > -
> >       machines__init(&machines);
> >
> >       /* setup threads/dso/map/symbols also */
> > @@ -145,6 +141,10 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
> >       if (verbose > 1)
> >               machine__fprintf(machine, stderr);
> >
> > +     /* default sort order (comm,dso,sym) will be used */
> > +     if (setup_sorting(evlist, machine->env) < 0)
> > +             goto out;
> > +
> >       /* process sample events */
> >       err = add_hist_entries(evlist, machine);
> >       if (err < 0)
> > diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
> > index 5b6f1e883466..996f5f0b3bd1 100644
> > --- a/tools/perf/tests/hists_link.c
> > +++ b/tools/perf/tests/hists_link.c
> > @@ -303,10 +303,6 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
> >               goto out;
> >
> >       err = TEST_FAIL;
> > -     /* default sort order (comm,dso,sym) will be used */
> > -     if (setup_sorting(NULL) < 0)
> > -             goto out;
> > -
> >       machines__init(&machines);
> >
> >       /* setup threads/dso/map/symbols also */
> > @@ -317,6 +313,10 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
> >       if (verbose > 1)
> >               machine__fprintf(machine, stderr);
> >
> > +     /* default sort order (comm,dso,sym) will be used */
> > +     if (setup_sorting(evlist, machine->env) < 0)
> > +             goto out;
> > +
> >       /* process sample events */
> >       err = add_hist_entries(evlist, machine);
> >       if (err < 0)
> > diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
> > index 33b5cc8352a7..ee5ec8bda60e 100644
> > --- a/tools/perf/tests/hists_output.c
> > +++ b/tools/perf/tests/hists_output.c
> > @@ -146,7 +146,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
> >       field_order = NULL;
> >       sort_order = NULL; /* equivalent to sort_order = "comm,dso,sym" */
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >
> >       /*
> >        * expected output:
> > @@ -248,7 +248,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
> >       field_order = "overhead,cpu";
> >       sort_order = "pid";
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >
> >       /*
> >        * expected output:
> > @@ -304,7 +304,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
> >       field_order = "comm,overhead,dso";
> >       sort_order = NULL;
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >
> >       /*
> >        * expected output:
> > @@ -378,7 +378,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
> >       field_order = "dso,sym,comm,overhead,dso";
> >       sort_order = "sym";
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >
> >       /*
> >        * expected output:
> > @@ -480,7 +480,7 @@ static int test5(struct evsel *evsel, struct machine *machine)
> >       field_order = "cpu,pid,comm,dso,sym";
> >       sort_order = "dso,pid";
> >
> > -     setup_sorting(NULL);
> > +     setup_sorting(/*evlist=*/NULL, machine->env);
> >
> >       /*
> >        * expected output:
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index 119bce37f4fd..3b97a31736c5 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> > @@ -390,9 +390,6 @@ extern unsigned int proc_map_timeout;
> >  #define PAGE_SIZE_NAME_LEN   32
> >  char *get_page_size_name(u64 size, char *str);
> >
> > -const char *arch_perf_header_entry(const char *se_header);
> > -int arch_support_sort_key(const char *sort_key);
> > -
> >  static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
> >  {
> >       return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > index dda4ef0b5a73..ccab10cd24a5 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -2526,19 +2526,44 @@ struct sort_dimension {
> >       int                     taken;
> >  };
> >
> > -int __weak arch_support_sort_key(const char *sort_key __maybe_unused)
> > +static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
> >  {
> > +     const char *arch = perf_env__arch(env);
> > +
> > +     if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
> > +             if (!strcmp(sort_key, "p_stage_cyc"))
> > +                     return 1;
> > +             if (!strcmp(sort_key, "local_p_stage_cyc"))
> > +                     return 1;
> > +     }
> >       return 0;
> >  }
> >
> > -const char * __weak arch_perf_header_entry(const char *se_header)
> > -{
> > +static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
> > +{
> > +     const char *arch = perf_env__arch(env);
> > +
> > +     if (!strcmp("x86", arch)) {
> > +             if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> > +                     return "Local Retire Latency";
> > +             else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> > +                     return "Retire Latency";
> > +     } else if (!strcmp("powerpc", arch)) {
> > +             if (!strcmp(se_header, "Local INSTR Latency"))
> > +                     return "Finish Cyc";
> > +             else if (!strcmp(se_header, "INSTR Latency"))
> > +                     return "Global Finish_cyc";
> > +             else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> > +                     return "Dispatch Cyc";
> > +             else if (!strcmp(se_header, "Pipeline Stage Cycle"))
> > +                     return "Global Dispatch_cyc";
> > +     }
> >       return se_header;
> >  }
> >
> > -static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
> > +static void sort_dimension_add_dynamic_header(struct sort_dimension *sd, struct perf_env *env)
> >  {
> > -     sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header);
> > +     sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header, env);
> >  }
> >
> >  #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
> > @@ -3590,7 +3615,7 @@ int hpp_dimension__add_output(unsigned col, bool implicit)
> >  }
> >
> >  int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> > -                     struct evlist *evlist,
> > +                     struct evlist *evlist, struct perf_env *env,
> >                       int level)
> >  {
> >       unsigned int i, j;
> > @@ -3603,7 +3628,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> >        */
> >       for (j = 0; j < ARRAY_SIZE(arch_specific_sort_keys); j++) {
> >               if (!strcmp(arch_specific_sort_keys[j], tok) &&
> > -                             !arch_support_sort_key(tok)) {
> > +                 !arch_support_sort_key(tok, env)) {
> >                       return 0;
> >               }
> >       }
> > @@ -3616,7 +3641,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> >
> >               for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
> >                       if (sd->name && !strcmp(dynamic_headers[j], sd->name))
> > -                             sort_dimension_add_dynamic_header(sd);
> > +                             sort_dimension_add_dynamic_header(sd, env);
> >               }
> >
> >               if (sd->entry == &sort_parent && parent_pattern) {
> > @@ -3712,13 +3737,13 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> >  }
> >
> >  /* This should match with sort_dimension__add() above */
> > -static bool is_hpp_sort_key(const char *key)
> > +static bool is_hpp_sort_key(const char *key, struct perf_env *env)
> >  {
> >       unsigned i;
> >
> >       for (i = 0; i < ARRAY_SIZE(arch_specific_sort_keys); i++) {
> >               if (!strcmp(arch_specific_sort_keys[i], key) &&
> > -                 !arch_support_sort_key(key)) {
> > +                 !arch_support_sort_key(key, env)) {
> >                       return false;
> >               }
> >       }
> > @@ -3740,7 +3765,7 @@ static bool is_hpp_sort_key(const char *key)
> >  }
> >
> >  static int setup_sort_list(struct perf_hpp_list *list, char *str,
> > -                        struct evlist *evlist)
> > +                        struct evlist *evlist, struct perf_env *env)
> >  {
> >       char *tmp, *tok;
> >       int ret = 0;
> > @@ -3769,7 +3794,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
> >               }
> >
> >               if (*tok) {
> > -                     if (is_hpp_sort_key(tok)) {
> > +                     if (is_hpp_sort_key(tok, env)) {
> >                               /* keep output (hpp) sort keys in the same level */
> >                               if (prev_was_hpp) {
> >                                       bool next_same = (level == next_level);
> > @@ -3782,7 +3807,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
> >                               prev_was_hpp = false;
> >                       }
> >
> > -                     ret = sort_dimension__add(list, tok, evlist, level);
> > +                     ret = sort_dimension__add(list, tok, evlist, env, level);
> >                       if (ret == -EINVAL) {
> >                               if (!cacheline_size() && !strncasecmp(tok, "dcacheline", strlen(tok)))
> >                                       ui__error("The \"dcacheline\" --sort key needs to know the cacheline size and it couldn't be determined on this system");
> > @@ -3911,7 +3936,7 @@ static char *setup_overhead(char *keys)
> >       return keys;
> >  }
> >
> > -static int __setup_sorting(struct evlist *evlist)
> > +static int __setup_sorting(struct evlist *evlist, struct perf_env *env)
> >  {
> >       char *str;
> >       const char *sort_keys;
> > @@ -3951,7 +3976,7 @@ static int __setup_sorting(struct evlist *evlist)
> >               }
> >       }
> >
> > -     ret = setup_sort_list(&perf_hpp_list, str, evlist);
> > +     ret = setup_sort_list(&perf_hpp_list, str, evlist, env);
> >
> >       free(str);
> >       return ret;
> > @@ -4187,16 +4212,16 @@ static int __setup_output_field(void)
> >       return ret;
> >  }
> >
> > -int setup_sorting(struct evlist *evlist)
> > +int setup_sorting(struct evlist *evlist, struct perf_env *env)
> >  {
> >       int err;
> >
> > -     err = __setup_sorting(evlist);
> > +     err = __setup_sorting(evlist, env);
> >       if (err < 0)
> >               return err;
> >
> >       if (parent_pattern != default_parent_pattern) {
> > -             err = sort_dimension__add(&perf_hpp_list, "parent", evlist, -1);
> > +             err = sort_dimension__add(&perf_hpp_list, "parent", evlist, env, -1);
> >               if (err < 0)
> >                       return err;
> >       }
> > diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> > index a742ab7f3c67..d7787958e06b 100644
> > --- a/tools/perf/util/sort.h
> > +++ b/tools/perf/util/sort.h
> > @@ -6,6 +6,7 @@
> >  #include "hist.h"
> >
> >  struct option;
> > +struct perf_env;
> >
> >  extern regex_t parent_regex;
> >  extern const char *sort_order;
> > @@ -130,7 +131,7 @@ extern struct sort_entry sort_thread;
> >
> >  struct evlist;
> >  struct tep_handle;
> > -int setup_sorting(struct evlist *evlist);
> > +int setup_sorting(struct evlist *evlist, struct perf_env *env);
> >  int setup_output_field(void);
> >  void reset_output_field(void);
> >  void sort__setup_elide(FILE *fp);
> > @@ -145,7 +146,7 @@ bool is_strict_order(const char *order);
> >  int hpp_dimension__add_output(unsigned col, bool implicit);
> >  void reset_dimensions(void);
> >  int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> > -                     struct evlist *evlist,
> > +                     struct evlist *evlist, struct perf_env *env,
> >                       int level);
> >  int output_field_add(struct perf_hpp_list *list, const char *tok, int *level);
> >  int64_t
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 16:16     ` Ian Rogers
@ 2025-05-21 18:13       ` Liang, Kan
  2025-05-21 19:19         ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2025-05-21 18:13 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users



On 2025-05-21 12:16 p.m., Ian Rogers wrote:
>>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
>>> index 7b6cde87d2af..13ef0d188a96 100644
>>> --- a/tools/perf/builtin-top.c
>>> +++ b/tools/perf/builtin-top.c
>>> @@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
>>>
>>>       setup_browser(false);
>>>
>>> -     if (setup_sorting(top.evlist) < 0) {
>>> +     top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
>>> +     if (IS_ERR(top.session)) {
>>> +             status = PTR_ERR(top.session);
>>> +             top.session = NULL;
>>> +             goto out_delete_evlist;
>>> +     }
>>> +
>>> +     if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
>> I doubt a valide env can be got in perf_session__new(), since there is
>> no perf.data in perf top.
>> Maybe just need to invoke the perf_env__raw_arch() instead to fill the
>> env->arch.
> I think the current code is making things harder than it should be, we
> should work away from perf_env__arch and strings, instead using EM_
> values which we can default to EM_HOST avoiding any runtime costs.
> Looking at perf_env__arch:
> ```
> const char *perf_env__arch(struct perf_env *env)
> {
>         char *arch_name;
> 
>         if (!env || !env->arch) { /* Assume local operation */
>                 static struct utsname uts = { .machine[0] = '\0', };
>                 if (uts.machine[0] == '\0' && uname(&uts) < 0)
>                         return NULL;
>                 arch_name = uts.machine;
>         } else
>                 arch_name = env->arch;
> 
>         return normalize_arch(arch_name);
> }
> ```
> in this case env->arch == NULL and so the uname machine will be used.
> For perf_env__raw_arch the behavior is similar but it populates the
> env:
> ```
> static int perf_env__read_arch(struct perf_env *env)
> {
>         struct utsname uts;
> 
>         if (env->arch)
>                 return 0;
> 
>         if (!uname(&uts))
>                 env->arch = strdup(uts.machine);
> 
>         return env->arch ? 0 : -ENOMEM;
> }
> 
> const char *perf_env__raw_arch(struct perf_env *env)
> {
>         return env && !perf_env__read_arch(env) ? env->arch : "unknown";
> }
> ```
> Aside from caching the arch, the main difference is that
> normalize_arch isn't called. Not having normalize_arch means the code
> in arch_support_sort_key and arch_perf_header_entry would need to
> handle strings "ppc" as well as "powerpc", "i386" as well as "x86",
> etc. As I'd prefer not handle all those cases I think the way the code
> is is best given how the env code is currently structured.

Right. The perf_env__raw_arch() doesn't improve anything.
But I still don't like &top.session->header.env.
Because I don't think you can get any useful information from
top.session->header.env. It just brings confusions. (It seems an env is
retrieved, but it is actually not.)

In the perf top, &perf_env is used for the existing cases. If any env
fields are not available, perf_env__read_XXX() is invoked to get the
information.
I think we may follow the existing usage, e.g.,
setup_sorting(top.evlist, &perf_env).

Alternatively, it looks like the perf top doesn't support --weight. The
env->arch should never be used. If so, a NULL can be used as well, E.g.,
setup_sorting(top.evlist, NULL).

Thanks,
Kan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 18:13       ` Liang, Kan
@ 2025-05-21 19:19         ` Ian Rogers
  2025-05-23 14:50           ` Liang, Kan
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 19:19 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users

On Wed, May 21, 2025 at 11:14 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-05-21 12:16 p.m., Ian Rogers wrote:
> >>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> >>> index 7b6cde87d2af..13ef0d188a96 100644
> >>> --- a/tools/perf/builtin-top.c
> >>> +++ b/tools/perf/builtin-top.c
> >>> @@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
> >>>
> >>>       setup_browser(false);
> >>>
> >>> -     if (setup_sorting(top.evlist) < 0) {
> >>> +     top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
> >>> +     if (IS_ERR(top.session)) {
> >>> +             status = PTR_ERR(top.session);
> >>> +             top.session = NULL;
> >>> +             goto out_delete_evlist;
> >>> +     }
> >>> +
> >>> +     if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
> >> I doubt a valide env can be got in perf_session__new(), since there is
> >> no perf.data in perf top.
> >> Maybe just need to invoke the perf_env__raw_arch() instead to fill the
> >> env->arch.
> > I think the current code is making things harder than it should be, we
> > should work away from perf_env__arch and strings, instead using EM_
> > values which we can default to EM_HOST avoiding any runtime costs.
> > Looking at perf_env__arch:
> > ```
> > const char *perf_env__arch(struct perf_env *env)
> > {
> >         char *arch_name;
> >
> >         if (!env || !env->arch) { /* Assume local operation */
> >                 static struct utsname uts = { .machine[0] = '\0', };
> >                 if (uts.machine[0] == '\0' && uname(&uts) < 0)
> >                         return NULL;
> >                 arch_name = uts.machine;
> >         } else
> >                 arch_name = env->arch;
> >
> >         return normalize_arch(arch_name);
> > }
> > ```
> > in this case env->arch == NULL and so the uname machine will be used.
> > For perf_env__raw_arch the behavior is similar but it populates the
> > env:
> > ```
> > static int perf_env__read_arch(struct perf_env *env)
> > {
> >         struct utsname uts;
> >
> >         if (env->arch)
> >                 return 0;
> >
> >         if (!uname(&uts))
> >                 env->arch = strdup(uts.machine);
> >
> >         return env->arch ? 0 : -ENOMEM;
> > }
> >
> > const char *perf_env__raw_arch(struct perf_env *env)
> > {
> >         return env && !perf_env__read_arch(env) ? env->arch : "unknown";
> > }
> > ```
> > Aside from caching the arch, the main difference is that
> > normalize_arch isn't called. Not having normalize_arch means the code
> > in arch_support_sort_key and arch_perf_header_entry would need to
> > handle strings "ppc" as well as "powerpc", "i386" as well as "x86",
> > etc. As I'd prefer not handle all those cases I think the way the code
> > is is best given how the env code is currently structured.
>
> Right. The perf_env__raw_arch() doesn't improve anything.
> But I still don't like &top.session->header.env.
> Because I don't think you can get any useful information from
> top.session->header.env. It just brings confusions. (It seems an env is
> retrieved, but it is actually not.)

Well there's a certain consistency in using the session env to set up
the sorting, etc. This pre-exists this change with nearly every
builtin-* file doing `symbol__init(&session->header.env);`. perf top
does `symbol__init(NULL);`:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-top.c?h=perf-tools-next#n1811
but the code now has lazy initialization patterns and handling NULL as
a special case of meaning host machine:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/symbol.c?h=perf-tools-next#n2350

> In the perf top, &perf_env is used for the existing cases. If any env
> fields are not available, perf_env__read_XXX() is invoked to get the
> information.
> I think we may follow the existing usage, e.g.,
> setup_sorting(top.evlist, &perf_env).

So using the global perf_env rather than NULL feels preferable but I
think the global perf_env should be deleted. Whenever I see the global
perf_env in use I think the code has a bug as the perf_env should be
coming from the session or the machine. The global perf_env can have
no meaning for cases like `perf diff` where more than one
file/header/env is open at a time. The global perf_env variable's
existence encourages bad or broken code, so deleting it should avoid
errors in code. Another place these issues can occur is with TPEBS
where we're maintaining multiple sessions for sampling alongside
counting.

> Alternatively, it looks like the perf top doesn't support --weight. The
> env->arch should never be used. If so, a NULL can be used as well, E.g.,
> setup_sorting(top.evlist, NULL).

So I don't like NULL because then we have lazy initialization of host
data and NULL meaning use host. I don't like the global perf_env
variable as it has a code smell about it. I think moving the session
initialization earlier in perf top so its env, although unpopulated,
can be used is consistent with `perf report` - this is consistent with
`perf top` being `perf record` glued together with `perf report`. So I
think the change here is the smallest and most sensible.

Longer term let's delete the global perf_env variable,  perf_env__arch
should be switched to a perf_env__e_machine as then we can avoid uname
calls just to determine the machine architecture, etc.

Thanks,
Ian

> Thanks,
> Kan
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 19:19         ` Ian Rogers
@ 2025-05-23 14:50           ` Liang, Kan
  2025-05-23 15:51             ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2025-05-23 14:50 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users



On 2025-05-21 3:19 p.m., Ian Rogers wrote:
> On Wed, May 21, 2025 at 11:14 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2025-05-21 12:16 p.m., Ian Rogers wrote:
>>>>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
>>>>> index 7b6cde87d2af..13ef0d188a96 100644
>>>>> --- a/tools/perf/builtin-top.c
>>>>> +++ b/tools/perf/builtin-top.c
>>>>> @@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
>>>>>
>>>>>       setup_browser(false);
>>>>>
>>>>> -     if (setup_sorting(top.evlist) < 0) {
>>>>> +     top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
>>>>> +     if (IS_ERR(top.session)) {
>>>>> +             status = PTR_ERR(top.session);
>>>>> +             top.session = NULL;
>>>>> +             goto out_delete_evlist;
>>>>> +     }
>>>>> +
>>>>> +     if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
>>>> I doubt a valide env can be got in perf_session__new(), since there is
>>>> no perf.data in perf top.
>>>> Maybe just need to invoke the perf_env__raw_arch() instead to fill the
>>>> env->arch.
>>> I think the current code is making things harder than it should be, we
>>> should work away from perf_env__arch and strings, instead using EM_
>>> values which we can default to EM_HOST avoiding any runtime costs.
>>> Looking at perf_env__arch:
>>> ```
>>> const char *perf_env__arch(struct perf_env *env)
>>> {
>>>         char *arch_name;
>>>
>>>         if (!env || !env->arch) { /* Assume local operation */
>>>                 static struct utsname uts = { .machine[0] = '\0', };
>>>                 if (uts.machine[0] == '\0' && uname(&uts) < 0)
>>>                         return NULL;
>>>                 arch_name = uts.machine;
>>>         } else
>>>                 arch_name = env->arch;
>>>
>>>         return normalize_arch(arch_name);
>>> }
>>> ```
>>> in this case env->arch == NULL and so the uname machine will be used.
>>> For perf_env__raw_arch the behavior is similar but it populates the
>>> env:
>>> ```
>>> static int perf_env__read_arch(struct perf_env *env)
>>> {
>>>         struct utsname uts;
>>>
>>>         if (env->arch)
>>>                 return 0;
>>>
>>>         if (!uname(&uts))
>>>                 env->arch = strdup(uts.machine);
>>>
>>>         return env->arch ? 0 : -ENOMEM;
>>> }
>>>
>>> const char *perf_env__raw_arch(struct perf_env *env)
>>> {
>>>         return env && !perf_env__read_arch(env) ? env->arch : "unknown";
>>> }
>>> ```
>>> Aside from caching the arch, the main difference is that
>>> normalize_arch isn't called. Not having normalize_arch means the code
>>> in arch_support_sort_key and arch_perf_header_entry would need to
>>> handle strings "ppc" as well as "powerpc", "i386" as well as "x86",
>>> etc. As I'd prefer not handle all those cases I think the way the code
>>> is is best given how the env code is currently structured.
>>
>> Right. The perf_env__raw_arch() doesn't improve anything.
>> But I still don't like &top.session->header.env.
>> Because I don't think you can get any useful information from
>> top.session->header.env. It just brings confusions. (It seems an env is
>> retrieved, but it is actually not.)
> 
> Well there's a certain consistency in using the session env to set up
> the sorting, etc. This pre-exists this change with nearly every
> builtin-* file doing `symbol__init(&session->header.env);`. perf top
> does `symbol__init(NULL);`:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-top.c?h=perf-tools-next#n1811
> but the code now has lazy initialization patterns and handling NULL as
> a special case of meaning host machine:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/symbol.c?h=perf-tools-next#n2350
> 
>> In the perf top, &perf_env is used for the existing cases. If any env
>> fields are not available, perf_env__read_XXX() is invoked to get the
>> information.
>> I think we may follow the existing usage, e.g.,
>> setup_sorting(top.evlist, &perf_env).
> 
> So using the global perf_env rather than NULL feels preferable but I
> think the global perf_env should be deleted. Whenever I see the global
> perf_env in use I think the code has a bug as the perf_env should be
> coming from the session or the machine. The global perf_env can have
> no meaning for cases like `perf diff` where more than one
> file/header/env is open at a time. The global perf_env variable's
> existence encourages bad or broken code, so deleting it should avoid
> errors in code. Another place these issues can occur is with TPEBS
> where we're maintaining multiple sessions for sampling alongside
> counting.
> 
>> Alternatively, it looks like the perf top doesn't support --weight. The
>> env->arch should never be used. If so, a NULL can be used as well, E.g.,
>> setup_sorting(top.evlist, NULL).
> 
> So I don't like NULL because then we have lazy initialization of host
> data and NULL meaning use host. I don't like the global perf_env
> variable as it has a code smell about it. I think moving the session
> initialization earlier in perf top so its env, although unpopulated,
> can be used is consistent with `perf report` - this is consistent with
> `perf top` being `perf record` glued together with `perf report`. So I
> think the change here is the smallest and most sensible.
> 
> Longer term let's delete the global perf_env variable,  perf_env__arch
> should be switched to a perf_env__e_machine as then we can avoid uname
> calls just to determine the machine architecture, etc.
> 

I'm fine with the session's env, as long as there is a consistent env
source in the perf top. Because in the recent perf top fixes, we
randomly pick the env source. Thomas's patch used the global env, but
this one chose the session's env. It brings confusions.
https://lore.kernel.org/lkml/20250513231813.13846-2-thomas.falcon@intel.com/

Could you please send a clean up patch to address the inconsistency?

Thanks,
Kan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-23 14:50           ` Liang, Kan
@ 2025-05-23 15:51             ` Ian Rogers
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-23 15:51 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Masami Hiramatsu (Google),
	Ravi Bangoria, Leo Yan, Yujie Liu, Graham Woodward, Howard Chu,
	Weilin Wang, Dmitry Vyukov, Andi Kleen, Thomas Falcon,
	Matt Fleming, Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users

On Fri, May 23, 2025 at 7:51 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> On 2025-05-21 3:19 p.m., Ian Rogers wrote:
> > On Wed, May 21, 2025 at 11:14 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >> On 2025-05-21 12:16 p.m., Ian Rogers wrote:
> >>>>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> >>>>> index 7b6cde87d2af..13ef0d188a96 100644
> >>>>> --- a/tools/perf/builtin-top.c
> >>>>> +++ b/tools/perf/builtin-top.c
> >>>>> @@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
> >>>>>
> >>>>>       setup_browser(false);
> >>>>>
> >>>>> -     if (setup_sorting(top.evlist) < 0) {
> >>>>> +     top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
> >>>>> +     if (IS_ERR(top.session)) {
> >>>>> +             status = PTR_ERR(top.session);
> >>>>> +             top.session = NULL;
> >>>>> +             goto out_delete_evlist;
> >>>>> +     }
> >>>>> +
> >>>>> +     if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
> >>>> I doubt a valide env can be got in perf_session__new(), since there is
> >>>> no perf.data in perf top.
> >>>> Maybe just need to invoke the perf_env__raw_arch() instead to fill the
> >>>> env->arch.
> >>> I think the current code is making things harder than it should be, we
> >>> should work away from perf_env__arch and strings, instead using EM_
> >>> values which we can default to EM_HOST avoiding any runtime costs.
> >>> Looking at perf_env__arch:
> >>> ```
> >>> const char *perf_env__arch(struct perf_env *env)
> >>> {
> >>>         char *arch_name;
> >>>
> >>>         if (!env || !env->arch) { /* Assume local operation */
> >>>                 static struct utsname uts = { .machine[0] = '\0', };
> >>>                 if (uts.machine[0] == '\0' && uname(&uts) < 0)
> >>>                         return NULL;
> >>>                 arch_name = uts.machine;
> >>>         } else
> >>>                 arch_name = env->arch;
> >>>
> >>>         return normalize_arch(arch_name);
> >>> }
> >>> ```
> >>> in this case env->arch == NULL and so the uname machine will be used.
> >>> For perf_env__raw_arch the behavior is similar but it populates the
> >>> env:
> >>> ```
> >>> static int perf_env__read_arch(struct perf_env *env)
> >>> {
> >>>         struct utsname uts;
> >>>
> >>>         if (env->arch)
> >>>                 return 0;
> >>>
> >>>         if (!uname(&uts))
> >>>                 env->arch = strdup(uts.machine);
> >>>
> >>>         return env->arch ? 0 : -ENOMEM;
> >>> }
> >>>
> >>> const char *perf_env__raw_arch(struct perf_env *env)
> >>> {
> >>>         return env && !perf_env__read_arch(env) ? env->arch : "unknown";
> >>> }
> >>> ```
> >>> Aside from caching the arch, the main difference is that
> >>> normalize_arch isn't called. Not having normalize_arch means the code
> >>> in arch_support_sort_key and arch_perf_header_entry would need to
> >>> handle strings "ppc" as well as "powerpc", "i386" as well as "x86",
> >>> etc. As I'd prefer not handle all those cases I think the way the code
> >>> is is best given how the env code is currently structured.
> >>
> >> Right. The perf_env__raw_arch() doesn't improve anything.
> >> But I still don't like &top.session->header.env.
> >> Because I don't think you can get any useful information from
> >> top.session->header.env. It just brings confusions. (It seems an env is
> >> retrieved, but it is actually not.)
> >
> > Well there's a certain consistency in using the session env to set up
> > the sorting, etc. This pre-exists this change with nearly every
> > builtin-* file doing `symbol__init(&session->header.env);`. perf top
> > does `symbol__init(NULL);`:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-top.c?h=perf-tools-next#n1811
> > but the code now has lazy initialization patterns and handling NULL as
> > a special case of meaning host machine:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/symbol.c?h=perf-tools-next#n2350
> >
> >> In the perf top, &perf_env is used for the existing cases. If any env
> >> fields are not available, perf_env__read_XXX() is invoked to get the
> >> information.
> >> I think we may follow the existing usage, e.g.,
> >> setup_sorting(top.evlist, &perf_env).
> >
> > So using the global perf_env rather than NULL feels preferable but I
> > think the global perf_env should be deleted. Whenever I see the global
> > perf_env in use I think the code has a bug as the perf_env should be
> > coming from the session or the machine. The global perf_env can have
> > no meaning for cases like `perf diff` where more than one
> > file/header/env is open at a time. The global perf_env variable's
> > existence encourages bad or broken code, so deleting it should avoid
> > errors in code. Another place these issues can occur is with TPEBS
> > where we're maintaining multiple sessions for sampling alongside
> > counting.
> >
> >> Alternatively, it looks like the perf top doesn't support --weight. The
> >> env->arch should never be used. If so, a NULL can be used as well, E.g.,
> >> setup_sorting(top.evlist, NULL).
> >
> > So I don't like NULL because then we have lazy initialization of host
> > data and NULL meaning use host. I don't like the global perf_env
> > variable as it has a code smell about it. I think moving the session
> > initialization earlier in perf top so its env, although unpopulated,
> > can be used is consistent with `perf report` - this is consistent with
> > `perf top` being `perf record` glued together with `perf report`. So I
> > think the change here is the smallest and most sensible.
> >
> > Longer term let's delete the global perf_env variable,  perf_env__arch
> > should be switched to a perf_env__e_machine as then we can avoid uname
> > calls just to determine the machine architecture, etc.
> >
>
> I'm fine with the session's env, as long as there is a consistent env
> source in the perf top. Because in the recent perf top fixes, we
> randomly pick the env source. Thomas's patch used the global env, but
> this one chose the session's env. It brings confusions.
> https://lore.kernel.org/lkml/20250513231813.13846-2-thomas.falcon@intel.com/
>
> Could you please send a clean up patch to address the inconsistency?

Will do, thanks Kan!

Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-23 15:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 13:54 [PATCH v2 0/3] Generic weight struct, use env for sort key and header Ian Rogers
2025-05-21 13:54 ` [PATCH v2 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
2025-05-21 15:42   ` Liang, Kan
2025-05-21 15:57     ` Ian Rogers
2025-05-21 13:54 ` [PATCH v2 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test Ian Rogers
2025-05-21 13:54 ` [PATCH v2 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
2025-05-21 15:50   ` Liang, Kan
2025-05-21 16:16     ` Ian Rogers
2025-05-21 18:13       ` Liang, Kan
2025-05-21 19:19         ` Ian Rogers
2025-05-23 14:50           ` Liang, Kan
2025-05-23 15:51             ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).