linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Generic weight struct, use env for sort key and header
@ 2025-05-21 16:53 Ian Rogers
  2025-05-21 16:53 ` [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 16:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users, Kajol Jain,
	Athira Rajeev

The arch directory is a barrier to cross-platform development as files
and behaviors within it are inherently platform specific. Sample
parsing should be generic but the PERF_SAMPLE_WEIGHT_STRUCT handling
was only present if building for x86 or powerpc. The sort key and
headers should be specific to the session that is being executed and
not to the machine perf is being run upon. These patches clean this
and associated code up.

v3: Code reorganizations suggested by Kan Liang.
v2: Avoid changes to include/perf/perf_dlfilter.h as suggested by
    Adrian Hunter.

Ian Rogers (3):
  perf sample: Remove arch notion of sample parsing
  perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test
  perf sort: Use perf_env to set arch sort keys and header

 tools/perf/arch/powerpc/util/Build         |   1 -
 tools/perf/arch/powerpc/util/event.c       |  60 ----------
 tools/perf/arch/x86/include/arch-tests.h   |   1 -
 tools/perf/arch/x86/tests/Build            |   1 -
 tools/perf/arch/x86/tests/arch-tests.c     |   2 -
 tools/perf/arch/x86/tests/sample-parsing.c | 125 ---------------------
 tools/perf/arch/x86/util/event.c           |  46 --------
 tools/perf/builtin-annotate.c              |   2 +-
 tools/perf/builtin-c2c.c                   |  53 +++++----
 tools/perf/builtin-diff.c                  |   2 +-
 tools/perf/builtin-report.c                |   2 +-
 tools/perf/builtin-script.c                |   2 +-
 tools/perf/builtin-top.c                   |  16 +--
 tools/perf/tests/hists_cumulate.c          |   8 +-
 tools/perf/tests/hists_filter.c            |   8 +-
 tools/perf/tests/hists_link.c              |   8 +-
 tools/perf/tests/hists_output.c            |  10 +-
 tools/perf/tests/sample-parsing.c          |  14 +++
 tools/perf/util/dlfilter.c                 |   2 +-
 tools/perf/util/event.h                    |   5 -
 tools/perf/util/evsel.c                    |  17 ++-
 tools/perf/util/hist.c                     |   4 +-
 tools/perf/util/hist.h                     |   2 +-
 tools/perf/util/intel-tpebs.c              |   4 +-
 tools/perf/util/sample.h                   |   5 +-
 tools/perf/util/session.c                  |   2 +-
 tools/perf/util/sort.c                     |  67 +++++++----
 tools/perf/util/sort.h                     |   5 +-
 tools/perf/util/synthetic-events.c         |  10 +-
 29 files changed, 150 insertions(+), 334 deletions(-)
 delete mode 100644 tools/perf/arch/powerpc/util/event.c
 delete mode 100644 tools/perf/arch/x86/tests/sample-parsing.c

-- 
2.49.0.1143.g0be31eac6b-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 16:53 [PATCH v3 0/3] Generic weight struct, use env for sort key and header Ian Rogers
@ 2025-05-21 16:53 ` Ian Rogers
  2025-05-21 20:26   ` Namhyung Kim
  2025-05-21 16:53 ` [PATCH v3 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test Ian Rogers
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 16:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users, Kajol Jain,
	Athira Rajeev

By definition arch sample parsing and synthesis will inhibit certain
kinds of cross-platform record then analysis (report, script,
etc.). Remove arch_perf_parse_sample_weight and
arch_perf_synthesize_sample_weight replacing with a common
implementation. Combine perf_sample p_stage_cyc and retire_lat to
capture the differing uses regardless of compiled for architecture.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/powerpc/util/event.c       | 26 ---------------------
 tools/perf/arch/x86/tests/sample-parsing.c |  4 ++--
 tools/perf/arch/x86/util/event.c           | 27 ----------------------
 tools/perf/builtin-script.c                |  2 +-
 tools/perf/util/dlfilter.c                 |  2 +-
 tools/perf/util/event.h                    |  2 --
 tools/perf/util/evsel.c                    | 17 ++++++++++----
 tools/perf/util/hist.c                     |  4 ++--
 tools/perf/util/hist.h                     |  2 +-
 tools/perf/util/intel-tpebs.c              |  4 ++--
 tools/perf/util/sample.h                   |  5 +---
 tools/perf/util/session.c                  |  2 +-
 tools/perf/util/sort.c                     |  6 ++---
 tools/perf/util/synthetic-events.c         | 10 ++++++--
 14 files changed, 34 insertions(+), 79 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
index 77d8cc2b5691..024ac8b54c33 100644
--- a/tools/perf/arch/powerpc/util/event.c
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -11,32 +11,6 @@
 #include "../../../util/debug.h"
 #include "../../../util/sample.h"
 
-void arch_perf_parse_sample_weight(struct perf_sample *data,
-				   const __u64 *array, u64 type)
-{
-	union perf_sample_weight weight;
-
-	weight.full = *array;
-	if (type & PERF_SAMPLE_WEIGHT)
-		data->weight = weight.full;
-	else {
-		data->weight = weight.var1_dw;
-		data->ins_lat = weight.var2_w;
-		data->p_stage_cyc = weight.var3_w;
-	}
-}
-
-void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
-					__u64 *array, u64 type)
-{
-	*array = data->weight;
-
-	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		*array &= 0xffffffff;
-		*array |= ((u64)data->ins_lat << 32);
-	}
-}
-
 const char *arch_perf_header_entry(const char *se_header)
 {
 	if (!strcmp(se_header, "Local INSTR Latency"))
diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
index a061e8619267..95d8f7f1d2fb 100644
--- a/tools/perf/arch/x86/tests/sample-parsing.c
+++ b/tools/perf/arch/x86/tests/sample-parsing.c
@@ -29,7 +29,7 @@ static bool samples_same(const struct perf_sample *s1,
 {
 	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
 		COMP(ins_lat);
-		COMP(retire_lat);
+		COMP(p_stage_cyc_or_retire_lat);
 	}
 
 	return true;
@@ -50,7 +50,7 @@ static int do_test(u64 sample_type)
 	struct perf_sample sample = {
 		.weight		= 101,
 		.ins_lat        = 102,
-		.retire_lat     = 103,
+		.p_stage_cyc_or_retire_lat = 103,
 	};
 	struct perf_sample sample_out;
 	size_t i, sz, bufsz;
diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index a0400707180c..576c1c36046c 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -92,33 +92,6 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
 
 #endif
 
-void arch_perf_parse_sample_weight(struct perf_sample *data,
-				   const __u64 *array, u64 type)
-{
-	union perf_sample_weight weight;
-
-	weight.full = *array;
-	if (type & PERF_SAMPLE_WEIGHT)
-		data->weight = weight.full;
-	else {
-		data->weight = weight.var1_dw;
-		data->ins_lat = weight.var2_w;
-		data->retire_lat = weight.var3_w;
-	}
-}
-
-void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
-					__u64 *array, u64 type)
-{
-	*array = data->weight;
-
-	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		*array &= 0xffffffff;
-		*array |= ((u64)data->ins_lat << 32);
-		*array |= ((u64)data->retire_lat << 48);
-	}
-}
-
 const char *arch_perf_header_entry(const char *se_header)
 {
 	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6c3bf74dd78c..c02c435e0f0b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2251,7 +2251,7 @@ static void process_event(struct perf_script *script,
 		fprintf(fp, "%16" PRIu16, sample->ins_lat);
 
 	if (PRINT_FIELD(RETIRE_LAT))
-		fprintf(fp, "%16" PRIu16, sample->retire_lat);
+		fprintf(fp, "%16" PRIu16, sample->p_stage_cyc_or_retire_lat);
 
 	if (PRINT_FIELD(CGROUP)) {
 		const char *cgrp_name;
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index ddacef881af2..d5fd6d34a17c 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -513,6 +513,7 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 	d->d_addr_al   = &d_addr_al;
 
 	d_sample.size  = sizeof(d_sample);
+	d_sample.p_stage_cyc = sample->p_stage_cyc_or_retire_lat;
 	d_ip_al.size   = 0; /* To indicate d_ip_al is not initialized */
 	d_addr_al.size = 0; /* To indicate d_addr_al is not initialized */
 
@@ -526,7 +527,6 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 	ASSIGN(period);
 	ASSIGN(weight);
 	ASSIGN(ins_lat);
-	ASSIGN(p_stage_cyc);
 	ASSIGN(transaction);
 	ASSIGN(insn_cnt);
 	ASSIGN(cyc_cnt);
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 664bf39567ce..119bce37f4fd 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -390,8 +390,6 @@ extern unsigned int proc_map_timeout;
 #define PAGE_SIZE_NAME_LEN	32
 char *get_page_size_name(u64 size, char *str);
 
-void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
-void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
 const char *arch_perf_header_entry(const char *se_header);
 int arch_support_sort_key(const char *sort_key);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d55482f094bf..27de167855ee 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2846,11 +2846,18 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
 	return 0;
 }
 
-void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
-					  const __u64 *array,
-					  u64 type __maybe_unused)
+static void perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type)
 {
-	data->weight = *array;
+	union perf_sample_weight weight;
+
+	weight.full = *array;
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
+		data->weight = weight.var1_dw;
+		data->ins_lat = weight.var2_w;
+		data->p_stage_cyc_or_retire_lat = weight.var3_w;
+	} else {
+		data->weight = weight.full;
+	}
 }
 
 u64 evsel__bitfield_swap_branch_flags(u64 value)
@@ -3236,7 +3243,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
 		OVERFLOW_CHECK_u64(array);
-		arch_perf_parse_sample_weight(data, array, type);
+		perf_parse_sample_weight(data, array, type);
 		array++;
 	}
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index afc6855327ab..ae9803dca0b1 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -829,7 +829,7 @@ __hists__add_entry(struct hists *hists,
 			.period	= sample->period,
 			.weight1 = sample->weight,
 			.weight2 = sample->ins_lat,
-			.weight3 = sample->p_stage_cyc,
+			.weight3 = sample->p_stage_cyc_or_retire_lat,
 			.latency = al->latency,
 		},
 		.parent = sym_parent,
@@ -846,7 +846,7 @@ __hists__add_entry(struct hists *hists,
 		.time = hist_time(sample->time),
 		.weight = sample->weight,
 		.ins_lat = sample->ins_lat,
-		.p_stage_cyc = sample->p_stage_cyc,
+		.p_stage_cyc_or_retire_lat = sample->p_stage_cyc_or_retire_lat,
 		.simd_flags = sample->simd_flags,
 	}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index c64254088fc7..67033bdabcf4 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -255,7 +255,7 @@ struct hist_entry {
 	u64			code_page_size;
 	u64			weight;
 	u64			ins_lat;
-	u64			p_stage_cyc;
+	u64			p_stage_cyc_or_retire_lat;
 	s32			socket;
 	s32			cpu;
 	int			parallelism;
diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
index 4ad4bc118ea5..ec2f3ecf1e1c 100644
--- a/tools/perf/util/intel-tpebs.c
+++ b/tools/perf/util/intel-tpebs.c
@@ -202,8 +202,8 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
 	 * latency value will be used. Save the number of samples and the sum of
 	 * retire latency value for each event.
 	 */
-	t->last = sample->retire_lat;
-	update_stats(&t->stats, sample->retire_lat);
+	t->last = sample->p_stage_cyc_or_retire_lat;
+	update_stats(&t->stats, sample->p_stage_cyc_or_retire_lat);
 	mutex_unlock(tpebs_mtx_get());
 	return 0;
 }
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 0e96240052e9..3330d18fb5fd 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -104,10 +104,7 @@ struct perf_sample {
 	u8  cpumode;
 	u16 misc;
 	u16 ins_lat;
-	union {
-		u16 p_stage_cyc;
-		u16 retire_lat;
-	};
+	u16 p_stage_cyc_or_retire_lat;
 	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
 	char insn[MAX_INSN];
 	void *raw_data;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a320672c264e..451bc24ccfba 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1094,7 +1094,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 		printf("... weight: %" PRIu64 "", sample->weight);
 			if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
 				printf(",0x%"PRIx16"", sample->ins_lat);
-				printf(",0x%"PRIx16"", sample->p_stage_cyc);
+				printf(",0x%"PRIx16"", sample->p_stage_cyc_or_retire_lat);
 			}
 		printf("\n");
 	}
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 45e654653960..dda4ef0b5a73 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1879,21 +1879,21 @@ struct sort_entry sort_global_ins_lat = {
 static int64_t
 sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
 {
-	return left->p_stage_cyc - right->p_stage_cyc;
+	return left->p_stage_cyc_or_retire_lat - right->p_stage_cyc_or_retire_lat;
 }
 
 static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
 					size_t size, unsigned int width)
 {
 	return repsep_snprintf(bf, size, "%-*u", width,
-			he->p_stage_cyc * he->stat.nr_events);
+			he->p_stage_cyc_or_retire_lat * he->stat.nr_events);
 }
 
 
 static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
 					size_t size, unsigned int width)
 {
-	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
+	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc_or_retire_lat);
 }
 
 struct sort_entry sort_local_p_stage_cyc = {
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 2fc4d0537840..449a41900fc4 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1567,10 +1567,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	return result;
 }
 
-void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
+static void perf_synthesize_sample_weight(const struct perf_sample *data,
 					       __u64 *array, u64 type __maybe_unused)
 {
 	*array = data->weight;
+
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
+		*array &= 0xffffffff;
+		*array |= ((u64)data->ins_lat << 32);
+		*array |= ((u64)data->p_stage_cyc_or_retire_lat << 48);
+	}
 }
 
 static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
@@ -1730,7 +1736,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 	}
 
 	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
-		arch_perf_synthesize_sample_weight(sample, array, type);
+		perf_synthesize_sample_weight(sample, array, type);
 		array++;
 	}
 
-- 
2.49.0.1143.g0be31eac6b-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test
  2025-05-21 16:53 [PATCH v3 0/3] Generic weight struct, use env for sort key and header Ian Rogers
  2025-05-21 16:53 ` [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
@ 2025-05-21 16:53 ` Ian Rogers
  2025-05-21 16:53 ` [PATCH v3 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
  2025-05-27 21:01 ` [PATCH v3 0/3] Generic weight struct, use env for sort key " Ian Rogers
  3 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 16:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users, Kajol Jain,
	Athira Rajeev

test__x86_sample_parsing is identical to test__sample_parsing except
it explicitly tested PERF_SAMPLE_WEIGHT_STRUCT. Now the parsing code
is common move the PERF_SAMPLE_WEIGHT_STRUCT to the common sample
parsing test and remove the x86 version.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/x86/include/arch-tests.h   |   1 -
 tools/perf/arch/x86/tests/Build            |   1 -
 tools/perf/arch/x86/tests/arch-tests.c     |   2 -
 tools/perf/arch/x86/tests/sample-parsing.c | 125 ---------------------
 tools/perf/tests/sample-parsing.c          |  14 +++
 5 files changed, 14 insertions(+), 129 deletions(-)
 delete mode 100644 tools/perf/arch/x86/tests/sample-parsing.c

diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
index 4fd425157d7d..957934417b26 100644
--- a/tools/perf/arch/x86/include/arch-tests.h
+++ b/tools/perf/arch/x86/include/arch-tests.h
@@ -12,7 +12,6 @@ int test__insn_x86(struct test_suite *test, int subtest);
 int test__intel_pt_pkt_decoder(struct test_suite *test, int subtest);
 int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
 int test__bp_modify(struct test_suite *test, int subtest);
-int test__x86_sample_parsing(struct test_suite *test, int subtest);
 int test__amd_ibs_via_core_pmu(struct test_suite *test, int subtest);
 int test__amd_ibs_period(struct test_suite *test, int subtest);
 int test__hybrid(struct test_suite *test, int subtest);
diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
index 5e00cbfd2d56..4c27b85b960a 100644
--- a/tools/perf/arch/x86/tests/Build
+++ b/tools/perf/arch/x86/tests/Build
@@ -2,7 +2,6 @@ perf-test-$(CONFIG_DWARF_UNWIND) += regs_load.o
 perf-test-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
 
 perf-test-y += arch-tests.o
-perf-test-y += sample-parsing.o
 perf-test-y += hybrid.o
 perf-test-$(CONFIG_AUXTRACE) += intel-pt-test.o
 ifeq ($(CONFIG_EXTRA_TESTS),y)
diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
index bfee2432515b..1023c83684ce 100644
--- a/tools/perf/arch/x86/tests/arch-tests.c
+++ b/tools/perf/arch/x86/tests/arch-tests.c
@@ -23,7 +23,6 @@ struct test_suite suite__intel_pt = {
 #if defined(__x86_64__)
 DEFINE_SUITE("x86 bp modify", bp_modify);
 #endif
-DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
 DEFINE_SUITE("AMD IBS via core pmu", amd_ibs_via_core_pmu);
 DEFINE_SUITE_EXCLUSIVE("AMD IBS sample period", amd_ibs_period);
 static struct test_case hybrid_tests[] = {
@@ -49,7 +48,6 @@ struct test_suite *arch_tests[] = {
 #if defined(__x86_64__)
 	&suite__bp_modify,
 #endif
-	&suite__x86_sample_parsing,
 	&suite__amd_ibs_via_core_pmu,
 	&suite__amd_ibs_period,
 	&suite__hybrid,
diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
deleted file mode 100644
index 95d8f7f1d2fb..000000000000
--- a/tools/perf/arch/x86/tests/sample-parsing.c
+++ /dev/null
@@ -1,125 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-#include <stdbool.h>
-#include <inttypes.h>
-#include <stdlib.h>
-#include <string.h>
-#include <linux/bitops.h>
-#include <linux/kernel.h>
-#include <linux/types.h>
-
-#include "event.h"
-#include "evsel.h"
-#include "debug.h"
-#include "util/sample.h"
-#include "util/synthetic-events.h"
-
-#include "tests/tests.h"
-#include "arch-tests.h"
-
-#define COMP(m) do {					\
-	if (s1->m != s2->m) {				\
-		pr_debug("Samples differ at '"#m"'\n");	\
-		return false;				\
-	}						\
-} while (0)
-
-static bool samples_same(const struct perf_sample *s1,
-			 const struct perf_sample *s2,
-			 u64 type)
-{
-	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		COMP(ins_lat);
-		COMP(p_stage_cyc_or_retire_lat);
-	}
-
-	return true;
-}
-
-static int do_test(u64 sample_type)
-{
-	struct evsel evsel = {
-		.needs_swap = false,
-		.core = {
-			. attr = {
-				.sample_type = sample_type,
-				.read_format = 0,
-			},
-		},
-	};
-	union perf_event *event;
-	struct perf_sample sample = {
-		.weight		= 101,
-		.ins_lat        = 102,
-		.p_stage_cyc_or_retire_lat = 103,
-	};
-	struct perf_sample sample_out;
-	size_t i, sz, bufsz;
-	int err, ret = -1;
-
-	sz = perf_event__sample_event_size(&sample, sample_type, 0);
-	bufsz = sz + 4096; /* Add a bit for overrun checking */
-	event = malloc(bufsz);
-	if (!event) {
-		pr_debug("malloc failed\n");
-		return -1;
-	}
-
-	memset(event, 0xff, bufsz);
-	event->header.type = PERF_RECORD_SAMPLE;
-	event->header.misc = 0;
-	event->header.size = sz;
-
-	err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
-	if (err) {
-		pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
-			 "perf_event__synthesize_sample", sample_type, err);
-		goto out_free;
-	}
-
-	/* The data does not contain 0xff so we use that to check the size */
-	for (i = bufsz; i > 0; i--) {
-		if (*(i - 1 + (u8 *)event) != 0xff)
-			break;
-	}
-	if (i != sz) {
-		pr_debug("Event size mismatch: actual %zu vs expected %zu\n",
-			 i, sz);
-		goto out_free;
-	}
-
-	evsel.sample_size = __evsel__sample_size(sample_type);
-
-	err = evsel__parse_sample(&evsel, event, &sample_out);
-	if (err) {
-		pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
-			 "evsel__parse_sample", sample_type, err);
-		goto out_free;
-	}
-
-	if (!samples_same(&sample, &sample_out, sample_type)) {
-		pr_debug("parsing failed for sample_type %#"PRIx64"\n",
-			 sample_type);
-		goto out_free;
-	}
-
-	ret = 0;
-out_free:
-	free(event);
-
-	return ret;
-}
-
-/**
- * test__x86_sample_parsing - test X86 specific sample parsing
- *
- * This function implements a test that synthesizes a sample event, parses it
- * and then checks that the parsed sample matches the original sample. If the
- * test passes %0 is returned, otherwise %-1 is returned.
- *
- * For now, the PERF_SAMPLE_WEIGHT_STRUCT is the only X86 specific sample type.
- * The test only checks the PERF_SAMPLE_WEIGHT_STRUCT type.
- */
-int test__x86_sample_parsing(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
-{
-	return do_test(PERF_SAMPLE_WEIGHT_STRUCT);
-}
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 72411580f869..ad57e34bda19 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -152,6 +152,12 @@ static bool samples_same(struct perf_sample *s1,
 	if (type & PERF_SAMPLE_WEIGHT)
 		COMP(weight);
 
+	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
+		COMP(weight);
+		COMP(ins_lat);
+		COMP(p_stage_cyc_or_retire_lat);
+	}
+
 	if (type & PERF_SAMPLE_DATA_SRC)
 		COMP(data_src);
 
@@ -269,6 +275,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		.cgroup		= 114,
 		.data_page_size = 115,
 		.code_page_size = 116,
+		.ins_lat	= 117,
+		.p_stage_cyc_or_retire_lat = 118,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -439,6 +447,12 @@ static int test__sample_parsing(struct test_suite *test __maybe_unused, int subt
 		if (err)
 			return err;
 	}
+	sample_type = (PERF_SAMPLE_MAX - 1) & ~PERF_SAMPLE_WEIGHT_STRUCT;
+	for (i = 0; i < ARRAY_SIZE(rf); i++) {
+		err = do_test(sample_type, sample_regs, rf[i]);
+		if (err)
+			return err;
+	}
 
 	return 0;
 }
-- 
2.49.0.1143.g0be31eac6b-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 3/3] perf sort: Use perf_env to set arch sort keys and header
  2025-05-21 16:53 [PATCH v3 0/3] Generic weight struct, use env for sort key and header Ian Rogers
  2025-05-21 16:53 ` [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
  2025-05-21 16:53 ` [PATCH v3 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test Ian Rogers
@ 2025-05-21 16:53 ` Ian Rogers
  2025-05-27 21:01 ` [PATCH v3 0/3] Generic weight struct, use env for sort key " Ian Rogers
  3 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 16:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users, Kajol Jain,
	Athira Rajeev

Previously arch_support_sort_key and arch_perf_header_entry used a
weak symbol to compile as appropriate for x86 and powerpc. A
limitation to this is that the handling of a data file could vary in
cross-platform development. Change to using the perf_env of the
current session to determine the architecture kind and set the sort
key and header entries as appropriate.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/powerpc/util/Build   |  1 -
 tools/perf/arch/powerpc/util/event.c | 34 ----------------
 tools/perf/arch/x86/util/event.c     | 19 ---------
 tools/perf/builtin-annotate.c        |  2 +-
 tools/perf/builtin-c2c.c             | 53 +++++++++++++-----------
 tools/perf/builtin-diff.c            |  2 +-
 tools/perf/builtin-report.c          |  2 +-
 tools/perf/builtin-top.c             | 16 ++++----
 tools/perf/tests/hists_cumulate.c    |  8 ++--
 tools/perf/tests/hists_filter.c      |  8 ++--
 tools/perf/tests/hists_link.c        |  8 ++--
 tools/perf/tests/hists_output.c      | 10 ++---
 tools/perf/util/event.h              |  3 --
 tools/perf/util/sort.c               | 61 ++++++++++++++++++++--------
 tools/perf/util/sort.h               |  5 ++-
 15 files changed, 104 insertions(+), 128 deletions(-)
 delete mode 100644 tools/perf/arch/powerpc/util/event.c

diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index ed82715080f9..fdd6a77a3432 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -5,7 +5,6 @@ perf-util-y += mem-events.o
 perf-util-y += pmu.o
 perf-util-y += sym-handling.o
 perf-util-y += evsel.o
-perf-util-y += event.o
 
 perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
 
diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
deleted file mode 100644
index 024ac8b54c33..000000000000
--- a/tools/perf/arch/powerpc/util/event.c
+++ /dev/null
@@ -1,34 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/zalloc.h>
-
-#include "../../../util/event.h"
-#include "../../../util/synthetic-events.h"
-#include "../../../util/machine.h"
-#include "../../../util/tool.h"
-#include "../../../util/map.h"
-#include "../../../util/debug.h"
-#include "../../../util/sample.h"
-
-const char *arch_perf_header_entry(const char *se_header)
-{
-	if (!strcmp(se_header, "Local INSTR Latency"))
-		return "Finish Cyc";
-	else if (!strcmp(se_header, "INSTR Latency"))
-		return "Global Finish_cyc";
-	else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
-		return "Dispatch Cyc";
-	else if (!strcmp(se_header, "Pipeline Stage Cycle"))
-		return "Global Dispatch_cyc";
-	return se_header;
-}
-
-int arch_support_sort_key(const char *sort_key)
-{
-	if (!strcmp(sort_key, "p_stage_cyc"))
-		return 1;
-	if (!strcmp(sort_key, "local_p_stage_cyc"))
-		return 1;
-	return 0;
-}
diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index 576c1c36046c..3cd384317739 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -91,22 +91,3 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
 }
 
 #endif
-
-const char *arch_perf_header_entry(const char *se_header)
-{
-	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
-		return "Local Retire Latency";
-	else if (!strcmp(se_header, "Pipeline Stage Cycle"))
-		return "Retire Latency";
-
-	return se_header;
-}
-
-int arch_support_sort_key(const char *sort_key)
-{
-	if (!strcmp(sort_key, "p_stage_cyc"))
-		return 1;
-	if (!strcmp(sort_key, "local_p_stage_cyc"))
-		return 1;
-	return 0;
-}
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 9833c2c82a2f..a2d41614ef5e 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -947,7 +947,7 @@ int cmd_annotate(int argc, const char **argv)
 			annotate_opts.show_br_cntr = true;
 	}
 
-	if (setup_sorting(NULL) < 0)
+	if (setup_sorting(/*evlist=*/NULL, &annotate.session->header.env) < 0)
 		usage_with_options(annotate_usage, options);
 
 	ret = __cmd_annotate(&annotate);
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index e2e257bcc461..324368aabfa2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -195,12 +195,14 @@ static struct hist_entry_ops c2c_entry_ops = {
 
 static int c2c_hists__init(struct c2c_hists *hists,
 			   const char *sort,
-			   int nr_header_lines);
+			   int nr_header_lines,
+			   struct perf_env *env);
 
 static struct c2c_hists*
 he__get_c2c_hists(struct hist_entry *he,
 		  const char *sort,
-		  int nr_header_lines)
+		  int nr_header_lines,
+		  struct perf_env *env)
 {
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *hists;
@@ -214,7 +216,7 @@ he__get_c2c_hists(struct hist_entry *he,
 	if (!hists)
 		return NULL;
 
-	ret = c2c_hists__init(hists, sort, nr_header_lines);
+	ret = c2c_hists__init(hists, sort, nr_header_lines, env);
 	if (ret) {
 		free(hists);
 		return NULL;
@@ -350,7 +352,7 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
 
 		mi = mi_dup;
 
-		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2);
+		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2, machine->env);
 		if (!c2c_hists)
 			goto free_mi;
 
@@ -1966,7 +1968,8 @@ static struct c2c_fmt *get_format(const char *name)
 	return c2c_fmt;
 }
 
-static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
+static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name,
+				  struct perf_env *env __maybe_unused)
 {
 	struct c2c_fmt *c2c_fmt = get_format(name);
 	int level = 0;
@@ -1980,14 +1983,14 @@ static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
 	return 0;
 }
 
-static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
+static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name, struct perf_env *env)
 {
 	struct c2c_fmt *c2c_fmt = get_format(name);
 	struct c2c_dimension *dim;
 
 	if (!c2c_fmt) {
 		reset_dimensions();
-		return sort_dimension__add(hpp_list, name, NULL, 0);
+		return sort_dimension__add(hpp_list, name, /*evlist=*/NULL, env, /*level=*/0);
 	}
 
 	dim = c2c_fmt->dim;
@@ -2008,7 +2011,7 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
 										\
 		for (tok = strtok_r((char *)_list, ", ", &tmp);			\
 				tok; tok = strtok_r(NULL, ", ", &tmp)) {	\
-			ret = _fn(hpp_list, tok);				\
+			ret = _fn(hpp_list, tok, env);				\
 			if (ret == -EINVAL) {					\
 				pr_err("Invalid --fields key: `%s'", tok);	\
 				break;						\
@@ -2021,7 +2024,8 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
 
 static int hpp_list__parse(struct perf_hpp_list *hpp_list,
 			   const char *output_,
-			   const char *sort_)
+			   const char *sort_,
+			   struct perf_env *env)
 {
 	char *output = output_ ? strdup(output_) : NULL;
 	char *sort   = sort_   ? strdup(sort_) : NULL;
@@ -2052,7 +2056,8 @@ static int hpp_list__parse(struct perf_hpp_list *hpp_list,
 
 static int c2c_hists__init(struct c2c_hists *hists,
 			   const char *sort,
-			   int nr_header_lines)
+			   int nr_header_lines,
+			   struct perf_env *env)
 {
 	__hists__init(&hists->hists, &hists->list);
 
@@ -2066,15 +2071,16 @@ static int c2c_hists__init(struct c2c_hists *hists,
 	/* Overload number of header lines.*/
 	hists->list.nr_header_lines = nr_header_lines;
 
-	return hpp_list__parse(&hists->list, NULL, sort);
+	return hpp_list__parse(&hists->list, /*output=*/NULL, sort, env);
 }
 
 static int c2c_hists__reinit(struct c2c_hists *c2c_hists,
 			     const char *output,
-			     const char *sort)
+			     const char *sort,
+			     struct perf_env *env)
 {
 	perf_hpp__reset_output_field(&c2c_hists->list);
-	return hpp_list__parse(&c2c_hists->list, output, sort);
+	return hpp_list__parse(&c2c_hists->list, output, sort, env);
 }
 
 #define DISPLAY_LINE_LIMIT  0.001
@@ -2207,8 +2213,9 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
 	return 0;
 }
 
-static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_cl_cb(struct hist_entry *he, void *arg)
 {
+	struct perf_env *env = arg;
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *c2c_hists;
 	bool display = he__display(he, &c2c.shared_clines_stats);
@@ -2222,7 +2229,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 		c2c_he->cacheline_idx = idx++;
 		calc_width(c2c_he);
 
-		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort);
+		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort, env);
 
 		hists__collapse_resort(&c2c_hists->hists, NULL);
 		hists__output_resort_cb(&c2c_hists->hists, NULL, filter_cb);
@@ -2333,7 +2340,7 @@ static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 	return 0;
 }
 
-static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
+static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb, void *arg)
 {
 	struct rb_node *next = rb_first_cached(&hists->entries);
 	int ret = 0;
@@ -2342,7 +2349,7 @@ static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
 		struct hist_entry *he;
 
 		he = rb_entry(next, struct hist_entry, rb_node);
-		ret = cb(he, NULL);
+		ret = cb(he, arg);
 		if (ret)
 			break;
 		next = rb_next(&he->rb_node);
@@ -2448,7 +2455,7 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
 	hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
 }
 
-static void print_pareto(FILE *out)
+static void print_pareto(FILE *out, struct perf_env *env)
 {
 	struct perf_hpp_list hpp_list;
 	struct rb_node *nd;
@@ -2473,7 +2480,7 @@ static void print_pareto(FILE *out)
 			    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
-	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
+	ret = hpp_list__parse(&hpp_list, cl_output, /*evlist=*/NULL, env);
 
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
@@ -2538,7 +2545,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
 	fprintf(out, "=================================================\n");
 	fprintf(out, "#\n");
 
-	print_pareto(out);
+	print_pareto(out, &session->header.env);
 }
 
 #ifdef HAVE_SLANG_SUPPORT
@@ -3095,7 +3102,7 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out_session;
 	}
 
-	err = c2c_hists__init(&c2c.hists, "dcacheline", 2);
+	err = c2c_hists__init(&c2c.hists, "dcacheline", 2, &session->header.env);
 	if (err) {
 		pr_debug("Failed to initialize hists\n");
 		goto out_session;
@@ -3179,13 +3186,13 @@ static int perf_c2c__report(int argc, const char **argv)
 	else if (c2c.display == DISPLAY_SNP_PEER)
 		sort_str = "tot_peer";
 
-	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
+	c2c_hists__reinit(&c2c.hists, output_str, sort_str, &session->header.env);
 
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
 	hists__collapse_resort(&c2c.hists.hists, NULL);
 	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
-	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
+	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb, &session->header.env);
 
 	ui_progress__finish();
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index ae490d58af92..f6735cf02329 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -2003,7 +2003,7 @@ int cmd_diff(int argc, const char **argv)
 		sort__mode = SORT_MODE__DIFF;
 	}
 
-	if (setup_sorting(NULL) < 0)
+	if (setup_sorting(/*evlist=*/NULL, &data__files[0].session->header.env) < 0)
 		usage_with_options(diff_usage, options);
 
 	setup_pager();
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f0299c7ee025..8be6e7c3bd22 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1776,7 +1776,7 @@ int cmd_report(int argc, const char **argv)
 	}
 
 	if ((last_key != K_SWITCH_INPUT_DATA && last_key != K_RELOAD) &&
-	    (setup_sorting(session->evlist) < 0)) {
+	    (setup_sorting(session->evlist, &session->header.env) < 0)) {
 		if (sort_order)
 			parse_options_usage(report_usage, options, "s", 1);
 		if (field_order)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 7b6cde87d2af..13ef0d188a96 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1747,7 +1747,14 @@ int cmd_top(int argc, const char **argv)
 
 	setup_browser(false);
 
-	if (setup_sorting(top.evlist) < 0) {
+	top.session = perf_session__new(/*data=*/NULL, /*tool=*/NULL);
+	if (IS_ERR(top.session)) {
+		status = PTR_ERR(top.session);
+		top.session = NULL;
+		goto out_delete_evlist;
+	}
+
+	if (setup_sorting(top.evlist, &top.session->header.env) < 0) {
 		if (sort_order)
 			parse_options_usage(top_usage, options, "s", 1);
 		if (field_order)
@@ -1820,13 +1827,6 @@ int cmd_top(int argc, const char **argv)
 		signal(SIGWINCH, winch_sig);
 	}
 
-	top.session = perf_session__new(NULL, NULL);
-	if (IS_ERR(top.session)) {
-		status = PTR_ERR(top.session);
-		top.session = NULL;
-		goto out_delete_evlist;
-	}
-
 	if (!evlist__needs_bpf_sb_event(top.evlist))
 		top.record_opts.no_bpf_event = true;
 
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 1e0f5a310fd5..3eb9ef8d7ec6 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -295,7 +295,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = false;
 	evsel__reset_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 	callchain_register_param(&callchain_param);
 
 	err = add_hist_entries(hists, machine);
@@ -442,7 +442,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = false;
 	evsel__set_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 	callchain_register_param(&callchain_param);
 
 	err = add_hist_entries(hists, machine);
@@ -500,7 +500,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = true;
 	evsel__reset_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 	callchain_register_param(&callchain_param);
 
 	err = add_hist_entries(hists, machine);
@@ -684,7 +684,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
 	symbol_conf.cumulate_callchain = true;
 	evsel__set_sample_bit(evsel, CALLCHAIN);
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	callchain_param = callchain_param_default;
 	callchain_register_param(&callchain_param);
diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
index 4b2e4f2fbe48..1cebd20cc91c 100644
--- a/tools/perf/tests/hists_filter.c
+++ b/tools/perf/tests/hists_filter.c
@@ -131,10 +131,6 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
 		goto out;
 	err = TEST_FAIL;
 
-	/* default sort order (comm,dso,sym) will be used */
-	if (setup_sorting(NULL) < 0)
-		goto out;
-
 	machines__init(&machines);
 
 	/* setup threads/dso/map/symbols also */
@@ -145,6 +141,10 @@ static int test__hists_filter(struct test_suite *test __maybe_unused, int subtes
 	if (verbose > 1)
 		machine__fprintf(machine, stderr);
 
+	/* default sort order (comm,dso,sym) will be used */
+	if (setup_sorting(evlist, machine->env) < 0)
+		goto out;
+
 	/* process sample events */
 	err = add_hist_entries(evlist, machine);
 	if (err < 0)
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 5b6f1e883466..996f5f0b3bd1 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -303,10 +303,6 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
 		goto out;
 
 	err = TEST_FAIL;
-	/* default sort order (comm,dso,sym) will be used */
-	if (setup_sorting(NULL) < 0)
-		goto out;
-
 	machines__init(&machines);
 
 	/* setup threads/dso/map/symbols also */
@@ -317,6 +313,10 @@ static int test__hists_link(struct test_suite *test __maybe_unused, int subtest
 	if (verbose > 1)
 		machine__fprintf(machine, stderr);
 
+	/* default sort order (comm,dso,sym) will be used */
+	if (setup_sorting(evlist, machine->env) < 0)
+		goto out;
+
 	/* process sample events */
 	err = add_hist_entries(evlist, machine);
 	if (err < 0)
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index 33b5cc8352a7..ee5ec8bda60e 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -146,7 +146,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
 	field_order = NULL;
 	sort_order = NULL; /* equivalent to sort_order = "comm,dso,sym" */
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -248,7 +248,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
 	field_order = "overhead,cpu";
 	sort_order = "pid";
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -304,7 +304,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
 	field_order = "comm,overhead,dso";
 	sort_order = NULL;
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -378,7 +378,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
 	field_order = "dso,sym,comm,overhead,dso";
 	sort_order = "sym";
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
@@ -480,7 +480,7 @@ static int test5(struct evsel *evsel, struct machine *machine)
 	field_order = "cpu,pid,comm,dso,sym";
 	sort_order = "dso,pid";
 
-	setup_sorting(NULL);
+	setup_sorting(/*evlist=*/NULL, machine->env);
 
 	/*
 	 * expected output:
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 119bce37f4fd..3b97a31736c5 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -390,9 +390,6 @@ extern unsigned int proc_map_timeout;
 #define PAGE_SIZE_NAME_LEN	32
 char *get_page_size_name(u64 size, char *str);
 
-const char *arch_perf_header_entry(const char *se_header);
-int arch_support_sort_key(const char *sort_key);
-
 static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
 {
 	return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index dda4ef0b5a73..ccab10cd24a5 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2526,19 +2526,44 @@ struct sort_dimension {
 	int			taken;
 };
 
-int __weak arch_support_sort_key(const char *sort_key __maybe_unused)
+static int arch_support_sort_key(const char *sort_key, struct perf_env *env)
 {
+	const char *arch = perf_env__arch(env);
+
+	if (!strcmp("x86", arch) || !strcmp("powerpc", arch)) {
+		if (!strcmp(sort_key, "p_stage_cyc"))
+			return 1;
+		if (!strcmp(sort_key, "local_p_stage_cyc"))
+			return 1;
+	}
 	return 0;
 }
 
-const char * __weak arch_perf_header_entry(const char *se_header)
-{
+static const char *arch_perf_header_entry(const char *se_header, struct perf_env *env)
+{
+	const char *arch = perf_env__arch(env);
+
+	if (!strcmp("x86", arch)) {
+		if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
+			return "Local Retire Latency";
+		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
+			return "Retire Latency";
+	} else if (!strcmp("powerpc", arch)) {
+		if (!strcmp(se_header, "Local INSTR Latency"))
+			return "Finish Cyc";
+		else if (!strcmp(se_header, "INSTR Latency"))
+			return "Global Finish_cyc";
+		else if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
+			return "Dispatch Cyc";
+		else if (!strcmp(se_header, "Pipeline Stage Cycle"))
+			return "Global Dispatch_cyc";
+	}
 	return se_header;
 }
 
-static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
+static void sort_dimension_add_dynamic_header(struct sort_dimension *sd, struct perf_env *env)
 {
-	sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header);
+	sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header, env);
 }
 
 #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
@@ -3590,7 +3615,7 @@ int hpp_dimension__add_output(unsigned col, bool implicit)
 }
 
 int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
-			struct evlist *evlist,
+			struct evlist *evlist, struct perf_env *env,
 			int level)
 {
 	unsigned int i, j;
@@ -3603,7 +3628,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 	 */
 	for (j = 0; j < ARRAY_SIZE(arch_specific_sort_keys); j++) {
 		if (!strcmp(arch_specific_sort_keys[j], tok) &&
-				!arch_support_sort_key(tok)) {
+		    !arch_support_sort_key(tok, env)) {
 			return 0;
 		}
 	}
@@ -3616,7 +3641,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 
 		for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
 			if (sd->name && !strcmp(dynamic_headers[j], sd->name))
-				sort_dimension_add_dynamic_header(sd);
+				sort_dimension_add_dynamic_header(sd, env);
 		}
 
 		if (sd->entry == &sort_parent && parent_pattern) {
@@ -3712,13 +3737,13 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 }
 
 /* This should match with sort_dimension__add() above */
-static bool is_hpp_sort_key(const char *key)
+static bool is_hpp_sort_key(const char *key, struct perf_env *env)
 {
 	unsigned i;
 
 	for (i = 0; i < ARRAY_SIZE(arch_specific_sort_keys); i++) {
 		if (!strcmp(arch_specific_sort_keys[i], key) &&
-		    !arch_support_sort_key(key)) {
+		    !arch_support_sort_key(key, env)) {
 			return false;
 		}
 	}
@@ -3740,7 +3765,7 @@ static bool is_hpp_sort_key(const char *key)
 }
 
 static int setup_sort_list(struct perf_hpp_list *list, char *str,
-			   struct evlist *evlist)
+			   struct evlist *evlist, struct perf_env *env)
 {
 	char *tmp, *tok;
 	int ret = 0;
@@ -3769,7 +3794,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
 		}
 
 		if (*tok) {
-			if (is_hpp_sort_key(tok)) {
+			if (is_hpp_sort_key(tok, env)) {
 				/* keep output (hpp) sort keys in the same level */
 				if (prev_was_hpp) {
 					bool next_same = (level == next_level);
@@ -3782,7 +3807,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
 				prev_was_hpp = false;
 			}
 
-			ret = sort_dimension__add(list, tok, evlist, level);
+			ret = sort_dimension__add(list, tok, evlist, env, level);
 			if (ret == -EINVAL) {
 				if (!cacheline_size() && !strncasecmp(tok, "dcacheline", strlen(tok)))
 					ui__error("The \"dcacheline\" --sort key needs to know the cacheline size and it couldn't be determined on this system");
@@ -3911,7 +3936,7 @@ static char *setup_overhead(char *keys)
 	return keys;
 }
 
-static int __setup_sorting(struct evlist *evlist)
+static int __setup_sorting(struct evlist *evlist, struct perf_env *env)
 {
 	char *str;
 	const char *sort_keys;
@@ -3951,7 +3976,7 @@ static int __setup_sorting(struct evlist *evlist)
 		}
 	}
 
-	ret = setup_sort_list(&perf_hpp_list, str, evlist);
+	ret = setup_sort_list(&perf_hpp_list, str, evlist, env);
 
 	free(str);
 	return ret;
@@ -4187,16 +4212,16 @@ static int __setup_output_field(void)
 	return ret;
 }
 
-int setup_sorting(struct evlist *evlist)
+int setup_sorting(struct evlist *evlist, struct perf_env *env)
 {
 	int err;
 
-	err = __setup_sorting(evlist);
+	err = __setup_sorting(evlist, env);
 	if (err < 0)
 		return err;
 
 	if (parent_pattern != default_parent_pattern) {
-		err = sort_dimension__add(&perf_hpp_list, "parent", evlist, -1);
+		err = sort_dimension__add(&perf_hpp_list, "parent", evlist, env, -1);
 		if (err < 0)
 			return err;
 	}
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index a742ab7f3c67..d7787958e06b 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -6,6 +6,7 @@
 #include "hist.h"
 
 struct option;
+struct perf_env;
 
 extern regex_t parent_regex;
 extern const char *sort_order;
@@ -130,7 +131,7 @@ extern struct sort_entry sort_thread;
 
 struct evlist;
 struct tep_handle;
-int setup_sorting(struct evlist *evlist);
+int setup_sorting(struct evlist *evlist, struct perf_env *env);
 int setup_output_field(void);
 void reset_output_field(void);
 void sort__setup_elide(FILE *fp);
@@ -145,7 +146,7 @@ bool is_strict_order(const char *order);
 int hpp_dimension__add_output(unsigned col, bool implicit);
 void reset_dimensions(void);
 int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
-			struct evlist *evlist,
+			struct evlist *evlist, struct perf_env *env,
 			int level);
 int output_field_add(struct perf_hpp_list *list, const char *tok, int *level);
 int64_t
-- 
2.49.0.1143.g0be31eac6b-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 16:53 ` [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
@ 2025-05-21 20:26   ` Namhyung Kim
  2025-05-21 21:15     ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2025-05-21 20:26 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> By definition arch sample parsing and synthesis will inhibit certain
> kinds of cross-platform record then analysis (report, script,
> etc.). Remove arch_perf_parse_sample_weight and
> arch_perf_synthesize_sample_weight replacing with a common
> implementation. Combine perf_sample p_stage_cyc and retire_lat to
> capture the differing uses regardless of compiled for architecture.

Can you please do this without renaming?  It can be a separate patch but
I think we can just leave it.

Thanks,
Namhyung

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/arch/powerpc/util/event.c       | 26 ---------------------
>  tools/perf/arch/x86/tests/sample-parsing.c |  4 ++--
>  tools/perf/arch/x86/util/event.c           | 27 ----------------------
>  tools/perf/builtin-script.c                |  2 +-
>  tools/perf/util/dlfilter.c                 |  2 +-
>  tools/perf/util/event.h                    |  2 --
>  tools/perf/util/evsel.c                    | 17 ++++++++++----
>  tools/perf/util/hist.c                     |  4 ++--
>  tools/perf/util/hist.h                     |  2 +-
>  tools/perf/util/intel-tpebs.c              |  4 ++--
>  tools/perf/util/sample.h                   |  5 +---
>  tools/perf/util/session.c                  |  2 +-
>  tools/perf/util/sort.c                     |  6 ++---
>  tools/perf/util/synthetic-events.c         | 10 ++++++--
>  14 files changed, 34 insertions(+), 79 deletions(-)
> 
> diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
> index 77d8cc2b5691..024ac8b54c33 100644
> --- a/tools/perf/arch/powerpc/util/event.c
> +++ b/tools/perf/arch/powerpc/util/event.c
> @@ -11,32 +11,6 @@
>  #include "../../../util/debug.h"
>  #include "../../../util/sample.h"
>  
> -void arch_perf_parse_sample_weight(struct perf_sample *data,
> -				   const __u64 *array, u64 type)
> -{
> -	union perf_sample_weight weight;
> -
> -	weight.full = *array;
> -	if (type & PERF_SAMPLE_WEIGHT)
> -		data->weight = weight.full;
> -	else {
> -		data->weight = weight.var1_dw;
> -		data->ins_lat = weight.var2_w;
> -		data->p_stage_cyc = weight.var3_w;
> -	}
> -}
> -
> -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> -					__u64 *array, u64 type)
> -{
> -	*array = data->weight;
> -
> -	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> -		*array &= 0xffffffff;
> -		*array |= ((u64)data->ins_lat << 32);
> -	}
> -}
> -
>  const char *arch_perf_header_entry(const char *se_header)
>  {
>  	if (!strcmp(se_header, "Local INSTR Latency"))
> diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
> index a061e8619267..95d8f7f1d2fb 100644
> --- a/tools/perf/arch/x86/tests/sample-parsing.c
> +++ b/tools/perf/arch/x86/tests/sample-parsing.c
> @@ -29,7 +29,7 @@ static bool samples_same(const struct perf_sample *s1,
>  {
>  	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
>  		COMP(ins_lat);
> -		COMP(retire_lat);
> +		COMP(p_stage_cyc_or_retire_lat);
>  	}
>  
>  	return true;
> @@ -50,7 +50,7 @@ static int do_test(u64 sample_type)
>  	struct perf_sample sample = {
>  		.weight		= 101,
>  		.ins_lat        = 102,
> -		.retire_lat     = 103,
> +		.p_stage_cyc_or_retire_lat = 103,
>  	};
>  	struct perf_sample sample_out;
>  	size_t i, sz, bufsz;
> diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
> index a0400707180c..576c1c36046c 100644
> --- a/tools/perf/arch/x86/util/event.c
> +++ b/tools/perf/arch/x86/util/event.c
> @@ -92,33 +92,6 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
>  
>  #endif
>  
> -void arch_perf_parse_sample_weight(struct perf_sample *data,
> -				   const __u64 *array, u64 type)
> -{
> -	union perf_sample_weight weight;
> -
> -	weight.full = *array;
> -	if (type & PERF_SAMPLE_WEIGHT)
> -		data->weight = weight.full;
> -	else {
> -		data->weight = weight.var1_dw;
> -		data->ins_lat = weight.var2_w;
> -		data->retire_lat = weight.var3_w;
> -	}
> -}
> -
> -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> -					__u64 *array, u64 type)
> -{
> -	*array = data->weight;
> -
> -	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> -		*array &= 0xffffffff;
> -		*array |= ((u64)data->ins_lat << 32);
> -		*array |= ((u64)data->retire_lat << 48);
> -	}
> -}
> -
>  const char *arch_perf_header_entry(const char *se_header)
>  {
>  	if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 6c3bf74dd78c..c02c435e0f0b 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2251,7 +2251,7 @@ static void process_event(struct perf_script *script,
>  		fprintf(fp, "%16" PRIu16, sample->ins_lat);
>  
>  	if (PRINT_FIELD(RETIRE_LAT))
> -		fprintf(fp, "%16" PRIu16, sample->retire_lat);
> +		fprintf(fp, "%16" PRIu16, sample->p_stage_cyc_or_retire_lat);
>  
>  	if (PRINT_FIELD(CGROUP)) {
>  		const char *cgrp_name;
> diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
> index ddacef881af2..d5fd6d34a17c 100644
> --- a/tools/perf/util/dlfilter.c
> +++ b/tools/perf/util/dlfilter.c
> @@ -513,6 +513,7 @@ int dlfilter__do_filter_event(struct dlfilter *d,
>  	d->d_addr_al   = &d_addr_al;
>  
>  	d_sample.size  = sizeof(d_sample);
> +	d_sample.p_stage_cyc = sample->p_stage_cyc_or_retire_lat;
>  	d_ip_al.size   = 0; /* To indicate d_ip_al is not initialized */
>  	d_addr_al.size = 0; /* To indicate d_addr_al is not initialized */
>  
> @@ -526,7 +527,6 @@ int dlfilter__do_filter_event(struct dlfilter *d,
>  	ASSIGN(period);
>  	ASSIGN(weight);
>  	ASSIGN(ins_lat);
> -	ASSIGN(p_stage_cyc);
>  	ASSIGN(transaction);
>  	ASSIGN(insn_cnt);
>  	ASSIGN(cyc_cnt);
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 664bf39567ce..119bce37f4fd 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -390,8 +390,6 @@ extern unsigned int proc_map_timeout;
>  #define PAGE_SIZE_NAME_LEN	32
>  char *get_page_size_name(u64 size, char *str);
>  
> -void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
> -void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
>  const char *arch_perf_header_entry(const char *se_header);
>  int arch_support_sort_key(const char *sort_key);
>  
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index d55482f094bf..27de167855ee 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2846,11 +2846,18 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
>  	return 0;
>  }
>  
> -void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
> -					  const __u64 *array,
> -					  u64 type __maybe_unused)
> +static void perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type)
>  {
> -	data->weight = *array;
> +	union perf_sample_weight weight;
> +
> +	weight.full = *array;
> +	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> +		data->weight = weight.var1_dw;
> +		data->ins_lat = weight.var2_w;
> +		data->p_stage_cyc_or_retire_lat = weight.var3_w;
> +	} else {
> +		data->weight = weight.full;
> +	}
>  }
>  
>  u64 evsel__bitfield_swap_branch_flags(u64 value)
> @@ -3236,7 +3243,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  
>  	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
>  		OVERFLOW_CHECK_u64(array);
> -		arch_perf_parse_sample_weight(data, array, type);
> +		perf_parse_sample_weight(data, array, type);
>  		array++;
>  	}
>  
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index afc6855327ab..ae9803dca0b1 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -829,7 +829,7 @@ __hists__add_entry(struct hists *hists,
>  			.period	= sample->period,
>  			.weight1 = sample->weight,
>  			.weight2 = sample->ins_lat,
> -			.weight3 = sample->p_stage_cyc,
> +			.weight3 = sample->p_stage_cyc_or_retire_lat,
>  			.latency = al->latency,
>  		},
>  		.parent = sym_parent,
> @@ -846,7 +846,7 @@ __hists__add_entry(struct hists *hists,
>  		.time = hist_time(sample->time),
>  		.weight = sample->weight,
>  		.ins_lat = sample->ins_lat,
> -		.p_stage_cyc = sample->p_stage_cyc,
> +		.p_stage_cyc_or_retire_lat = sample->p_stage_cyc_or_retire_lat,
>  		.simd_flags = sample->simd_flags,
>  	}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
>  
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index c64254088fc7..67033bdabcf4 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -255,7 +255,7 @@ struct hist_entry {
>  	u64			code_page_size;
>  	u64			weight;
>  	u64			ins_lat;
> -	u64			p_stage_cyc;
> +	u64			p_stage_cyc_or_retire_lat;
>  	s32			socket;
>  	s32			cpu;
>  	int			parallelism;
> diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> index 4ad4bc118ea5..ec2f3ecf1e1c 100644
> --- a/tools/perf/util/intel-tpebs.c
> +++ b/tools/perf/util/intel-tpebs.c
> @@ -202,8 +202,8 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
>  	 * latency value will be used. Save the number of samples and the sum of
>  	 * retire latency value for each event.
>  	 */
> -	t->last = sample->retire_lat;
> -	update_stats(&t->stats, sample->retire_lat);
> +	t->last = sample->p_stage_cyc_or_retire_lat;
> +	update_stats(&t->stats, sample->p_stage_cyc_or_retire_lat);
>  	mutex_unlock(tpebs_mtx_get());
>  	return 0;
>  }
> diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
> index 0e96240052e9..3330d18fb5fd 100644
> --- a/tools/perf/util/sample.h
> +++ b/tools/perf/util/sample.h
> @@ -104,10 +104,7 @@ struct perf_sample {
>  	u8  cpumode;
>  	u16 misc;
>  	u16 ins_lat;
> -	union {
> -		u16 p_stage_cyc;
> -		u16 retire_lat;
> -	};
> +	u16 p_stage_cyc_or_retire_lat;
>  	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
>  	char insn[MAX_INSN];
>  	void *raw_data;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index a320672c264e..451bc24ccfba 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1094,7 +1094,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
>  		printf("... weight: %" PRIu64 "", sample->weight);
>  			if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
>  				printf(",0x%"PRIx16"", sample->ins_lat);
> -				printf(",0x%"PRIx16"", sample->p_stage_cyc);
> +				printf(",0x%"PRIx16"", sample->p_stage_cyc_or_retire_lat);
>  			}
>  		printf("\n");
>  	}
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 45e654653960..dda4ef0b5a73 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -1879,21 +1879,21 @@ struct sort_entry sort_global_ins_lat = {
>  static int64_t
>  sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
>  {
> -	return left->p_stage_cyc - right->p_stage_cyc;
> +	return left->p_stage_cyc_or_retire_lat - right->p_stage_cyc_or_retire_lat;
>  }
>  
>  static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
>  					size_t size, unsigned int width)
>  {
>  	return repsep_snprintf(bf, size, "%-*u", width,
> -			he->p_stage_cyc * he->stat.nr_events);
> +			he->p_stage_cyc_or_retire_lat * he->stat.nr_events);
>  }
>  
>  
>  static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
>  					size_t size, unsigned int width)
>  {
> -	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
> +	return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc_or_retire_lat);
>  }
>  
>  struct sort_entry sort_local_p_stage_cyc = {
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 2fc4d0537840..449a41900fc4 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1567,10 +1567,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>  	return result;
>  }
>  
> -void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> +static void perf_synthesize_sample_weight(const struct perf_sample *data,
>  					       __u64 *array, u64 type __maybe_unused)
>  {
>  	*array = data->weight;
> +
> +	if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> +		*array &= 0xffffffff;
> +		*array |= ((u64)data->ins_lat << 32);
> +		*array |= ((u64)data->p_stage_cyc_or_retire_lat << 48);
> +	}
>  }
>  
>  static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
> @@ -1730,7 +1736,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>  	}
>  
>  	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
> -		arch_perf_synthesize_sample_weight(sample, array, type);
> +		perf_synthesize_sample_weight(sample, array, type);
>  		array++;
>  	}
>  
> -- 
> 2.49.0.1143.g0be31eac6b-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 20:26   ` Namhyung Kim
@ 2025-05-21 21:15     ` Ian Rogers
  2025-05-28 18:11       ` Namhyung Kim
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-21 21:15 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 21, 2025 at 1:27 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> > By definition arch sample parsing and synthesis will inhibit certain
> > kinds of cross-platform record then analysis (report, script,
> > etc.). Remove arch_perf_parse_sample_weight and
> > arch_perf_synthesize_sample_weight replacing with a common
> > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > capture the differing uses regardless of compiled for architecture.
>
> Can you please do this without renaming?  It can be a separate patch but
> I think we can just leave it.

It is not clear what the use of the union is. Presumably it is a
tagged union but there is no tag as the union value to use is implied
by either being built on x86_64 or powerpc. The change removes the
notion of this code being built for x86_64 or powerpc and so the union
value to use isn't clear (e.g. should arm use p_stage_cyc or
retire_lat from the union), hence combining to show that it could be
one or the other. The code could be:
```
#ifdef __x86_64__
       u16 p_stage_cyc;
#elif defined(powerpc)
       u16 retire_lat;
#endif
```
but this isn't cross-platform. The change in hist.h of
```
@@ -255,7 +255,7 @@ struct hist_entry {
        u64                     code_page_size;
        u64                     weight;
        u64                     ins_lat;
-       u64                     p_stage_cyc;
+       u64                     p_stage_cyc_or_retire_lat;
```
could be a follow up CL, but then we lose something of what the field
is holding given the value is just a copy of that same named value in
perf_sample. The code only inserts 34 lines and so the churn of doing
that seemed worse than having the change in a single patch for
clarity.

Thanks,
Ian

> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/arch/powerpc/util/event.c       | 26 ---------------------
> >  tools/perf/arch/x86/tests/sample-parsing.c |  4 ++--
> >  tools/perf/arch/x86/util/event.c           | 27 ----------------------
> >  tools/perf/builtin-script.c                |  2 +-
> >  tools/perf/util/dlfilter.c                 |  2 +-
> >  tools/perf/util/event.h                    |  2 --
> >  tools/perf/util/evsel.c                    | 17 ++++++++++----
> >  tools/perf/util/hist.c                     |  4 ++--
> >  tools/perf/util/hist.h                     |  2 +-
> >  tools/perf/util/intel-tpebs.c              |  4 ++--
> >  tools/perf/util/sample.h                   |  5 +---
> >  tools/perf/util/session.c                  |  2 +-
> >  tools/perf/util/sort.c                     |  6 ++---
> >  tools/perf/util/synthetic-events.c         | 10 ++++++--
> >  14 files changed, 34 insertions(+), 79 deletions(-)
> >
> > diff --git a/tools/perf/arch/powerpc/util/event.c b/tools/perf/arch/powerpc/util/event.c
> > index 77d8cc2b5691..024ac8b54c33 100644
> > --- a/tools/perf/arch/powerpc/util/event.c
> > +++ b/tools/perf/arch/powerpc/util/event.c
> > @@ -11,32 +11,6 @@
> >  #include "../../../util/debug.h"
> >  #include "../../../util/sample.h"
> >
> > -void arch_perf_parse_sample_weight(struct perf_sample *data,
> > -                                const __u64 *array, u64 type)
> > -{
> > -     union perf_sample_weight weight;
> > -
> > -     weight.full = *array;
> > -     if (type & PERF_SAMPLE_WEIGHT)
> > -             data->weight = weight.full;
> > -     else {
> > -             data->weight = weight.var1_dw;
> > -             data->ins_lat = weight.var2_w;
> > -             data->p_stage_cyc = weight.var3_w;
> > -     }
> > -}
> > -
> > -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> > -                                     __u64 *array, u64 type)
> > -{
> > -     *array = data->weight;
> > -
> > -     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > -             *array &= 0xffffffff;
> > -             *array |= ((u64)data->ins_lat << 32);
> > -     }
> > -}
> > -
> >  const char *arch_perf_header_entry(const char *se_header)
> >  {
> >       if (!strcmp(se_header, "Local INSTR Latency"))
> > diff --git a/tools/perf/arch/x86/tests/sample-parsing.c b/tools/perf/arch/x86/tests/sample-parsing.c
> > index a061e8619267..95d8f7f1d2fb 100644
> > --- a/tools/perf/arch/x86/tests/sample-parsing.c
> > +++ b/tools/perf/arch/x86/tests/sample-parsing.c
> > @@ -29,7 +29,7 @@ static bool samples_same(const struct perf_sample *s1,
> >  {
> >       if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> >               COMP(ins_lat);
> > -             COMP(retire_lat);
> > +             COMP(p_stage_cyc_or_retire_lat);
> >       }
> >
> >       return true;
> > @@ -50,7 +50,7 @@ static int do_test(u64 sample_type)
> >       struct perf_sample sample = {
> >               .weight         = 101,
> >               .ins_lat        = 102,
> > -             .retire_lat     = 103,
> > +             .p_stage_cyc_or_retire_lat = 103,
> >       };
> >       struct perf_sample sample_out;
> >       size_t i, sz, bufsz;
> > diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
> > index a0400707180c..576c1c36046c 100644
> > --- a/tools/perf/arch/x86/util/event.c
> > +++ b/tools/perf/arch/x86/util/event.c
> > @@ -92,33 +92,6 @@ int perf_event__synthesize_extra_kmaps(const struct perf_tool *tool,
> >
> >  #endif
> >
> > -void arch_perf_parse_sample_weight(struct perf_sample *data,
> > -                                const __u64 *array, u64 type)
> > -{
> > -     union perf_sample_weight weight;
> > -
> > -     weight.full = *array;
> > -     if (type & PERF_SAMPLE_WEIGHT)
> > -             data->weight = weight.full;
> > -     else {
> > -             data->weight = weight.var1_dw;
> > -             data->ins_lat = weight.var2_w;
> > -             data->retire_lat = weight.var3_w;
> > -     }
> > -}
> > -
> > -void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> > -                                     __u64 *array, u64 type)
> > -{
> > -     *array = data->weight;
> > -
> > -     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > -             *array &= 0xffffffff;
> > -             *array |= ((u64)data->ins_lat << 32);
> > -             *array |= ((u64)data->retire_lat << 48);
> > -     }
> > -}
> > -
> >  const char *arch_perf_header_entry(const char *se_header)
> >  {
> >       if (!strcmp(se_header, "Local Pipeline Stage Cycle"))
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index 6c3bf74dd78c..c02c435e0f0b 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -2251,7 +2251,7 @@ static void process_event(struct perf_script *script,
> >               fprintf(fp, "%16" PRIu16, sample->ins_lat);
> >
> >       if (PRINT_FIELD(RETIRE_LAT))
> > -             fprintf(fp, "%16" PRIu16, sample->retire_lat);
> > +             fprintf(fp, "%16" PRIu16, sample->p_stage_cyc_or_retire_lat);
> >
> >       if (PRINT_FIELD(CGROUP)) {
> >               const char *cgrp_name;
> > diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
> > index ddacef881af2..d5fd6d34a17c 100644
> > --- a/tools/perf/util/dlfilter.c
> > +++ b/tools/perf/util/dlfilter.c
> > @@ -513,6 +513,7 @@ int dlfilter__do_filter_event(struct dlfilter *d,
> >       d->d_addr_al   = &d_addr_al;
> >
> >       d_sample.size  = sizeof(d_sample);
> > +     d_sample.p_stage_cyc = sample->p_stage_cyc_or_retire_lat;
> >       d_ip_al.size   = 0; /* To indicate d_ip_al is not initialized */
> >       d_addr_al.size = 0; /* To indicate d_addr_al is not initialized */
> >
> > @@ -526,7 +527,6 @@ int dlfilter__do_filter_event(struct dlfilter *d,
> >       ASSIGN(period);
> >       ASSIGN(weight);
> >       ASSIGN(ins_lat);
> > -     ASSIGN(p_stage_cyc);
> >       ASSIGN(transaction);
> >       ASSIGN(insn_cnt);
> >       ASSIGN(cyc_cnt);
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index 664bf39567ce..119bce37f4fd 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> > @@ -390,8 +390,6 @@ extern unsigned int proc_map_timeout;
> >  #define PAGE_SIZE_NAME_LEN   32
> >  char *get_page_size_name(u64 size, char *str);
> >
> > -void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
> > -void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
> >  const char *arch_perf_header_entry(const char *se_header);
> >  int arch_support_sort_key(const char *sort_key);
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index d55482f094bf..27de167855ee 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -2846,11 +2846,18 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
> >       return 0;
> >  }
> >
> > -void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
> > -                                       const __u64 *array,
> > -                                       u64 type __maybe_unused)
> > +static void perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type)
> >  {
> > -     data->weight = *array;
> > +     union perf_sample_weight weight;
> > +
> > +     weight.full = *array;
> > +     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > +             data->weight = weight.var1_dw;
> > +             data->ins_lat = weight.var2_w;
> > +             data->p_stage_cyc_or_retire_lat = weight.var3_w;
> > +     } else {
> > +             data->weight = weight.full;
> > +     }
> >  }
> >
> >  u64 evsel__bitfield_swap_branch_flags(u64 value)
> > @@ -3236,7 +3243,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> >
> >       if (type & PERF_SAMPLE_WEIGHT_TYPE) {
> >               OVERFLOW_CHECK_u64(array);
> > -             arch_perf_parse_sample_weight(data, array, type);
> > +             perf_parse_sample_weight(data, array, type);
> >               array++;
> >       }
> >
> > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > index afc6855327ab..ae9803dca0b1 100644
> > --- a/tools/perf/util/hist.c
> > +++ b/tools/perf/util/hist.c
> > @@ -829,7 +829,7 @@ __hists__add_entry(struct hists *hists,
> >                       .period = sample->period,
> >                       .weight1 = sample->weight,
> >                       .weight2 = sample->ins_lat,
> > -                     .weight3 = sample->p_stage_cyc,
> > +                     .weight3 = sample->p_stage_cyc_or_retire_lat,
> >                       .latency = al->latency,
> >               },
> >               .parent = sym_parent,
> > @@ -846,7 +846,7 @@ __hists__add_entry(struct hists *hists,
> >               .time = hist_time(sample->time),
> >               .weight = sample->weight,
> >               .ins_lat = sample->ins_lat,
> > -             .p_stage_cyc = sample->p_stage_cyc,
> > +             .p_stage_cyc_or_retire_lat = sample->p_stage_cyc_or_retire_lat,
> >               .simd_flags = sample->simd_flags,
> >       }, *he = hists__findnew_entry(hists, &entry, al, sample_self);
> >
> > diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> > index c64254088fc7..67033bdabcf4 100644
> > --- a/tools/perf/util/hist.h
> > +++ b/tools/perf/util/hist.h
> > @@ -255,7 +255,7 @@ struct hist_entry {
> >       u64                     code_page_size;
> >       u64                     weight;
> >       u64                     ins_lat;
> > -     u64                     p_stage_cyc;
> > +     u64                     p_stage_cyc_or_retire_lat;
> >       s32                     socket;
> >       s32                     cpu;
> >       int                     parallelism;
> > diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> > index 4ad4bc118ea5..ec2f3ecf1e1c 100644
> > --- a/tools/perf/util/intel-tpebs.c
> > +++ b/tools/perf/util/intel-tpebs.c
> > @@ -202,8 +202,8 @@ static int process_sample_event(const struct perf_tool *tool __maybe_unused,
> >        * latency value will be used. Save the number of samples and the sum of
> >        * retire latency value for each event.
> >        */
> > -     t->last = sample->retire_lat;
> > -     update_stats(&t->stats, sample->retire_lat);
> > +     t->last = sample->p_stage_cyc_or_retire_lat;
> > +     update_stats(&t->stats, sample->p_stage_cyc_or_retire_lat);
> >       mutex_unlock(tpebs_mtx_get());
> >       return 0;
> >  }
> > diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
> > index 0e96240052e9..3330d18fb5fd 100644
> > --- a/tools/perf/util/sample.h
> > +++ b/tools/perf/util/sample.h
> > @@ -104,10 +104,7 @@ struct perf_sample {
> >       u8  cpumode;
> >       u16 misc;
> >       u16 ins_lat;
> > -     union {
> > -             u16 p_stage_cyc;
> > -             u16 retire_lat;
> > -     };
> > +     u16 p_stage_cyc_or_retire_lat;
> >       bool no_hw_idx;         /* No hw_idx collected in branch_stack */
> >       char insn[MAX_INSN];
> >       void *raw_data;
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index a320672c264e..451bc24ccfba 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -1094,7 +1094,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
> >               printf("... weight: %" PRIu64 "", sample->weight);
> >                       if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
> >                               printf(",0x%"PRIx16"", sample->ins_lat);
> > -                             printf(",0x%"PRIx16"", sample->p_stage_cyc);
> > +                             printf(",0x%"PRIx16"", sample->p_stage_cyc_or_retire_lat);
> >                       }
> >               printf("\n");
> >       }
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > index 45e654653960..dda4ef0b5a73 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -1879,21 +1879,21 @@ struct sort_entry sort_global_ins_lat = {
> >  static int64_t
> >  sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
> >  {
> > -     return left->p_stage_cyc - right->p_stage_cyc;
> > +     return left->p_stage_cyc_or_retire_lat - right->p_stage_cyc_or_retire_lat;
> >  }
> >
> >  static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
> >                                       size_t size, unsigned int width)
> >  {
> >       return repsep_snprintf(bf, size, "%-*u", width,
> > -                     he->p_stage_cyc * he->stat.nr_events);
> > +                     he->p_stage_cyc_or_retire_lat * he->stat.nr_events);
> >  }
> >
> >
> >  static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
> >                                       size_t size, unsigned int width)
> >  {
> > -     return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
> > +     return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc_or_retire_lat);
> >  }
> >
> >  struct sort_entry sort_local_p_stage_cyc = {
> > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > index 2fc4d0537840..449a41900fc4 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -1567,10 +1567,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
> >       return result;
> >  }
> >
> > -void __weak arch_perf_synthesize_sample_weight(const struct perf_sample *data,
> > +static void perf_synthesize_sample_weight(const struct perf_sample *data,
> >                                              __u64 *array, u64 type __maybe_unused)
> >  {
> >       *array = data->weight;
> > +
> > +     if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
> > +             *array &= 0xffffffff;
> > +             *array |= ((u64)data->ins_lat << 32);
> > +             *array |= ((u64)data->p_stage_cyc_or_retire_lat << 48);
> > +     }
> >  }
> >
> >  static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
> > @@ -1730,7 +1736,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
> >       }
> >
> >       if (type & PERF_SAMPLE_WEIGHT_TYPE) {
> > -             arch_perf_synthesize_sample_weight(sample, array, type);
> > +             perf_synthesize_sample_weight(sample, array, type);
> >               array++;
> >       }
> >
> > --
> > 2.49.0.1143.g0be31eac6b-goog
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/3] Generic weight struct, use env for sort key and header
  2025-05-21 16:53 [PATCH v3 0/3] Generic weight struct, use env for sort key and header Ian Rogers
                   ` (2 preceding siblings ...)
  2025-05-21 16:53 ` [PATCH v3 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
@ 2025-05-27 21:01 ` Ian Rogers
  3 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2025-05-27 21:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark,
	Masami Hiramatsu (Google), Ravi Bangoria, Leo Yan, Yujie Liu,
	Graham Woodward, Howard Chu, Weilin Wang, Dmitry Vyukov,
	Andi Kleen, Thomas Falcon, Matt Fleming, Chun-Tse Shao,
	Ben Gainey, Song Liu, linux-kernel, linux-perf-users, Kajol Jain,
	Athira Rajeev

On Wed, May 21, 2025 at 9:53 AM Ian Rogers <irogers@google.com> wrote:
>
> The arch directory is a barrier to cross-platform development as files
> and behaviors within it are inherently platform specific. Sample
> parsing should be generic but the PERF_SAMPLE_WEIGHT_STRUCT handling
> was only present if building for x86 or powerpc. The sort key and
> headers should be specific to the session that is being executed and
> not to the machine perf is being run upon. These patches clean this
> and associated code up.
>
> v3: Code reorganizations suggested by Kan Liang.
> v2: Avoid changes to include/perf/perf_dlfilter.h as suggested by
>     Adrian Hunter.

Ping. Thanks,
Ian

> Ian Rogers (3):
>   perf sample: Remove arch notion of sample parsing
>   perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test
>   perf sort: Use perf_env to set arch sort keys and header
>
>  tools/perf/arch/powerpc/util/Build         |   1 -
>  tools/perf/arch/powerpc/util/event.c       |  60 ----------
>  tools/perf/arch/x86/include/arch-tests.h   |   1 -
>  tools/perf/arch/x86/tests/Build            |   1 -
>  tools/perf/arch/x86/tests/arch-tests.c     |   2 -
>  tools/perf/arch/x86/tests/sample-parsing.c | 125 ---------------------
>  tools/perf/arch/x86/util/event.c           |  46 --------
>  tools/perf/builtin-annotate.c              |   2 +-
>  tools/perf/builtin-c2c.c                   |  53 +++++----
>  tools/perf/builtin-diff.c                  |   2 +-
>  tools/perf/builtin-report.c                |   2 +-
>  tools/perf/builtin-script.c                |   2 +-
>  tools/perf/builtin-top.c                   |  16 +--
>  tools/perf/tests/hists_cumulate.c          |   8 +-
>  tools/perf/tests/hists_filter.c            |   8 +-
>  tools/perf/tests/hists_link.c              |   8 +-
>  tools/perf/tests/hists_output.c            |  10 +-
>  tools/perf/tests/sample-parsing.c          |  14 +++
>  tools/perf/util/dlfilter.c                 |   2 +-
>  tools/perf/util/event.h                    |   5 -
>  tools/perf/util/evsel.c                    |  17 ++-
>  tools/perf/util/hist.c                     |   4 +-
>  tools/perf/util/hist.h                     |   2 +-
>  tools/perf/util/intel-tpebs.c              |   4 +-
>  tools/perf/util/sample.h                   |   5 +-
>  tools/perf/util/session.c                  |   2 +-
>  tools/perf/util/sort.c                     |  67 +++++++----
>  tools/perf/util/sort.h                     |   5 +-
>  tools/perf/util/synthetic-events.c         |  10 +-
>  29 files changed, 150 insertions(+), 334 deletions(-)
>  delete mode 100644 tools/perf/arch/powerpc/util/event.c
>  delete mode 100644 tools/perf/arch/x86/tests/sample-parsing.c
>
> --
> 2.49.0.1143.g0be31eac6b-goog
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-21 21:15     ` Ian Rogers
@ 2025-05-28 18:11       ` Namhyung Kim
  2025-05-28 18:27         ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2025-05-28 18:11 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 21, 2025 at 02:15:24PM -0700, Ian Rogers wrote:
> On Wed, May 21, 2025 at 1:27 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> > > By definition arch sample parsing and synthesis will inhibit certain
> > > kinds of cross-platform record then analysis (report, script,
> > > etc.). Remove arch_perf_parse_sample_weight and
> > > arch_perf_synthesize_sample_weight replacing with a common
> > > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > > capture the differing uses regardless of compiled for architecture.
> >
> > Can you please do this without renaming?  It can be a separate patch but
> > I think we can just leave it.
> 
> It is not clear what the use of the union is. Presumably it is a
> tagged union but there is no tag as the union value to use is implied
> by either being built on x86_64 or powerpc. The change removes the
> notion of this code being built for x86_64 or powerpc and so the union
> value to use isn't clear (e.g. should arm use p_stage_cyc or
> retire_lat from the union), hence combining to show that it could be
> one or the other. The code could be:
> ```
> #ifdef __x86_64__
>        u16 p_stage_cyc;
> #elif defined(powerpc)
>        u16 retire_lat;
> #endif
> ```
> but this isn't cross-platform.

Right, we probably don't want it.


> The change in hist.h of
> ```
> @@ -255,7 +255,7 @@ struct hist_entry {
>         u64                     code_page_size;
>         u64                     weight;
>         u64                     ins_lat;
> -       u64                     p_stage_cyc;
> +       u64                     p_stage_cyc_or_retire_lat;
> ```
> could be a follow up CL, but then we lose something of what the field
> is holding given the value is just a copy of that same named value in
> perf_sample. The code only inserts 34 lines and so the churn of doing
> that seemed worse than having the change in a single patch for
> clarity.

Assuming other archs can add something later, we won't rename the field
again.  So I can live with the ugly union fields.  If we really want to
rename it, I prefer calling it just 'weight3' and let the archs handle
the display name only.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-28 18:11       ` Namhyung Kim
@ 2025-05-28 18:27         ` Ian Rogers
  2025-05-28 20:08           ` Namhyung Kim
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-28 18:27 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 28, 2025 at 11:11 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, May 21, 2025 at 02:15:24PM -0700, Ian Rogers wrote:
> > On Wed, May 21, 2025 at 1:27 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> > > > By definition arch sample parsing and synthesis will inhibit certain
> > > > kinds of cross-platform record then analysis (report, script,
> > > > etc.). Remove arch_perf_parse_sample_weight and
> > > > arch_perf_synthesize_sample_weight replacing with a common
> > > > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > > > capture the differing uses regardless of compiled for architecture.
> > >
> > > Can you please do this without renaming?  It can be a separate patch but
> > > I think we can just leave it.
> >
> > It is not clear what the use of the union is. Presumably it is a
> > tagged union but there is no tag as the union value to use is implied
> > by either being built on x86_64 or powerpc. The change removes the
> > notion of this code being built for x86_64 or powerpc and so the union
> > value to use isn't clear (e.g. should arm use p_stage_cyc or
> > retire_lat from the union), hence combining to show that it could be
> > one or the other. The code could be:
> > ```
> > #ifdef __x86_64__
> >        u16 p_stage_cyc;
> > #elif defined(powerpc)
> >        u16 retire_lat;
> > #endif
> > ```
> > but this isn't cross-platform.
>
> Right, we probably don't want it.
>
>
> > The change in hist.h of
> > ```
> > @@ -255,7 +255,7 @@ struct hist_entry {
> >         u64                     code_page_size;
> >         u64                     weight;
> >         u64                     ins_lat;
> > -       u64                     p_stage_cyc;
> > +       u64                     p_stage_cyc_or_retire_lat;
> > ```
> > could be a follow up CL, but then we lose something of what the field
> > is holding given the value is just a copy of that same named value in
> > perf_sample. The code only inserts 34 lines and so the churn of doing
> > that seemed worse than having the change in a single patch for
> > clarity.
>
> Assuming other archs can add something later, we won't rename the field
> again.  So I can live with the ugly union fields.  If we really want to
> rename it, I prefer calling it just 'weight3' and let the archs handle
> the display name only.

But that's my point (or in other words maybe you've missed my point) .
Regardless of arch we should display p_stage_cyc if processing a
perf.data file from a PowerPC as determined from the perf_env in the
perf.data file, or retire_lat if processing a perf.data file from x86.
The arch of the perf build is entirely irrelevant and calling the
variable an opaque weight3 will require something that will need to be
disambiguate it elsewhere in the code. The goal in variable names
should be to be intention revealing, which I think
p_stage_cyc_or_retire_lat is doing better than weight3, which is
something of a regression from the existing but inaccurate
p_stage_cyc.

Thanks,
Ian

> Thanks,
> Namhyung
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-28 18:27         ` Ian Rogers
@ 2025-05-28 20:08           ` Namhyung Kim
  2025-05-28 20:38             ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2025-05-28 20:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 28, 2025 at 11:27:06AM -0700, Ian Rogers wrote:
> On Wed, May 28, 2025 at 11:11 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, May 21, 2025 at 02:15:24PM -0700, Ian Rogers wrote:
> > > On Wed, May 21, 2025 at 1:27 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> > > > > By definition arch sample parsing and synthesis will inhibit certain
> > > > > kinds of cross-platform record then analysis (report, script,
> > > > > etc.). Remove arch_perf_parse_sample_weight and
> > > > > arch_perf_synthesize_sample_weight replacing with a common
> > > > > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > > > > capture the differing uses regardless of compiled for architecture.
> > > >
> > > > Can you please do this without renaming?  It can be a separate patch but
> > > > I think we can just leave it.
> > >
> > > It is not clear what the use of the union is. Presumably it is a
> > > tagged union but there is no tag as the union value to use is implied
> > > by either being built on x86_64 or powerpc. The change removes the
> > > notion of this code being built for x86_64 or powerpc and so the union
> > > value to use isn't clear (e.g. should arm use p_stage_cyc or
> > > retire_lat from the union), hence combining to show that it could be
> > > one or the other. The code could be:
> > > ```
> > > #ifdef __x86_64__
> > >        u16 p_stage_cyc;
> > > #elif defined(powerpc)
> > >        u16 retire_lat;
> > > #endif
> > > ```
> > > but this isn't cross-platform.
> >
> > Right, we probably don't want it.
> >
> >
> > > The change in hist.h of
> > > ```
> > > @@ -255,7 +255,7 @@ struct hist_entry {
> > >         u64                     code_page_size;
> > >         u64                     weight;
> > >         u64                     ins_lat;
> > > -       u64                     p_stage_cyc;
> > > +       u64                     p_stage_cyc_or_retire_lat;
> > > ```
> > > could be a follow up CL, but then we lose something of what the field
> > > is holding given the value is just a copy of that same named value in
> > > perf_sample. The code only inserts 34 lines and so the churn of doing
> > > that seemed worse than having the change in a single patch for
> > > clarity.
> >
> > Assuming other archs can add something later, we won't rename the field
> > again.  So I can live with the ugly union fields.  If we really want to
> > rename it, I prefer calling it just 'weight3' and let the archs handle
> > the display name only.
> 
> But that's my point (or in other words maybe you've missed my point) .
> Regardless of arch we should display p_stage_cyc if processing a
> perf.data file from a PowerPC as determined from the perf_env in the
> perf.data file, or retire_lat if processing a perf.data file from x86.

Agreed.


> The arch of the perf build is entirely irrelevant and calling the
> variable an opaque weight3 will require something that will need to be
> disambiguate it elsewhere in the code. The goal in variable names
> should be to be intention revealing, which I think
> p_stage_cyc_or_retire_lat is doing better than weight3, which is
> something of a regression from the existing but inaccurate
> p_stage_cyc.

Yeah, but I worried if it would end up with
'p_stage_cyc_or_retire_lat_or_something_else' later.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-28 20:08           ` Namhyung Kim
@ 2025-05-28 20:38             ` Ian Rogers
  2025-05-28 22:02               ` Namhyung Kim
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2025-05-28 20:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 28, 2025 at 1:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, May 28, 2025 at 11:27:06AM -0700, Ian Rogers wrote:
> > On Wed, May 28, 2025 at 11:11 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, May 21, 2025 at 02:15:24PM -0700, Ian Rogers wrote:
> > > > On Wed, May 21, 2025 at 1:27 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> > > > > > By definition arch sample parsing and synthesis will inhibit certain
> > > > > > kinds of cross-platform record then analysis (report, script,
> > > > > > etc.). Remove arch_perf_parse_sample_weight and
> > > > > > arch_perf_synthesize_sample_weight replacing with a common
> > > > > > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > > > > > capture the differing uses regardless of compiled for architecture.
> > > > >
> > > > > Can you please do this without renaming?  It can be a separate patch but
> > > > > I think we can just leave it.
> > > >
> > > > It is not clear what the use of the union is. Presumably it is a
> > > > tagged union but there is no tag as the union value to use is implied
> > > > by either being built on x86_64 or powerpc. The change removes the
> > > > notion of this code being built for x86_64 or powerpc and so the union
> > > > value to use isn't clear (e.g. should arm use p_stage_cyc or
> > > > retire_lat from the union), hence combining to show that it could be
> > > > one or the other. The code could be:
> > > > ```
> > > > #ifdef __x86_64__
> > > >        u16 p_stage_cyc;
> > > > #elif defined(powerpc)
> > > >        u16 retire_lat;
> > > > #endif
> > > > ```
> > > > but this isn't cross-platform.
> > >
> > > Right, we probably don't want it.
> > >
> > >
> > > > The change in hist.h of
> > > > ```
> > > > @@ -255,7 +255,7 @@ struct hist_entry {
> > > >         u64                     code_page_size;
> > > >         u64                     weight;
> > > >         u64                     ins_lat;
> > > > -       u64                     p_stage_cyc;
> > > > +       u64                     p_stage_cyc_or_retire_lat;
> > > > ```
> > > > could be a follow up CL, but then we lose something of what the field
> > > > is holding given the value is just a copy of that same named value in
> > > > perf_sample. The code only inserts 34 lines and so the churn of doing
> > > > that seemed worse than having the change in a single patch for
> > > > clarity.
> > >
> > > Assuming other archs can add something later, we won't rename the field
> > > again.  So I can live with the ugly union fields.  If we really want to
> > > rename it, I prefer calling it just 'weight3' and let the archs handle
> > > the display name only.
> >
> > But that's my point (or in other words maybe you've missed my point) .
> > Regardless of arch we should display p_stage_cyc if processing a
> > perf.data file from a PowerPC as determined from the perf_env in the
> > perf.data file, or retire_lat if processing a perf.data file from x86.
>
> Agreed.
>
>
> > The arch of the perf build is entirely irrelevant and calling the
> > variable an opaque weight3 will require something that will need to be
> > disambiguate it elsewhere in the code. The goal in variable names
> > should be to be intention revealing, which I think
> > p_stage_cyc_or_retire_lat is doing better than weight3, which is
> > something of a regression from the existing but inaccurate
> > p_stage_cyc.
>
> Yeah, but I worried if it would end up with
> 'p_stage_cyc_or_retire_lat_or_something_else' later.

Perhaps it should be:
```
union {
  u16 raw;
  u16 p_stage_cyc;
  u16 retire_lat;
} weight3;
```
to try to best capture this. `xyz.weight3.raw` when the PowerPC or x86
use isn't known, etc. In the histogram code the u16 is a u64.

Thanks,
Ian

> Thanks,
> Namhyung
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing
  2025-05-28 20:38             ` Ian Rogers
@ 2025-05-28 22:02               ` Namhyung Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2025-05-28 22:02 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Masami Hiramatsu (Google), Ravi Bangoria,
	Leo Yan, Yujie Liu, Graham Woodward, Howard Chu, Weilin Wang,
	Dmitry Vyukov, Andi Kleen, Thomas Falcon, Matt Fleming,
	Chun-Tse Shao, Ben Gainey, Song Liu, linux-kernel,
	linux-perf-users, Kajol Jain, Athira Rajeev

On Wed, May 28, 2025 at 01:38:03PM -0700, Ian Rogers wrote:
> On Wed, May 28, 2025 at 1:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, May 28, 2025 at 11:27:06AM -0700, Ian Rogers wrote:
> > > On Wed, May 28, 2025 at 11:11 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, May 21, 2025 at 02:15:24PM -0700, Ian Rogers wrote:
> > > > > On Wed, May 21, 2025 at 1:27 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > On Wed, May 21, 2025 at 09:53:15AM -0700, Ian Rogers wrote:
> > > > > > > By definition arch sample parsing and synthesis will inhibit certain
> > > > > > > kinds of cross-platform record then analysis (report, script,
> > > > > > > etc.). Remove arch_perf_parse_sample_weight and
> > > > > > > arch_perf_synthesize_sample_weight replacing with a common
> > > > > > > implementation. Combine perf_sample p_stage_cyc and retire_lat to
> > > > > > > capture the differing uses regardless of compiled for architecture.
> > > > > >
> > > > > > Can you please do this without renaming?  It can be a separate patch but
> > > > > > I think we can just leave it.
> > > > >
> > > > > It is not clear what the use of the union is. Presumably it is a
> > > > > tagged union but there is no tag as the union value to use is implied
> > > > > by either being built on x86_64 or powerpc. The change removes the
> > > > > notion of this code being built for x86_64 or powerpc and so the union
> > > > > value to use isn't clear (e.g. should arm use p_stage_cyc or
> > > > > retire_lat from the union), hence combining to show that it could be
> > > > > one or the other. The code could be:
> > > > > ```
> > > > > #ifdef __x86_64__
> > > > >        u16 p_stage_cyc;
> > > > > #elif defined(powerpc)
> > > > >        u16 retire_lat;
> > > > > #endif
> > > > > ```
> > > > > but this isn't cross-platform.
> > > >
> > > > Right, we probably don't want it.
> > > >
> > > >
> > > > > The change in hist.h of
> > > > > ```
> > > > > @@ -255,7 +255,7 @@ struct hist_entry {
> > > > >         u64                     code_page_size;
> > > > >         u64                     weight;
> > > > >         u64                     ins_lat;
> > > > > -       u64                     p_stage_cyc;
> > > > > +       u64                     p_stage_cyc_or_retire_lat;
> > > > > ```
> > > > > could be a follow up CL, but then we lose something of what the field
> > > > > is holding given the value is just a copy of that same named value in
> > > > > perf_sample. The code only inserts 34 lines and so the churn of doing
> > > > > that seemed worse than having the change in a single patch for
> > > > > clarity.
> > > >
> > > > Assuming other archs can add something later, we won't rename the field
> > > > again.  So I can live with the ugly union fields.  If we really want to
> > > > rename it, I prefer calling it just 'weight3' and let the archs handle
> > > > the display name only.
> > >
> > > But that's my point (or in other words maybe you've missed my point) .
> > > Regardless of arch we should display p_stage_cyc if processing a
> > > perf.data file from a PowerPC as determined from the perf_env in the
> > > perf.data file, or retire_lat if processing a perf.data file from x86.
> >
> > Agreed.
> >
> >
> > > The arch of the perf build is entirely irrelevant and calling the
> > > variable an opaque weight3 will require something that will need to be
> > > disambiguate it elsewhere in the code. The goal in variable names
> > > should be to be intention revealing, which I think
> > > p_stage_cyc_or_retire_lat is doing better than weight3, which is
> > > something of a regression from the existing but inaccurate
> > > p_stage_cyc.
> >
> > Yeah, but I worried if it would end up with
> > 'p_stage_cyc_or_retire_lat_or_something_else' later.
> 
> Perhaps it should be:
> ```
> union {
>   u16 raw;
>   u16 p_stage_cyc;
>   u16 retire_lat;
> } weight3;
> ```
> to try to best capture this. `xyz.weight3.raw` when the PowerPC or x86
> use isn't known, etc.

Looks better.  But based on the offline discussion, it'd be better to
just use 'u16 weight3'.


> In the histogram code the u16 is a u64.

Yep, because histogram entry aggregates sample weights.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-28 22:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 16:53 [PATCH v3 0/3] Generic weight struct, use env for sort key and header Ian Rogers
2025-05-21 16:53 ` [PATCH v3 1/3] perf sample: Remove arch notion of sample parsing Ian Rogers
2025-05-21 20:26   ` Namhyung Kim
2025-05-21 21:15     ` Ian Rogers
2025-05-28 18:11       ` Namhyung Kim
2025-05-28 18:27         ` Ian Rogers
2025-05-28 20:08           ` Namhyung Kim
2025-05-28 20:38             ` Ian Rogers
2025-05-28 22:02               ` Namhyung Kim
2025-05-21 16:53 ` [PATCH v3 2/3] perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test Ian Rogers
2025-05-21 16:53 ` [PATCH v3 3/3] perf sort: Use perf_env to set arch sort keys and header Ian Rogers
2025-05-27 21:01 ` [PATCH v3 0/3] Generic weight struct, use env for sort key " Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).