linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] perf mem: Check mem_events for all eligible PMUs
@ 2024-09-05 17:07 kan.liang
  2024-09-05 17:07 ` [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL kan.liang
  2024-09-05 17:07 ` [PATCH 3/3] perf mem: Fix the wrong reference in parse_record_events kan.liang
  0 siblings, 2 replies; 10+ messages in thread
From: kan.liang @ 2024-09-05 17:07 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, jolsa, adrian.hunter, linux-perf-users, linux-kernel,
	Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The current perf_pmu__mem_events_init() only checks the availability of
the mem_events for the first eligible PMU. It works for non-hybrid
machines and hybrid machines that have the same mem_events.

However, it may bring issues if a hybrid machine has a different
mem_events on different PMU, e.g., Alder Lake and Raptor Lake. A
mem-loads-aux event is only required for the p-core. The mem_events on
both e-core and p-core should be checked and marked.

The issue was not found, because it's hidden by another bug, which only
records the mem-events for the e-core. The wrong check for the p-core
events didn't yell.

Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-c2c.c     |  2 +-
 tools/perf/builtin-mem.c     |  2 +-
 tools/perf/util/mem-events.c | 14 +++++++++++++-
 tools/perf/util/mem-events.h |  2 +-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index cd2bd573bfc3..cef95b2781c0 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3261,7 +3261,7 @@ static int perf_c2c__record(int argc, const char **argv)
 		return -1;
 	}
 
-	if (perf_pmu__mem_events_init(pmu)) {
+	if (perf_pmu__mem_events_init()) {
 		pr_err("failed: memory events not supported\n");
 		return -1;
 	}
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 18e5f9a0ddc2..f6be7ffc9e45 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -88,7 +88,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem,
 		return -1;
 	}
 
-	if (perf_pmu__mem_events_init(pmu)) {
+	if (perf_pmu__mem_events_init()) {
 		pr_err("failed: memory events not supported\n");
 		return -1;
 	}
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index be048bd02f36..17f80013e574 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -192,7 +192,7 @@ static bool perf_pmu__mem_events_supported(const char *mnt, struct perf_pmu *pmu
 	return !stat(path, &st);
 }
 
-int perf_pmu__mem_events_init(struct perf_pmu *pmu)
+static int __perf_pmu__mem_events_init(struct perf_pmu *pmu)
 {
 	const char *mnt = sysfs__mount();
 	bool found = false;
@@ -219,6 +219,18 @@ int perf_pmu__mem_events_init(struct perf_pmu *pmu)
 	return found ? 0 : -ENOENT;
 }
 
+int perf_pmu__mem_events_init(void)
+{
+	struct perf_pmu *pmu = NULL;
+
+	while ((pmu = perf_pmus__scan_mem(pmu)) != NULL) {
+		if (__perf_pmu__mem_events_init(pmu))
+			return -ENOENT;
+	}
+
+	return 0;
+}
+
 void perf_pmu__mem_events_list(struct perf_pmu *pmu)
 {
 	int j;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index ca31014d7934..a6fc2a593938 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -30,7 +30,7 @@ extern unsigned int perf_mem_events__loads_ldlat;
 extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
 
 int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str);
-int perf_pmu__mem_events_init(struct perf_pmu *pmu);
+int perf_pmu__mem_events_init(void);
 
 struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i);
 struct perf_pmu *perf_mem_events_find_pmu(void);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-05 17:07 [PATCH 1/3] perf mem: Check mem_events for all eligible PMUs kan.liang
@ 2024-09-05 17:07 ` kan.liang
  2024-09-05 19:33   ` Arnaldo Carvalho de Melo
  2024-09-05 17:07 ` [PATCH 3/3] perf mem: Fix the wrong reference in parse_record_events kan.liang
  1 sibling, 1 reply; 10+ messages in thread
From: kan.liang @ 2024-09-05 17:07 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, jolsa, adrian.hunter, linux-perf-users, linux-kernel,
	Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The p-core mem events are missed when launching perf mem record on ADL
and RPL.

root@number:~# perf mem record sleep 1
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.032 MB perf.data ]
root@number:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u

A variable 'record' in the struct perf_mem_event is to indicate whether
a mem event in a mem_events[] should be recorded. The current code only
configure the variable for the first eligible PMU. It's good enough for
a non-hybrid machine or a hybrid machine which has the same
mem_events[]. However, if a different mem_events[] is used for different
PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second
PMU never get a chance to be set. The mem_events[] of the second PMU
are always ignored.

Perf mem doesn't support the per-PMU configuration now. A
per-PMU mem_events[] 'record' variable doesn't make sense. Make it
global. That could also avoid searching for the per-PMU mem_events[]
via perf_pmu__mem_events_ptr every time.

Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-c2c.c     | 12 ++++--------
 tools/perf/builtin-mem.c     | 17 ++++++-----------
 tools/perf/util/mem-events.c |  6 ++++--
 tools/perf/util/mem-events.h |  2 +-
 4 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index cef95b2781c0..15e1fce71c72 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3285,19 +3285,15 @@ static int perf_c2c__record(int argc, const char **argv)
 		 * PERF_MEM_EVENTS__LOAD_STORE if it is supported.
 		 */
 		if (e->tag) {
-			e->record = true;
+			perf_mem_record[PERF_MEM_EVENTS__LOAD_STORE] = true;
 			rec_argv[i++] = "-W";
 		} else {
-			e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
-			e->record = true;
-
-			e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__STORE);
-			e->record = true;
+			perf_mem_record[PERF_MEM_EVENTS__LOAD] = true;
+			perf_mem_record[PERF_MEM_EVENTS__STORE] = true;
 		}
 	}
 
-	e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
-	if (e->record)
+	if (perf_mem_record[PERF_MEM_EVENTS__LOAD])
 		rec_argv[i++] = "-W";
 
 	rec_argv[i++] = "-d";
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index f6be7ffc9e45..ba1d37bfb916 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -117,22 +117,17 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem,
 	if (e->tag &&
 	    (mem->operation & MEM_OPERATION_LOAD) &&
 	    (mem->operation & MEM_OPERATION_STORE)) {
-		e->record = true;
+		perf_mem_record[PERF_MEM_EVENTS__LOAD_STORE] = true;
 		rec_argv[i++] = "-W";
 	} else {
-		if (mem->operation & MEM_OPERATION_LOAD) {
-			e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
-			e->record = true;
-		}
+		if (mem->operation & MEM_OPERATION_LOAD)
+			perf_mem_record[PERF_MEM_EVENTS__LOAD] = true;
 
-		if (mem->operation & MEM_OPERATION_STORE) {
-			e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__STORE);
-			e->record = true;
-		}
+		if (mem->operation & MEM_OPERATION_STORE)
+			perf_mem_record[PERF_MEM_EVENTS__STORE] = true;
 	}
 
-	e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
-	if (e->record)
+	if (perf_mem_record[PERF_MEM_EVENTS__LOAD])
 		rec_argv[i++] = "-W";
 
 	rec_argv[i++] = "-d";
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 17f80013e574..051feb93ed8d 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -29,6 +29,8 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 };
 #undef E
 
+bool perf_mem_record[PERF_MEM_EVENTS__MAX] = { 0 };
+
 static char mem_loads_name[100];
 static char mem_stores_name[100];
 
@@ -163,7 +165,7 @@ int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str)
 				continue;
 
 			if (strstr(e->tag, tok))
-				e->record = found = true;
+				perf_mem_record[j] = found = true;
 		}
 
 		tok = strtok_r(NULL, ",", &saveptr);
@@ -261,7 +263,7 @@ int perf_mem_events__record_args(const char **rec_argv, int *argv_nr)
 		for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 			e = perf_pmu__mem_events_ptr(pmu, j);
 
-			if (!e->record)
+			if (!perf_mem_record[j])
 				continue;
 
 			if (!e->supported) {
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index a6fc2a593938..8dc27db9fd52 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -6,7 +6,6 @@
 #include <linux/types.h>
 
 struct perf_mem_event {
-	bool		record;
 	bool		supported;
 	bool		ldlat;
 	u32		aux_event;
@@ -28,6 +27,7 @@ struct perf_pmu;
 
 extern unsigned int perf_mem_events__loads_ldlat;
 extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
+extern bool perf_mem_record[PERF_MEM_EVENTS__MAX];
 
 int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str);
 int perf_pmu__mem_events_init(void);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] perf mem: Fix the wrong reference in parse_record_events
  2024-09-05 17:07 [PATCH 1/3] perf mem: Check mem_events for all eligible PMUs kan.liang
  2024-09-05 17:07 ` [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL kan.liang
@ 2024-09-05 17:07 ` kan.liang
  1 sibling, 0 replies; 10+ messages in thread
From: kan.liang @ 2024-09-05 17:07 UTC (permalink / raw)
  To: acme, namhyung
  Cc: irogers, jolsa, adrian.hunter, linux-perf-users, linux-kernel,
	Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A segmentation fault can be triggered when running
perf mem record -e ldlat-loads

The commit 35b38a71c92f ("perf mem: Rework command option handling")
moves the OPT_CALLBACK of event from __cmd_record() to cmd_mem().
When invoking the __cmd_record(), the 'mem' has been referenced (&).
So the &mem passed into the parse_record_events() is a double
reference (&&) of the original struct perf_mem mem.
But in the cmd_mem(), the &mem is the single reference (&) of the
original struct perf_mem mem.

Fixes: 35b38a71c92f ("perf mem: Rework command option handling")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index ba1d37bfb916..2b28e356f963 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -47,7 +47,7 @@ struct perf_mem {
 static int parse_record_events(const struct option *opt,
 			       const char *str, int unset __maybe_unused)
 {
-	struct perf_mem *mem = *(struct perf_mem **)opt->value;
+	struct perf_mem *mem = (struct perf_mem *)opt->value;
 	struct perf_pmu *pmu;
 
 	pmu = perf_mem_events_find_pmu();
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-05 17:07 ` [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL kan.liang
@ 2024-09-05 19:33   ` Arnaldo Carvalho de Melo
  2024-09-05 19:47     ` Liang, Kan
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-09-05 19:33 UTC (permalink / raw)
  To: kan.liang
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel

On Thu, Sep 05, 2024 at 10:07:36AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The p-core mem events are missed when launching perf mem record on ADL
> and RPL.
> 
> root@number:~# perf mem record sleep 1
> Memory events are enabled on a subset of CPUs: 16-27
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.032 MB perf.data ]
> root@number:~# perf evlist
> cpu_atom/mem-loads,ldlat=30/P
> cpu_atom/mem-stores/P
> dummy:u
> 
> A variable 'record' in the struct perf_mem_event is to indicate whether
> a mem event in a mem_events[] should be recorded. The current code only
> configure the variable for the first eligible PMU. It's good enough for
> a non-hybrid machine or a hybrid machine which has the same
> mem_events[]. However, if a different mem_events[] is used for different
> PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second
> PMU never get a chance to be set. The mem_events[] of the second PMU
> are always ignored.
> 
> Perf mem doesn't support the per-PMU configuration now. A
> per-PMU mem_events[] 'record' variable doesn't make sense. Make it
> global. That could also avoid searching for the per-PMU mem_events[]
> via perf_pmu__mem_events_ptr every time.
> 
> Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

Looks better:

root@number:~# perf report --header-only | grep 'cmdline\|event'
# cmdline : /home/acme/bin/perf mem record ls 
# event : name = cpu_atom/mem-loads,ldlat=30/P, , id = { 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511 }, type = 10 (cpu_atom), size = 136, config = 0x5d0 (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, { bp_addr, config1 } = 0x1f
# event : name = cpu_atom/mem-stores/P, , id = { 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523 }, type = 10 (cpu_atom), size = 136, config = 0x6d0 (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
# event : name = cpu_core/mem-loads-aux/, , id = { 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539 }, type = 4 (cpu_core), size = 136, config = 0x8203 (mem-loads-aux), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1
# event : name = cpu_core/mem-loads,ldlat=30/, , id = { 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556 }, type = 4 (cpu_core), size = 136, config = 0x1cd (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1, exclude_guest = 1, { bp_addr, config1 } = 0x1f
# event : name = cpu_core/mem-stores/P, , id = { 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572 }, type = 4 (cpu_core), size = 136, config = 0x2cd (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
# event : name = dummy:u, , id = { 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600 }, type = 1 (software), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, mmap_data = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# intel_pt pmu capabilities: topa_multiple_entries=1, psb_cyc=1, single_range_output=1, mtc_periods=249, ip_filtering=1, output_subsys=0, cr3_filtering=1, psb_periods=3f, event_trace=0, cycle_thresholds=3f, power_event_trace=0, mtc=1, payloads_lip=0, ptwrite=1, num_address_ranges=2, max_subleaf=1, topa_output=1, tnt_disable=0
root@number:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
cpu_core/mem-loads-aux/
cpu_core/mem-loads,ldlat=30/
cpu_core/mem-stores/P
dummy:u
root@number:~#

But can we reconstruct the events relationship (group, :S, etc) from
what we have in the perf.data header?

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-05 19:33   ` Arnaldo Carvalho de Melo
@ 2024-09-05 19:47     ` Liang, Kan
  2024-09-06 14:17       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Liang, Kan @ 2024-09-05 19:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel



On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
> On Thu, Sep 05, 2024 at 10:07:36AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The p-core mem events are missed when launching perf mem record on ADL
>> and RPL.
>>
>> root@number:~# perf mem record sleep 1
>> Memory events are enabled on a subset of CPUs: 16-27
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.032 MB perf.data ]
>> root@number:~# perf evlist
>> cpu_atom/mem-loads,ldlat=30/P
>> cpu_atom/mem-stores/P
>> dummy:u
>>
>> A variable 'record' in the struct perf_mem_event is to indicate whether
>> a mem event in a mem_events[] should be recorded. The current code only
>> configure the variable for the first eligible PMU. It's good enough for
>> a non-hybrid machine or a hybrid machine which has the same
>> mem_events[]. However, if a different mem_events[] is used for different
>> PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second
>> PMU never get a chance to be set. The mem_events[] of the second PMU
>> are always ignored.
>>
>> Perf mem doesn't support the per-PMU configuration now. A
>> per-PMU mem_events[] 'record' variable doesn't make sense. Make it
>> global. That could also avoid searching for the per-PMU mem_events[]
>> via perf_pmu__mem_events_ptr every time.
>>
>> Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
>> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> 
> Looks better:
> 
> root@number:~# perf report --header-only | grep 'cmdline\|event'
> # cmdline : /home/acme/bin/perf mem record ls 
> # event : name = cpu_atom/mem-loads,ldlat=30/P, , id = { 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511 }, type = 10 (cpu_atom), size = 136, config = 0x5d0 (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, { bp_addr, config1 } = 0x1f
> # event : name = cpu_atom/mem-stores/P, , id = { 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523 }, type = 10 (cpu_atom), size = 136, config = 0x6d0 (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
> # event : name = cpu_core/mem-loads-aux/, , id = { 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539 }, type = 4 (cpu_core), size = 136, config = 0x8203 (mem-loads-aux), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1
> # event : name = cpu_core/mem-loads,ldlat=30/, , id = { 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556 }, type = 4 (cpu_core), size = 136, config = 0x1cd (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1, exclude_guest = 1, { bp_addr, config1 } = 0x1f
> # event : name = cpu_core/mem-stores/P, , id = { 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572 }, type = 4 (cpu_core), size = 136, config = 0x2cd (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
> # event : name = dummy:u, , id = { 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600 }, type = 1 (software), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, mmap_data = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
> # intel_pt pmu capabilities: topa_multiple_entries=1, psb_cyc=1, single_range_output=1, mtc_periods=249, ip_filtering=1, output_subsys=0, cr3_filtering=1, psb_periods=3f, event_trace=0, cycle_thresholds=3f, power_event_trace=0, mtc=1, payloads_lip=0, ptwrite=1, num_address_ranges=2, max_subleaf=1, topa_output=1, tnt_disable=0
> root@number:~# perf evlist
> cpu_atom/mem-loads,ldlat=30/P
> cpu_atom/mem-stores/P
> cpu_core/mem-loads-aux/
> cpu_core/mem-loads,ldlat=30/
> cpu_core/mem-stores/P
> dummy:u
> root@number:~#
> 
> But can we reconstruct the events relationship (group, :S, etc) from
> what we have in the perf.data header?
> 

Do you mean show the group relation in the perf evlist?

$perf mem record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]

$perf evlist -g
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
cpu_core/mem-stores/P
dummy:u

The -g option already did it, although the group modifier looks lost.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-05 19:47     ` Liang, Kan
@ 2024-09-06 14:17       ` Arnaldo Carvalho de Melo
  2024-09-06 16:08         ` Liang, Kan
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-09-06 14:17 UTC (permalink / raw)
  To: Liang, Kan
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel

On Thu, Sep 05, 2024 at 03:47:03PM -0400, Liang, Kan wrote:
> On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
> > On Thu, Sep 05, 2024 at 10:07:36AM -0700, kan.liang@linux.intel.com wrote:
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> The p-core mem events are missed when launching perf mem record on ADL
> >> and RPL.
> >>
> >> root@number:~# perf mem record sleep 1
> >> Memory events are enabled on a subset of CPUs: 16-27
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.032 MB perf.data ]
> >> root@number:~# perf evlist
> >> cpu_atom/mem-loads,ldlat=30/P
> >> cpu_atom/mem-stores/P
> >> dummy:u
> >>
> >> A variable 'record' in the struct perf_mem_event is to indicate whether
> >> a mem event in a mem_events[] should be recorded. The current code only
> >> configure the variable for the first eligible PMU. It's good enough for
> >> a non-hybrid machine or a hybrid machine which has the same
> >> mem_events[]. However, if a different mem_events[] is used for different
> >> PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second
> >> PMU never get a chance to be set. The mem_events[] of the second PMU
> >> are always ignored.
> >>
> >> Perf mem doesn't support the per-PMU configuration now. A
> >> per-PMU mem_events[] 'record' variable doesn't make sense. Make it
> >> global. That could also avoid searching for the per-PMU mem_events[]
> >> via perf_pmu__mem_events_ptr every time.
> >>
> >> Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
> >> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> >> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
> >> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> > 
> > Looks better:
> > 
> > root@number:~# perf report --header-only | grep 'cmdline\|event'
> > # cmdline : /home/acme/bin/perf mem record ls 
> > # event : name = cpu_atom/mem-loads,ldlat=30/P, , id = { 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511 }, type = 10 (cpu_atom), size = 136, config = 0x5d0 (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, { bp_addr, config1 } = 0x1f
> > # event : name = cpu_atom/mem-stores/P, , id = { 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523 }, type = 10 (cpu_atom), size = 136, config = 0x6d0 (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
> > # event : name = cpu_core/mem-loads-aux/, , id = { 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539 }, type = 4 (cpu_core), size = 136, config = 0x8203 (mem-loads-aux), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1
> > # event : name = cpu_core/mem-loads,ldlat=30/, , id = { 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556 }, type = 4 (cpu_core), size = 136, config = 0x1cd (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1, exclude_guest = 1, { bp_addr, config1 } = 0x1f
> > # event : name = cpu_core/mem-stores/P, , id = { 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572 }, type = 4 (cpu_core), size = 136, config = 0x2cd (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
> > # event : name = dummy:u, , id = { 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600 }, type = 1 (software), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, mmap_data = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
> > # intel_pt pmu capabilities: topa_multiple_entries=1, psb_cyc=1, single_range_output=1, mtc_periods=249, ip_filtering=1, output_subsys=0, cr3_filtering=1, psb_periods=3f, event_trace=0, cycle_thresholds=3f, power_event_trace=0, mtc=1, payloads_lip=0, ptwrite=1, num_address_ranges=2, max_subleaf=1, topa_output=1, tnt_disable=0
> > root@number:~# perf evlist
> > cpu_atom/mem-loads,ldlat=30/P
> > cpu_atom/mem-stores/P
> > cpu_core/mem-loads-aux/
> > cpu_core/mem-loads,ldlat=30/
> > cpu_core/mem-stores/P
> > dummy:u
> > root@number:~#
> > 
> > But can we reconstruct the events relationship (group, :S, etc) from
> > what we have in the perf.data header?
> > 
> 
> Do you mean show the group relation in the perf evlist?
> 
> $perf mem record sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]
> 
> $perf evlist -g
> cpu_atom/mem-loads,ldlat=30/P
> cpu_atom/mem-stores/P
> {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
> cpu_core/mem-stores/P
> dummy:u
> 
> The -g option already did it, although the group modifier looks lost.

Right, I can reproduce that, but I wonder if we shouldn't make this '-g'
option the default?

-----

Committer testing:

  root@number:~# perf evlist -g
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
  cpu_core/mem-stores/P
  dummy:u
  root@number:~#

The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
not being added by 'perf evlist -g', to be checked.

-----

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-06 14:17       ` Arnaldo Carvalho de Melo
@ 2024-09-06 16:08         ` Liang, Kan
  2024-09-06 20:06           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Liang, Kan @ 2024-09-06 16:08 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel



On 2024-09-06 10:17 a.m., Arnaldo Carvalho de Melo wrote:
> On Thu, Sep 05, 2024 at 03:47:03PM -0400, Liang, Kan wrote:
>> On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
>>> On Thu, Sep 05, 2024 at 10:07:36AM -0700, kan.liang@linux.intel.com wrote:
>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>
>>>> The p-core mem events are missed when launching perf mem record on ADL
>>>> and RPL.
>>>>
>>>> root@number:~# perf mem record sleep 1
>>>> Memory events are enabled on a subset of CPUs: 16-27
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.032 MB perf.data ]
>>>> root@number:~# perf evlist
>>>> cpu_atom/mem-loads,ldlat=30/P
>>>> cpu_atom/mem-stores/P
>>>> dummy:u
>>>>
>>>> A variable 'record' in the struct perf_mem_event is to indicate whether
>>>> a mem event in a mem_events[] should be recorded. The current code only
>>>> configure the variable for the first eligible PMU. It's good enough for
>>>> a non-hybrid machine or a hybrid machine which has the same
>>>> mem_events[]. However, if a different mem_events[] is used for different
>>>> PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second
>>>> PMU never get a chance to be set. The mem_events[] of the second PMU
>>>> are always ignored.
>>>>
>>>> Perf mem doesn't support the per-PMU configuration now. A
>>>> per-PMU mem_events[] 'record' variable doesn't make sense. Make it
>>>> global. That could also avoid searching for the per-PMU mem_events[]
>>>> via perf_pmu__mem_events_ptr every time.
>>>>
>>>> Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
>>>> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
>>>> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> Looks better:
>>>
>>> root@number:~# perf report --header-only | grep 'cmdline\|event'
>>> # cmdline : /home/acme/bin/perf mem record ls 
>>> # event : name = cpu_atom/mem-loads,ldlat=30/P, , id = { 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511 }, type = 10 (cpu_atom), size = 136, config = 0x5d0 (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, { bp_addr, config1 } = 0x1f
>>> # event : name = cpu_atom/mem-stores/P, , id = { 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523 }, type = 10 (cpu_atom), size = 136, config = 0x6d0 (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
>>> # event : name = cpu_core/mem-loads-aux/, , id = { 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539 }, type = 4 (cpu_core), size = 136, config = 0x8203 (mem-loads-aux), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1
>>> # event : name = cpu_core/mem-loads,ldlat=30/, , id = { 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556 }, type = 4 (cpu_core), size = 136, config = 0x1cd (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1, exclude_guest = 1, { bp_addr, config1 } = 0x1f
>>> # event : name = cpu_core/mem-stores/P, , id = { 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572 }, type = 4 (cpu_core), size = 136, config = 0x2cd (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
>>> # event : name = dummy:u, , id = { 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600 }, type = 1 (software), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, mmap_data = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
>>> # intel_pt pmu capabilities: topa_multiple_entries=1, psb_cyc=1, single_range_output=1, mtc_periods=249, ip_filtering=1, output_subsys=0, cr3_filtering=1, psb_periods=3f, event_trace=0, cycle_thresholds=3f, power_event_trace=0, mtc=1, payloads_lip=0, ptwrite=1, num_address_ranges=2, max_subleaf=1, topa_output=1, tnt_disable=0
>>> root@number:~# perf evlist
>>> cpu_atom/mem-loads,ldlat=30/P
>>> cpu_atom/mem-stores/P
>>> cpu_core/mem-loads-aux/
>>> cpu_core/mem-loads,ldlat=30/
>>> cpu_core/mem-stores/P
>>> dummy:u
>>> root@number:~#
>>>
>>> But can we reconstruct the events relationship (group, :S, etc) from
>>> what we have in the perf.data header?
>>>
>>
>> Do you mean show the group relation in the perf evlist?
>>
>> $perf mem record sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]
>>
>> $perf evlist -g
>> cpu_atom/mem-loads,ldlat=30/P
>> cpu_atom/mem-stores/P
>> {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
>> cpu_core/mem-stores/P
>> dummy:u
>>
>> The -g option already did it, although the group modifier looks lost.
> 
> Right, I can reproduce that, but I wonder if we shouldn't make this '-g'
> option the default?

I think the evlist means a list of events. Only outputting the events
makes sense to me.
With -g, the extra relationship information is provided.

> 
> -----
> 
> Committer testing:
> 
>   root@number:~# perf evlist -g
>   cpu_atom/mem-loads,ldlat=30/P
>   cpu_atom/mem-stores/P
>   {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
>   cpu_core/mem-stores/P
>   dummy:u
>   root@number:~#
> 
> The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
> not being added by 'perf evlist -g', to be checked.
> 
> -----

It should be a generic issue, not just for perf evlist -g.

The same issue can be observed for perf report.
$perf report --header-only | grep 'cmdline\|group'
# cmdline : /home/kan/tmp/perf-tools-next/tools/perf/perf record -e
{cycles,instructions}:u sleep 1
# group: {cycles,instructions}

I think it's because the per-group modifiers is converted to per-event
modifiers and stored in the evsel when parsing the group. It's hard to
reconstruct the accurate group strings only relying on the evsel, unless
we record the group string somewhere, e.g., leader evsel, when parsing it.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-06 16:08         ` Liang, Kan
@ 2024-09-06 20:06           ` Arnaldo Carvalho de Melo
  2024-09-08 20:30             ` Liang, Kan
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-09-06 20:06 UTC (permalink / raw)
  To: Liang, Kan
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel

On Fri, Sep 06, 2024 at 12:08:52PM -0400, Liang, Kan wrote:
> On 2024-09-06 10:17 a.m., Arnaldo Carvalho de Melo wrote:
> > On Thu, Sep 05, 2024 at 03:47:03PM -0400, Liang, Kan wrote:
> >> On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
> >>> But can we reconstruct the events relationship (group, :S, etc) from
> >>> what we have in the perf.data header?

> >> Do you mean show the group relation in the perf evlist?

> >> $perf mem record sleep 1
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]

> >> $perf evlist -g
> >> cpu_atom/mem-loads,ldlat=30/P
> >> cpu_atom/mem-stores/P
> >> {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
> >> cpu_core/mem-stores/P
> >> dummy:u

> >> The -g option already did it, although the group modifier looks lost.

> > Right, I can reproduce that, but I wonder if we shouldn't make this '-g'
> > option the default?

> I think the evlist means a list of events. Only outputting the events
> makes sense to me.
> With -g, the extra relationship information is provided.

At first 'perf evlist' showing just the events present in the perf.data
file seems enough, and maybe it should continue like that.

It is just that this relationship is so critical that not showing it by
default looks suboptimal :-\

Perhaps we should add some warning at the end mentioning the special
relationships present and suggest using '-g' to see it?

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-06 20:06           ` Arnaldo Carvalho de Melo
@ 2024-09-08 20:30             ` Liang, Kan
  2024-09-11 15:56               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Liang, Kan @ 2024-09-08 20:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel



On 2024-09-06 4:06 p.m., Arnaldo Carvalho de Melo wrote:
> On Fri, Sep 06, 2024 at 12:08:52PM -0400, Liang, Kan wrote:
>> On 2024-09-06 10:17 a.m., Arnaldo Carvalho de Melo wrote:
>>> On Thu, Sep 05, 2024 at 03:47:03PM -0400, Liang, Kan wrote:
>>>> On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
>>>>> But can we reconstruct the events relationship (group, :S, etc) from
>>>>> what we have in the perf.data header?
> 
>>>> Do you mean show the group relation in the perf evlist?
> 
>>>> $perf mem record sleep 1
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]
> 
>>>> $perf evlist -g
>>>> cpu_atom/mem-loads,ldlat=30/P
>>>> cpu_atom/mem-stores/P
>>>> {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
>>>> cpu_core/mem-stores/P
>>>> dummy:u
> 
>>>> The -g option already did it, although the group modifier looks lost.
> 
>>> Right, I can reproduce that, but I wonder if we shouldn't make this '-g'
>>> option the default?
> 
>> I think the evlist means a list of events. Only outputting the events
>> makes sense to me.
>> With -g, the extra relationship information is provided.
> 
> At first 'perf evlist' showing just the events present in the perf.data
> file seems enough, and maybe it should continue like that.
> 
> It is just that this relationship is so critical that not showing it by
> default looks suboptimal :-\
> 
> Perhaps we should add some warning at the end mentioning the special
> relationships present and suggest using '-g' to see it?
> 

Agree, and we already did a similar hint for tracepoint events.

Here is the patch to add a hint for '-g'.
https://lore.kernel.org/lkml/20240908202847.176280-1-kan.liang@linux.intel.com/

Thanks,
Kan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL
  2024-09-08 20:30             ` Liang, Kan
@ 2024-09-11 15:56               ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-09-11 15:56 UTC (permalink / raw)
  To: Liang, Kan
  Cc: namhyung, irogers, jolsa, adrian.hunter, linux-perf-users,
	linux-kernel

On Sun, Sep 08, 2024 at 04:30:44PM -0400, Liang, Kan wrote:
> 
> 
> On 2024-09-06 4:06 p.m., Arnaldo Carvalho de Melo wrote:
> > On Fri, Sep 06, 2024 at 12:08:52PM -0400, Liang, Kan wrote:
> >> On 2024-09-06 10:17 a.m., Arnaldo Carvalho de Melo wrote:
> >>> On Thu, Sep 05, 2024 at 03:47:03PM -0400, Liang, Kan wrote:
> >>>> On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
> >>>>> But can we reconstruct the events relationship (group, :S, etc) from
> >>>>> what we have in the perf.data header?
> > 
> >>>> Do you mean show the group relation in the perf evlist?
> > 
> >>>> $perf mem record sleep 1
> >>>> [ perf record: Woken up 1 times to write data ]
> >>>> [ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]
> > 
> >>>> $perf evlist -g
> >>>> cpu_atom/mem-loads,ldlat=30/P
> >>>> cpu_atom/mem-stores/P
> >>>> {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
> >>>> cpu_core/mem-stores/P
> >>>> dummy:u
> > 
> >>>> The -g option already did it, although the group modifier looks lost.
> > 
> >>> Right, I can reproduce that, but I wonder if we shouldn't make this '-g'
> >>> option the default?
> > 
> >> I think the evlist means a list of events. Only outputting the events
> >> makes sense to me.
> >> With -g, the extra relationship information is provided.
> > 
> > At first 'perf evlist' showing just the events present in the perf.data
> > file seems enough, and maybe it should continue like that.
> > 
> > It is just that this relationship is so critical that not showing it by
> > default looks suboptimal :-\
> > 
> > Perhaps we should add some warning at the end mentioning the special
> > relationships present and suggest using '-g' to see it?
> > 
> 
> Agree, and we already did a similar hint for tracepoint events.
> 
> Here is the patch to add a hint for '-g'.
> https://lore.kernel.org/lkml/20240908202847.176280-1-kan.liang@linux.intel.com/

Thanks for doing that, patch applied!

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-09-11 15:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-05 17:07 [PATCH 1/3] perf mem: Check mem_events for all eligible PMUs kan.liang
2024-09-05 17:07 ` [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL kan.liang
2024-09-05 19:33   ` Arnaldo Carvalho de Melo
2024-09-05 19:47     ` Liang, Kan
2024-09-06 14:17       ` Arnaldo Carvalho de Melo
2024-09-06 16:08         ` Liang, Kan
2024-09-06 20:06           ` Arnaldo Carvalho de Melo
2024-09-08 20:30             ` Liang, Kan
2024-09-11 15:56               ` Arnaldo Carvalho de Melo
2024-09-05 17:07 ` [PATCH 3/3] perf mem: Fix the wrong reference in parse_record_events kan.liang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).