* [PATCH v4] perf script python: Add the ins_lat field to event handler
@ 2024-08-09 8:01 Zixian Cai
2024-08-09 13:36 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 3+ messages in thread
From: Zixian Cai @ 2024-08-09 8:01 UTC (permalink / raw)
Cc: Zixian Cai, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Liang, Kan, Ben Gainey,
Paran Lee, linux-perf-users, linux-kernel
For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in ins_lat, while the cache latency
is stored in weight.
This patch reports the ins_lat field for Python scripting.
Signed-off-by: Zixian Cai <fzczx123@gmail.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
v4) reformat commit message for max line length
v3) address review comments
v2) rebase on top of perf-tools-next
tools/perf/util/scripting-engines/trace-event-python.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index fb00f3ad6815..6971dd6c231f 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -888,6 +888,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
 	set_sample_read_in_dict(dict_sample, sample, evsel);
 	pydict_set_item_string_decref(dict_sample, "weight",
 			PyLong_FromUnsignedLongLong(sample->weight));
+	pydict_set_item_string_decref(dict_sample, "ins_lat",
+			PyLong_FromUnsignedLong(sample->ins_lat));
 	pydict_set_item_string_decref(dict_sample, "transaction",
 			PyLong_FromUnsignedLongLong(sample->transaction));
 	set_sample_datasrc_in_dict(dict_sample, sample);
@@ -1317,7 +1319,7 @@ static void python_export_sample_table(struct db_export *dbe,
 	struct tables *tables = container_of(dbe, struct tables, dbe);
 	PyObject *t;

-	t = tuple_new(27);
+	t = tuple_new(28);

 	tuple_set_d64(t, 0, es->db_id);
 	tuple_set_d64(t, 1, es->evsel->db_id);
@@ -1346,6 +1348,7 @@ static void python_export_sample_table(struct db_export *dbe,
 	tuple_set_s32(t, 24, es->sample->flags);
 	tuple_set_d64(t, 25, es->sample->id);
 	tuple_set_d64(t, 26, es->sample->stream_id);
+	tuple_set_u32(t, 27, es->sample->ins_lat);

 	call_object(tables->sample_handler, t, "sample_table");

--
2.25.1
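The last two hunks append ins_lat as element 27 of the tuple handed to the sample_table handler of database-export scripts. A minimal sketch of how such a handler might read the new element while still tolerating the older 27-element tuple follows. The helper name ins_lat_from_sample_tuple and the constant INS_LAT_INDEX are hypothetical and are not taken from any script shipped with perf.

```python
# Hypothetical helper, not part of perf: read ins_lat from the tuple that
# perf passes to a db-export script's sample_table() handler.
INS_LAT_INDEX = 27  # position assigned by tuple_set_u32(t, 27, ...) in the patch


def ins_lat_from_sample_tuple(row):
    """Return ins_lat, or None when the tuple has only the older 27 elements."""
    if len(row) > INS_LAT_INDEX:
        return row[INS_LAT_INDEX]
    return None


def sample_table(*row):
    # perf invokes the handler with the tuple unpacked as positional arguments.
    ins_lat = ins_lat_from_sample_tuple(row)
    if ins_lat is not None:
        # row[0] is the sample db_id, as set by tuple_set_d64(t, 0, ...).
        print("sample", row[0], "ins_lat", ins_lat)
```

Checking the tuple length rather than indexing unconditionally keeps the same script usable with a perf binary that predates this patch.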
* Re: [PATCH v4] perf script python: Add the ins_lat field to event handler
From: Arnaldo Carvalho de Melo @ 2024-08-09 13:36 UTC (permalink / raw)
To: Zixian Cai
Cc: Adrian Hunter, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Liang, Kan, Ben Gainey, Paran Lee, linux-perf-users, linux-kernel
On Fri, Aug 09, 2024 at 08:01:36AM +0000, Zixian Cai wrote:
> For example, when using the Alder Lake PMU memory load event, the
> instruction latency is stored in ins_lat, while the cache latency
> is stored in weight.
>
> This patch reports the ins_lat field for Python scripting.
So, how did you test this? I tried:
Committer testing:
On a Raptor Lake Refresh Intel machine (14th gen):
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
root@number:~# perf mem record -a sleep 5
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 85 times to write data ]
[ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
root@number:~# perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~#
Now generate a Python script and use it to dump the sample dictionary,
which should now contain the 'ins_lat' field:
root@number:~# perf script --gen python
generated Python script: perf-script.py
root@number:~# vim perf-script.py
root@number:~# perf script -s perf-script.py | head -40
in trace_begin
in trace_end
root@number:~# vim perf-script.py
But the generated perf-script.py doesn't have a handler for these events,
so all I got was:
root@number:~# perf script -s perf-script.py
in trace_begin
in trace_end
root@number:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@number:~# perf report -D | grep PERF_RECORD_SAMPLE | wc -l
5857
root@number:~#
So now I'm investigating whether this is some 'perf script' script generation
oddity by trying to run this on a non-hybrid AMD machine...
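For reference, a minimal sketch of a perf-script.py that would dump the sample dictionary for these events. It assumes that perf script passes non-tracepoint events (such as the PMU samples recorded here) to a function named process_event with a single dictionary argument containing "ev_name" and "sample" keys. The helper format_sample is hypothetical.

```python
# Sketch of handlers for perf-script.py; perf script looks them up by name.
def format_sample(sample):
    """Render the sample dict as sorted key=value pairs on one line."""
    return " ".join(f"{key}={sample[key]}" for key in sorted(sample))


def trace_begin():
    print("in trace_begin")


def process_event(param_dict):
    # Assumption: called once per non-tracepoint sample, e.g. the
    # cpu_atom/mem-loads samples in the perf.data recorded above.
    print(param_dict["ev_name"], format_sample(param_dict["sample"]))


def trace_end():
    print("in trace_end")
```

With the patch applied, an ins_lat=... pair would be expected in each printed line next to weight=...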
But in general try to provide the steps to show that the functionality
that you are adding is actually working, making it easy for other
people to try reproducing your results.
Thanks,
- Arnaldo
* Re: [PATCH v4] perf script python: Add the ins_lat field to event handler
From: Zixian Cai @ 2024-08-11 3:52 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Adrian Hunter, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Liang, Kan, Ben Gainey, Paran Lee, linux-perf-users, linux-kernel
On 9/8/2024 23:36, Arnaldo Carvalho de Melo wrote:
> On Fri, Aug 09, 2024 at 08:01:36AM +0000, Zixian Cai wrote:
>> For example, when using the Alder Lake PMU memory load event, the
>> instruction latency is stored in ins_lat, while the cache latency
>> is stored in weight.
>>
>> This patch reports the ins_lat field for Python scripting.
>
> So, how did you test this? I tried:
This is how I tested it.
My machine runs the 6.5.0-41-generic kernel from Ubuntu 22.04 LTS, and I used the distribution's perf to record.
$ grep -m1 'model name' /proc/cpuinfo
model name : 12th Gen Intel(R) Core(TM) i9-12900KF
$ perf version
perf version 6.5.13
$ perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava
...
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7988 msec =====
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 3.530 MB perf.data (47646 samples) ]
$ ./perf evlist -v
cpu_core/mem-loads-aux/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x8203, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1
cpu_core/mem-loads,ldlat=30/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, freq: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-loads,ldlat=30/P: type: 10, size: 136, config: 0x5d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_core/mem-stores/P: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x2cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
cpu_atom/mem-stores/P: type: 10, size: 136, config: 0x6d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:HG: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
$ ./perf script -g python
Then I added a new function to the generated perf-script.py:
    def process_event(params):
        if "cpu_core/mem-loads,ldlat" in params["ev_name"]:
            print(params["sample"]["weight"], params["sample"]["ins_lat"])
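A slightly more defensive variant, offered only as a sketch: it uses dict.get() so the script keeps working on a perf build without this patch, where the "ins_lat" key is absent, and it summarizes per event instead of printing every sample. The totals accumulator and the output format are hypothetical.

```python
from collections import defaultdict

# Hypothetical accumulator: event name -> [samples, total weight, total ins_lat]
totals = defaultdict(lambda: [0, 0, 0])


def process_event(param_dict):
    sample = param_dict["sample"]
    # "ins_lat" is only present when perf carries this patch.
    ins_lat = sample.get("ins_lat")
    if ins_lat is None:
        return
    entry = totals[param_dict["ev_name"]]
    entry[0] += 1
    entry[1] += sample["weight"]
    entry[2] += ins_lat


def trace_end():
    # Print one summary line per event once all samples are processed.
    for name, (count, weight, ins_lat) in sorted(totals.items()):
        print(f"{name}: n={count} avg_weight={weight / count:.1f} "
              f"avg_ins_lat={ins_lat / count:.1f}")
```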
$ ./perf script|grep ldlat=|head
taskset 182628 247517.778385: 1 cpu_core/mem-loads,ldlat=30/: ffffb33a850078a0 40268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK Addr 5 33 0 ffffffff8cc2ba08 [unknown] ([unknown])
taskset 182628 247517.778409: 1 cpu_core/mem-loads,ldlat=30/: ffffb33a85007860 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 85 0 ffffffff8ce23476 [unknown] ([unknown])
taskset 182628 247517.778431: 3 cpu_core/mem-loads,ldlat=30/: ffffb33a85007b78 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 163 0 ffffffff8d2061d0 [unknown] ([unknown])
taskset 182628 247517.778444: 7 cpu_core/mem-loads,ldlat=30/: ffff90cf25b26280 10668100842 |OP LOAD|LVL L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 96 120 0 ffffffff8dab2627 [unknown] ([unknown])
taskset 182628 247517.778484: 23 cpu_core/mem-loads,ldlat=30/: ffffb33a85007cf0 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 218 0 ffffffff8cd96124 [unknown] ([unknown])
taskset 182628 247517.778561: 39 cpu_core/mem-loads,ldlat=30/: ffffe271848b6600 20268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK Data 5 111 0 ffffffff8cd948cc [unknown] ([unknown])
taskset 182628 247517.778629: 50 cpu_core/mem-loads,ldlat=30/: ffffe27184b6d280 11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 71 73 0 ffffffff8cd94792 [unknown] ([unknown])
taskset 182628 247517.778725: 67 cpu_core/mem-loads,ldlat=30/: ffff90c061ed6b48 11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 240 242 0 ffffffff8cf9785b [unknown] ([unknown])
java 182628 247517.778886: 81 cpu_core/mem-loads,ldlat=30/: ffffe27184888430 4026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK Addr 5 68 0 ffffffff8ce13245 [unknown] ([unknown])
java 182628 247517.779164: 87 cpu_core/mem-loads,ldlat=30/: ffffe271bf9bca40 1026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK N/A 5 90 0 ffffffff8cd96387 [unknown] ([unknown])
$ ./perf script -s perf-script.py|head
in trace_begin
5 33
5 85
5 163
96 120
5 218
5 111
71 73
240 242
5 68
The output of the Python script matches the plain perf script output for both weight and ins_lat.
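The comparison above was done by eye. It could also be automated along the following lines. This is a sketch that assumes the column layout of the sample lines shown above, where the decoded data source ends in "|BLK <value>" and is followed by weight and then ins_lat. Both function names are hypothetical.

```python
import re

# Assumed layout, taken from the sample lines above: the decoded data
# source ends with "|BLK <value>", followed by weight and ins_lat.
PAIR_RE = re.compile(r"\|BLK\s+\S+\s+(\d+)\s+(\d+)\s")


def pairs_from_perf_script(lines):
    """Extract (weight, ins_lat) from plain 'perf script' text lines."""
    pairs = []
    for line in lines:
        match = PAIR_RE.search(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def pairs_from_handler_output(lines):
    """Parse the 'weight ins_lat' lines printed by process_event."""
    pairs = []
    for line in lines:
        fields = line.split()
        # Skip markers such as "in trace_begin" that are not two integers.
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs
```

Feeding the captured output of both commands to these functions and comparing the resulting lists gives a pass/fail check rather than a visual one.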
>
> But in general try to provide the steps to show that the functionality
> that you are adding is actually working, making it easy for other
> people to try reproducing your results.
Will do for future patches.
> Thanks,
>
> - Arnaldo
One thing I haven't figured out is that when I use a perf built from source, perf mem record doesn't seem to record the events for the Golden Cove P-cores.
$ ./perf version
perf version 6.11.0-rc2
$ ./perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7157 msec =====
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.251 MB perf.data ]
$ ./perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
I think the above recording issue is orthogonal to this patch, and possibly a result of running 6.11 perf userland on a 6.5 kernel.