linux-perf-users.vger.kernel.org archive mirror
* [PATCH v4] perf script python: Add the ins_lat field to event handler
@ 2024-08-09  8:01 Zixian Cai
  2024-08-09 13:36 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 3+ messages in thread
From: Zixian Cai @ 2024-08-09  8:01 UTC (permalink / raw)
  Cc: Zixian Cai, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Liang, Kan, Ben Gainey,
	Paran Lee, linux-perf-users, linux-kernel

For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in ins_lat, while the cache latency
is stored in weight.

This patch reports the ins_lat field for Python scripting.

Signed-off-by: Zixian Cai <fzczx123@gmail.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
v4) reformat commit message for max line length
v3) address review comments
v2) rebase on top of perf-tools-next

 tools/perf/util/scripting-engines/trace-event-python.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index fb00f3ad6815..6971dd6c231f 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -888,6 +888,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
 	set_sample_read_in_dict(dict_sample, sample, evsel);
 	pydict_set_item_string_decref(dict_sample, "weight",
 			PyLong_FromUnsignedLongLong(sample->weight));
+	pydict_set_item_string_decref(dict_sample, "ins_lat",
+			PyLong_FromUnsignedLong(sample->ins_lat));
 	pydict_set_item_string_decref(dict_sample, "transaction",
 			PyLong_FromUnsignedLongLong(sample->transaction));
 	set_sample_datasrc_in_dict(dict_sample, sample);
@@ -1317,7 +1319,7 @@ static void python_export_sample_table(struct db_export *dbe,
 	struct tables *tables = container_of(dbe, struct tables, dbe);
 	PyObject *t;

-	t = tuple_new(27);
+	t = tuple_new(28);

 	tuple_set_d64(t, 0, es->db_id);
 	tuple_set_d64(t, 1, es->evsel->db_id);
@@ -1346,6 +1348,7 @@ static void python_export_sample_table(struct db_export *dbe,
 	tuple_set_s32(t, 24, es->sample->flags);
 	tuple_set_d64(t, 25, es->sample->id);
 	tuple_set_d64(t, 26, es->sample->stream_id);
+	tuple_set_u32(t, 27, es->sample->ins_lat);

 	call_object(tables->sample_handler, t, "sample_table");

--
2.25.1
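
For db-export scripts, the extra tuple element could be consumed roughly as sketched below. This is an illustration only, not part of the patch: index 27 is taken from the last hunk above, the helper name `sample_ins_lat` is hypothetical, and the length check is an assumption meant to keep the script usable with a perf binary that still passes 27 elements.

```python
# Position of ins_lat in the sample_table tuple, per the hunk above.
INS_LAT_INDEX = 27


def sample_ins_lat(row):
    """Return ins_lat from a sample_table row, or None if perf did not export it."""
    return row[INS_LAT_INDEX] if len(row) > INS_LAT_INDEX else None


def sample_table(*x):
    # perf calls this once per exported sample with the tuple built in
    # python_export_sample_table(); a real export script would insert the
    # row into a database instead of printing it.
    print("sample id %s ins_lat %s" % (x[0], sample_ins_lat(x)))
```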



* Re: [PATCH v4] perf script python: Add the ins_lat field to event handler
  2024-08-09  8:01 [PATCH v4] perf script python: Add the ins_lat field to event handler Zixian Cai
@ 2024-08-09 13:36 ` Arnaldo Carvalho de Melo
  2024-08-11  3:52   ` Zixian Cai
  0 siblings, 1 reply; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-09 13:36 UTC (permalink / raw)
  To: Zixian Cai
  Cc: Adrian Hunter, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Liang, Kan, Ben Gainey, Paran Lee, linux-perf-users, linux-kernel

On Fri, Aug 09, 2024 at 08:01:36AM +0000, Zixian Cai wrote:
> For example, when using the Alder Lake PMU memory load event, the
> instruction latency is stored in ins_lat, while the cache latency
> is stored in weight.
> 
> This patch reports the ins_lat field for Python scripting.

So, how did you test this? I tried:

Committer testing:

On a Raptor Lake Refresh Intel machine (14th gen):

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name    : Intel(R) Core(TM) i7-14700K
  root@number:~# perf mem record -a sleep 5
  Memory events are enabled on a subset of CPUs: 16-27
  [ perf record: Woken up 85 times to write data ]
  [ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
  root@number:~# perf evlist -v
  cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
  cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
  dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  root@number:~#

Now generate a Python script to dump the sample dictionary, which should
now contain the 'ins_lat' field:

  root@number:~# perf script --gen python
  generated Python script: perf-script.py
  root@number:~# vim perf-script.py
  root@number:~# perf script -s perf-script.py | head -40
  in trace_begin
  in trace_end
  root@number:~# vim perf-script.py

But now the perf-script.py doesn't have a handler for the events and I
got just:

  root@number:~# perf script -s perf-script.py 
  in trace_begin
  in trace_end
  root@number:~# perf evlist 
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u
  root@number:~# perf report -D | grep PERF_RECORD_SAMPLE | wc -l
  5857
  root@number:~#

So now I'm investigating if this is some 'perf script' script generation
oddity by trying to run this on an AMD machine, non-hybrid...
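
One possible explanation, stated as an assumption rather than a confirmed diagnosis: the generated script seems to contain handlers only for tracepoint events, so hardware events such as mem-loads need a generic `process_event` function added by hand. A minimal sketch that only counts samples per event name, to check that the handler gets called at all (the `counts` name is illustrative):

```python
from collections import Counter

# Number of samples seen per event name.
counts = Counter()


def trace_begin():
    print("in trace_begin")


def process_event(param_dict):
    # perf passes one dict per sample; "ev_name" holds the event name.
    counts[param_dict["ev_name"]] += 1


def trace_end():
    print("in trace_end")
    for name, n in sorted(counts.items()):
        print("%-40s %d" % (name, n))
```

If the counts stay empty while `perf report -D` shows PERF_RECORD_SAMPLE records, the problem is more likely in event dispatch than in the script.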

But in general try to provide the steps to show that the functionality
that you are adding is actually working, making it easy for other
people to try reproducing your results.

Thanks,

- Arnaldo
 
> Signed-off-by: Zixian Cai <fzczx123@gmail.com>
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> v4) reformat commit message for max line length
> v3) address review comments
> v2) rebase on top of perf-tools-next
> 
>  tools/perf/util/scripting-engines/trace-event-python.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index fb00f3ad6815..6971dd6c231f 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -888,6 +888,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
>  	set_sample_read_in_dict(dict_sample, sample, evsel);
>  	pydict_set_item_string_decref(dict_sample, "weight",
>  			PyLong_FromUnsignedLongLong(sample->weight));
> +	pydict_set_item_string_decref(dict_sample, "ins_lat",
> +			PyLong_FromUnsignedLong(sample->ins_lat));
>  	pydict_set_item_string_decref(dict_sample, "transaction",
>  			PyLong_FromUnsignedLongLong(sample->transaction));
>  	set_sample_datasrc_in_dict(dict_sample, sample);
> @@ -1317,7 +1319,7 @@ static void python_export_sample_table(struct db_export *dbe,
>  	struct tables *tables = container_of(dbe, struct tables, dbe);
>  	PyObject *t;
> 
> -	t = tuple_new(27);
> +	t = tuple_new(28);
> 
>  	tuple_set_d64(t, 0, es->db_id);
>  	tuple_set_d64(t, 1, es->evsel->db_id);
> @@ -1346,6 +1348,7 @@ static void python_export_sample_table(struct db_export *dbe,
>  	tuple_set_s32(t, 24, es->sample->flags);
>  	tuple_set_d64(t, 25, es->sample->id);
>  	tuple_set_d64(t, 26, es->sample->stream_id);
> +	tuple_set_u32(t, 27, es->sample->ins_lat);
> 
>  	call_object(tables->sample_handler, t, "sample_table");
> 
> --
> 2.25.1


* Re: [PATCH v4] perf script python: Add the ins_lat field to event handler
  2024-08-09 13:36 ` Arnaldo Carvalho de Melo
@ 2024-08-11  3:52   ` Zixian Cai
  0 siblings, 0 replies; 3+ messages in thread
From: Zixian Cai @ 2024-08-11  3:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Liang, Kan, Ben Gainey, Paran Lee, linux-perf-users, linux-kernel

On 9/8/2024 23:36, Arnaldo Carvalho de Melo wrote:
> On Fri, Aug 09, 2024 at 08:01:36AM +0000, Zixian Cai wrote:
>> For example, when using the Alder Lake PMU memory load event, the
>> instruction latency is stored in ins_lat, while the cache latency
>> is stored in weight.
>>
>> This patch reports the ins_lat field for Python scripting.
> 
> So, how did you test this? I tried:

This is how I tested it.

My machine is running 6.5.0-41-generic from Ubuntu 22.04 LTS, and I used the distribution's perf to record.

$ grep -m1 'model name' /proc/cpuinfo
model name	: 12th Gen Intel(R) Core(TM) i9-12900KF

$ perf version
perf version 6.5.13

$ perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava
...
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7988 msec =====
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 3.530 MB perf.data (47646 samples) ]

$ ./perf evlist -v
cpu_core/mem-loads-aux/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x8203, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1
cpu_core/mem-loads,ldlat=30/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, freq: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-loads,ldlat=30/P: type: 10, size: 136, config: 0x5d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_core/mem-stores/P: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x2cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
cpu_atom/mem-stores/P: type: 10, size: 136, config: 0x6d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:HG: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1

$ ./perf script -g python

Then add a new function to perf-script.py:

def process_event(params):
    if "cpu_core/mem-loads,ldlat" in params["ev_name"]:
        print(params["sample"]["weight"], params["sample"]["ins_lat"])

$ ./perf script|grep ldlat=|head
         taskset  182628 247517.778385:          1  cpu_core/mem-loads,ldlat=30/: ffffb33a850078a0     40268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  Addr               5              33               0 ffffffff8cc2ba08 [unknown] ([unknown])
         taskset  182628 247517.778409:          1  cpu_core/mem-loads,ldlat=30/: ffffb33a85007860     10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A                5              85               0 ffffffff8ce23476 [unknown] ([unknown])
         taskset  182628 247517.778431:          3  cpu_core/mem-loads,ldlat=30/: ffffb33a85007b78     10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A                5             163               0 ffffffff8d2061d0 [unknown] ([unknown])
         taskset  182628 247517.778444:          7  cpu_core/mem-loads,ldlat=30/: ffff90cf25b26280     10668100842 |OP LOAD|LVL L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A               96             120               0 ffffffff8dab2627 [unknown] ([unknown])
         taskset  182628 247517.778484:         23  cpu_core/mem-loads,ldlat=30/: ffffb33a85007cf0     10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A                5             218               0 ffffffff8cd96124 [unknown] ([unknown])
         taskset  182628 247517.778561:         39  cpu_core/mem-loads,ldlat=30/: ffffe271848b6600     20268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  Data               5             111               0 ffffffff8cd948cc [unknown] ([unknown])
         taskset  182628 247517.778629:         50  cpu_core/mem-loads,ldlat=30/: ffffe27184b6d280     11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A              71              73               0 ffffffff8cd94792 [unknown] ([unknown])
         taskset  182628 247517.778725:         67  cpu_core/mem-loads,ldlat=30/: ffff90c061ed6b48     11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A             240             242               0 ffffffff8cf9785b [unknown] ([unknown])
            java  182628 247517.778886:         81  cpu_core/mem-loads,ldlat=30/: ffffe27184888430     4026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK  Addr                  5              68               0 ffffffff8ce13245 [unknown] ([unknown])
            java  182628 247517.779164:         87  cpu_core/mem-loads,ldlat=30/: ffffe271bf9bca40     1026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK  N/A                   5              90               0 ffffffff8cd96387 [unknown] ([unknown])

$ ./perf script -s perf-script.py|head
in trace_begin
5 33
5 85
5 163
96 120
5 218
5 111
71 73
240 242
5 68

The output from the Python script matches the plain perf script output, showing both weight and ins_lat.
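
For anyone reproducing this with a perf binary that may or may not carry the patch, a slightly more defensive variant of the handler is sketched below. It is an illustration, not what was run above: the `latencies` helper is a hypothetical name, and the fallback to `None` assumes that an unpatched perf simply omits the "ins_lat" key from the sample dictionary.

```python
def latencies(param_dict):
    """Return (weight, ins_lat) for one sample; ins_lat is None when absent."""
    sample = param_dict["sample"]
    return sample["weight"], sample.get("ins_lat")


def process_event(param_dict):
    # Only look at memory load samples, as in the test above.
    if "mem-loads" not in param_dict["ev_name"]:
        return
    weight, ins_lat = latencies(param_dict)
    print(weight, "n/a" if ins_lat is None else ins_lat)
```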

> 
> But in general try to provide the steps to show that the functionality
> that you are adding is actually working, making it easy for other
> people to try reproducing your results.

Will do for future patches.

> Thanks,
> 
> - Arnaldo

One thing I haven't figured out: when I use a perf built from source, perf mem record doesn't seem to record the events for the Golden Cove P-cores.

$ ./perf version
perf version 6.11.0-rc2

$ ./perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava

Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7157 msec =====
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.251 MB perf.data ]

$ ./perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1

I think the above recording issue is orthogonal to this patch, and possibly a result of running 6.11 perf userland on a 6.5 kernel.

