* [linux-next] perf report runtime extremely long
@ 2025-10-29 8:03 Thomas Richter
2025-10-29 15:13 ` Ian Rogers
2025-10-29 15:56 ` Thomas Richter
0 siblings, 2 replies; 14+ messages in thread
From: Thomas Richter @ 2025-10-29 8:03 UTC (permalink / raw)
To: linux-perf-use.; +Cc: Jan Polensky, Alexander Gordeev
Hi all,
has something changed recently in perf report in linux-next repo?
I run perf report -i <file> -D | grep something
on s390 with 2 different kernels:
master 6.18.0rc3:
# uname -a
Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
# ll /tmp/perf2-eSso2q-2
-rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
# time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
real 0m22.262s
user 0m22.068s
sys 0m0.200s
#
This runtime is ok for the perf.data file size.
linux-next:
# uname -a
Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
#ll /tmp/perf2-eSso2q-2
-rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
# time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
real 2m53.601s
user 2m52.938s
sys 0m0.575s
#
Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
for the same input!
Has anybody similar observations?
Thanks a lot
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 8:03 [linux-next] perf report runtime extremely long Thomas Richter
@ 2025-10-29 15:13 ` Ian Rogers
2025-10-29 15:52 ` Thomas Richter
2025-10-29 15:56 ` Thomas Richter
1 sibling, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-10-29 15:13 UTC (permalink / raw)
To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> Hi all,
>
> has something changed recently in perf report in linux-next repo?
>
> I run perf report -i <file> -D | grep something
> on s390 with 2 different kernels:
>
> master 6.18.0rc3:
> # uname -a
> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> # ll /tmp/perf2-eSso2q-2
> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>
> real 0m22.262s
> user 0m22.068s
> sys 0m0.200s
> #
>
> This runtime is ok for the perf.data file size.
>
> linux-next:
> # uname -a
> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> #ll /tmp/perf2-eSso2q-2
> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>
> real 2m53.601s
> user 2m52.938s
> sys 0m0.575s
> #
>
> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> for the same input!
>
> Has anybody similar observations?
So there were changes relating to addr2line that could impact this:
https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/
There was also the switch to always using build IDs:
https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/
Any chance you could do a `perf record`/`perf report` to see where the
time is going?
Thanks!
Ian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 15:13 ` Ian Rogers
@ 2025-10-29 15:52 ` Thomas Richter
2025-10-29 16:25 ` Ian Rogers
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-29 15:52 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On 10/29/25 16:13, Ian Rogers wrote:
> On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> Hi all,
>>
>> has something changed recently in perf report in linux-next repo?
>>
>> I run perf report -i <file> -D | grep something
>> on s390 with 2 different kernels:
>>
>> master 6.18.0rc3:
>> # uname -a
>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>> # ll /tmp/perf2-eSso2q-2
>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>
>> real 0m22.262s
>> user 0m22.068s
>> sys 0m0.200s
>> #
>>
>> This runtime is ok for the perf.data file size.
>>
>> linux-next:
>> # uname -a
>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>> #ll /tmp/perf2-eSso2q-2
>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>
>> real 2m53.601s
>> user 2m52.938s
>> sys 0m0.575s
>> #
>>
>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>> for the same input!
>>
>> Has anybody similar observations?
>
> So there were changes relating to addr2line that could impact this:
> https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/
>
> There was also the switch to always using build IDs:
> https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/
>
> Any chance you could do a `perf record`/`perf report` to see where the
> time is going?
>
> Thanks!
> Ian
Ok here is a first try, can do more tests tomorrow morning.
The file perf1.data has this size:
root in 🌐 s83lp47 in ~
❯ ll perf1.data
-rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
root in 🌐 s83lp47 in ~
❯
time perf record -- perf report -i perf1.data -D > /dev/null
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 2.742 MB perf.data (70481 samples) ]
Terminated
real 13m9.894s
user 13m6.583s
sys 0m2.449s
root in 🌐 s83lp47 in ~ took 13m9s
❯ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 70K of event 'cpum_cf/cycles/P'
# Event count (approx.): 88101250000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..................................................................................
#
26.67% perf libc.so.6 [.] __printf_buffer
12.43% perf libc.so.6 [.] __GI___strnlen
12.30% perf libc.so.6 [.] __strstr_vx
11.92% perf perf [.] perf_pmu__for_each_event
7.46% perf libc.so.6 [.] __printf_buffer_write
5.48% perf libc.so.6 [.] memcpy
5.13% perf libc.so.6 [.] __GI___strchrnul
3.25% perf libc.so.6 [.] __vfscanf_internal
2.61% perf libc.so.6 [.] strchr
2.39% perf libc.so.6 [.] __vsnprintf_internal
2.19% perf libc.so.6 [.] __printf_buffer_done
2.03% perf perf [.] perf_pmu__is_tool
1.14% perf libc.so.6 [.] __snprintf_chk@@GLIBC_2.4
1.05% perf perf [.] get_counter_name_callback
0.73% perf libc.so.6 [.] __GI_____strtoull_l_internal
0.60% perf libc.so.6 [.] memcmp
0.28% perf libc.so.6 [.] _IO_str_init_static_internal
0.19% perf libc.so.6 [.] strlen
Hope this helps
Kind regards
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 8:03 [linux-next] perf report runtime extremely long Thomas Richter
2025-10-29 15:13 ` Ian Rogers
@ 2025-10-29 15:56 ` Thomas Richter
2025-10-29 16:47 ` Ian Rogers
1 sibling, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-29 15:56 UTC (permalink / raw)
To: linux-perf-use.; +Cc: Jan Polensky, Alexander Gordeev
On 10/29/25 09:03, Thomas Richter wrote:
> Hi all,
>
> has something changed recently in perf report in linux-next repo?
>
> I run perf report -i <file> -D | grep something
> on s390 with 2 different kernels:
>
> master 6.18.0rc3:
> # uname -a
> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> # ll /tmp/perf2-eSso2q-2
> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>
> real 0m22.262s
> user 0m22.068s
> sys 0m0.200s
> #
>
> This runtime is ok for the perf.data file size.
>
> linux-next:
> # uname -a
> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> #ll /tmp/perf2-eSso2q-2
> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>
> real 2m53.601s
> user 2m52.938s
> sys 0m0.575s
> #
>
> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> for the same input!
>
> Has anybody similar observations?
>
> Thanks a lot
>
Ian,
sorry sent button pushed too early...
here is a second try without option -D.
This is much better:
root in 🌐 s83lp47 in ~
❯ ll perf1.data
-rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
root in 🌐 s83lp47 in ~
❯ time perf record -- perf report -i perf1.data > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
real 0m1.097s
user 0m0.082s
sys 0m0.167s
root in 🌐 s83lp47 in ~
❯ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 246 of event 'cpum_cf/cycles/P'
# Event count (approx.): 307500000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ..................................................
#
9.35% perf perf [.] reader__read_event
4.47% perf perf [.] perf_env__arch
2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro
2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x
2.44% perf ld64.so.1 [.] do_lookup_x
2.44% perf perf [.] down_read
2.03% perf perf [.] hists__findnew_entry
2.03% perf perf [.] perf_hpp__is_dynamic_entry
1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14
1.63% perf [kernel.kallsyms] [k] format_decode
1.63% perf libc.so.6 [.] cfree@GLIBC_2.2
1.63% perf perf [.] evsel__parse_sample
1.63% perf perf [.] map__put
1.63% perf perf [.] sort__comm_cmp
1.63% perf perf [.] thread__put
1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes
1.22% perf [kernel.kallsyms] [k] get_page_from_freelist
1.22% perf ld64.so.1 [.] _dl_name_match_p
1.22% perf perf [.] dump_printf
1.22% perf perf [.] evlist__event2evsel
1.22% perf perf [.] hashmap_find
1.22% perf perf [.] perf_session__deliver_event
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 15:52 ` Thomas Richter
@ 2025-10-29 16:25 ` Ian Rogers
0 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-10-29 16:25 UTC (permalink / raw)
To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On Wed, Oct 29, 2025 at 8:52 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> On 10/29/25 16:13, Ian Rogers wrote:
> > On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
> >>
> >> Hi all,
> >>
> >> has something changed recently in perf report in linux-next repo?
> >>
> >> I run perf report -i <file> -D | grep something
> >> on s390 with 2 different kernels:
> >>
> >> master 6.18.0rc3:
> >> # uname -a
> >> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> >> # ll /tmp/perf2-eSso2q-2
> >> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >>
> >> real 0m22.262s
> >> user 0m22.068s
> >> sys 0m0.200s
> >> #
> >>
> >> This runtime is ok for the perf.data file size.
> >>
> >> linux-next:
> >> # uname -a
> >> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> >> #ll /tmp/perf2-eSso2q-2
> >> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >>
> >> real 2m53.601s
> >> user 2m52.938s
> >> sys 0m0.575s
> >> #
> >>
> >> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> >> for the same input!
> >>
> >> Has anybody similar observations?
> >
> > So there were changes relating to addr2line that could impact this:
> > https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/
> >
> > There was also the switch to always using build IDs:
> > https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/
> >
> > Any chance you could do a `perf record`/`perf report` to see where the
> > time is going?
> >
> > Thanks!
> > Ian
>
> Ok here is a first try, can do more tests tomorrow morning.
> The file perf1.data has this size:
>
> root in 🌐 s83lp47 in ~
> ❯ ll perf1.data
> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>
> root in 🌐 s83lp47 in ~
> ❯
>
> time perf record -- perf report -i perf1.data -D > /dev/null
> ^C[ perf record: Woken up 6 times to write data ]
> [ perf record: Captured and wrote 2.742 MB perf.data (70481 samples) ]
> Terminated
>
> real 13m9.894s
> user 13m6.583s
> sys 0m2.449s
>
> root in 🌐 s83lp47 in ~ took 13m9s
> ❯ perf report --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 70K of event 'cpum_cf/cycles/P'
> # Event count (approx.): 88101250000
> #
> # Overhead Command Shared Object Symbol
> # ........ ....... ................. ..................................................................................
> #
> 26.67% perf libc.so.6 [.] __printf_buffer
> 12.43% perf libc.so.6 [.] __GI___strnlen
> 12.30% perf libc.so.6 [.] __strstr_vx
> 11.92% perf perf [.] perf_pmu__for_each_event
> 7.46% perf libc.so.6 [.] __printf_buffer_write
> 5.48% perf libc.so.6 [.] memcpy
> 5.13% perf libc.so.6 [.] __GI___strchrnul
> 3.25% perf libc.so.6 [.] __vfscanf_internal
> 2.61% perf libc.so.6 [.] strchr
> 2.39% perf libc.so.6 [.] __vsnprintf_internal
> 2.19% perf libc.so.6 [.] __printf_buffer_done
> 2.03% perf perf [.] perf_pmu__is_tool
> 1.14% perf libc.so.6 [.] __snprintf_chk@@GLIBC_2.4
> 1.05% perf perf [.] get_counter_name_callback
> 0.73% perf libc.so.6 [.] __GI_____strtoull_l_internal
> 0.60% perf libc.so.6 [.] memcmp
> 0.28% perf libc.so.6 [.] _IO_str_init_static_internal
> 0.19% perf libc.so.6 [.] strlen
Looks like the time is in s390-sample-raw.c but I don't see recent
changes there:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/log/tools/perf/util/s390-sample-raw.c?h=perf-tools-next
I wonder for the scanf if the following would help:
```
diff --git a/tools/perf/util/s390-sample-raw.c
b/tools/perf/util/s390-sample-raw.c
index a2d6a1f93cce..21edd359a43a 100644
--- a/tools/perf/util/s390-sample-raw.c
+++ b/tools/perf/util/s390-sample-raw.c
@@ -140,8 +140,9 @@ struct get_counter_name_data {
static int get_counter_name_callback(void *vdata, struct pmu_event_info *info)
{
struct get_counter_name_data *data = vdata;
- int rc, event_nr;
+ int event_nr;
const char *event_str;
+ char *end_ptr;
if (info->str == NULL)
return 0;
@@ -150,8 +151,8 @@ static int get_counter_name_callback(void *vdata,
struct pmu_event_info *info)
if (!event_str)
return 0;
- rc = sscanf(event_str, "event=%x", &event_nr);
- if (rc == 1 && event_nr == data->wanted) {
+ event_nr = strtol(event_str + 6, &end_ptr, /*base=*/16);
+ if ((end_ptr != event_str + 6) && event_nr == data->wanted) {
data->result = strdup(info->name);
return 1; /* Terminate the search. */
}
```
ah, new email, let me follow up further there.
> Hope this helps
>
> Kind regards
> --
> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
> --
> IBM Deutschland Research & Development GmbH
>
> Vorsitzender des Aufsichtsrats: Wolfgang Wendt
>
> Geschäftsführung: David Faller
>
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 15:56 ` Thomas Richter
@ 2025-10-29 16:47 ` Ian Rogers
2025-10-30 6:04 ` Namhyung Kim
2025-10-30 8:59 ` Thomas Richter
0 siblings, 2 replies; 14+ messages in thread
From: Ian Rogers @ 2025-10-29 16:47 UTC (permalink / raw)
To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> On 10/29/25 09:03, Thomas Richter wrote:
> > Hi all,
> >
> > has something changed recently in perf report in linux-next repo?
> >
> > I run perf report -i <file> -D | grep something
> > on s390 with 2 different kernels:
> >
> > master 6.18.0rc3:
> > # uname -a
> > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> > # ll /tmp/perf2-eSso2q-2
> > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >
> > real 0m22.262s
> > user 0m22.068s
> > sys 0m0.200s
> > #
> >
> > This runtime is ok for the perf.data file size.
> >
> > linux-next:
> > # uname -a
> > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> > #ll /tmp/perf2-eSso2q-2
> > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >
> > real 2m53.601s
> > user 2m52.938s
> > sys 0m0.575s
> > #
> >
> > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> > for the same input!
> >
> > Has anybody similar observations?
> >
> > Thanks a lot
> >
>
> Ian,
>
> sorry sent button pushed too early...
>
> here is a second try without option -D.
>
> This is much better:
>
> root in 🌐 s83lp47 in ~
> ❯ ll perf1.data
> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>
> root in 🌐 s83lp47 in ~
> ❯ time perf record -- perf report -i perf1.data > /dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>
> real 0m1.097s
> user 0m0.082s
> sys 0m0.167s
>
> root in 🌐 s83lp47 in ~
> ❯ perf report --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 246 of event 'cpum_cf/cycles/P'
> # Event count (approx.): 307500000
> #
> # Overhead Command Shared Object Symbol
> # ........ ....... ................... ..................................................
> #
> 9.35% perf perf [.] reader__read_event
> 4.47% perf perf [.] perf_env__arch
> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro
> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x
> 2.44% perf ld64.so.1 [.] do_lookup_x
> 2.44% perf perf [.] down_read
> 2.03% perf perf [.] hists__findnew_entry
> 2.03% perf perf [.] perf_hpp__is_dynamic_entry
> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14
> 1.63% perf [kernel.kallsyms] [k] format_decode
> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2
> 1.63% perf perf [.] evsel__parse_sample
> 1.63% perf perf [.] map__put
> 1.63% perf perf [.] sort__comm_cmp
> 1.63% perf perf [.] thread__put
> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes
> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist
> 1.22% perf ld64.so.1 [.] _dl_name_match_p
> 1.22% perf perf [.] dump_printf
> 1.22% perf perf [.] evlist__event2evsel
> 1.22% perf perf [.] hashmap_find
> 1.22% perf perf [.] perf_session__deliver_event
Hmm.. for perf_env__arch we could probably cache better:
```
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index f1626d2032cd..2591f71c7c42 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
const char *perf_env__arch(struct perf_env *env)
{
- char *arch_name;
+ if (env && env->arch && env->arch_name_normalized)
+ return env->arch;
if (!env || !env->arch) { /* Assume local operation */
static struct utsname uts = { .machine[0] = '\0', };
if (uts.machine[0] == '\0' && uname(&uts) < 0)
return NULL;
- arch_name = uts.machine;
- } else
- arch_name = env->arch;
- return normalize_arch(arch_name);
+ if (!env)
+ return normalize_arch(uts.machine);
+
+ env->arch = strdup(normalize_arch(uts.machine));
+ env->arch_name_normalized = true;
+ } else {
+ char *normalized = strdup(normalize_arch(env->arch));
+
+ free(env->arch);
+ env->arch = normalized;
+ env->arch_name_normalized = true;
+ }
+ return env->arch;
}
#if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 9977b85523a8..da20bc3810bd 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -61,6 +61,7 @@ struct perf_env {
char *os_release;
char *version;
char *arch;
+ bool arch_name_normalized;
int nr_cpus_online;
int nr_cpus_avail;
char *cpu_desc;
```
or just force the normalize before assignment.
For reader__read_event and the perf_env these changes may be relevant:
"perf sample: Remove arch notion of sample parsing"
https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
"perf env: Remove global perf_env"
https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
I wonder if for some reason the env is NULL in your cases, so we go
slow path a lot.
The histogram stuff, Namhyung may know more.
Thanks!
Ian
> --
> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
> --
> IBM Deutschland Research & Development GmbH
>
> Vorsitzender des Aufsichtsrats: Wolfgang Wendt
>
> Geschäftsführung: David Faller
>
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
>
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 16:47 ` Ian Rogers
@ 2025-10-30 6:04 ` Namhyung Kim
2025-10-30 8:59 ` Thomas Richter
1 sibling, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2025-10-30 6:04 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Richter, linux-perf-use., Jan Polensky, Alexander Gordeev
Hello,
On Wed, Oct 29, 2025 at 09:47:00AM -0700, Ian Rogers wrote:
> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
> >
> > On 10/29/25 09:03, Thomas Richter wrote:
> > > Hi all,
> > >
> > > has something changed recently in perf report in linux-next repo?
> > >
> > > I run perf report -i <file> -D | grep something
> > > on s390 with 2 different kernels:
> > >
> > > master 6.18.0rc3:
> > > # uname -a
> > > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> > > # ll /tmp/perf2-eSso2q-2
> > > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> > >
> > > real 0m22.262s
> > > user 0m22.068s
> > > sys 0m0.200s
> > > #
> > >
> > > This runtime is ok for the perf.data file size.
> > >
> > > linux-next:
> > > # uname -a
> > > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> > > #ll /tmp/perf2-eSso2q-2
> > > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> > >
> > > real 2m53.601s
> > > user 2m52.938s
> > > sys 0m0.575s
> > > #
> > >
> > > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> > > for the same input!
> > >
> > > Has anybody similar observations?
> > >
> > > Thanks a lot
> > >
> >
> > Ian,
> >
> > sorry sent button pushed too early...
> >
> > here is a second try without option -D.
> >
> > This is much better:
> >
> > root in 🌐 s83lp47 in ~
> > ❯ ll perf1.data
> > -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
> >
> > root in 🌐 s83lp47 in ~
> > ❯ time perf record -- perf report -i perf1.data > /dev/null
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
> >
> > real 0m1.097s
> > user 0m0.082s
> > sys 0m0.167s
> >
> > root in 🌐 s83lp47 in ~
> > ❯ perf report --stdio
> > # To display the perf.data header info, please use --header/--header-only options.
> > #
> > #
> > # Total Lost Samples: 0
> > #
> > # Samples: 246 of event 'cpum_cf/cycles/P'
> > # Event count (approx.): 307500000
> > #
> > # Overhead Command Shared Object Symbol
> > # ........ ....... ................... ..................................................
> > #
> > 9.35% perf perf [.] reader__read_event
> > 4.47% perf perf [.] perf_env__arch
> > 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro
> > 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x
> > 2.44% perf ld64.so.1 [.] do_lookup_x
> > 2.44% perf perf [.] down_read
> > 2.03% perf perf [.] hists__findnew_entry
> > 2.03% perf perf [.] perf_hpp__is_dynamic_entry
> > 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14
> > 1.63% perf [kernel.kallsyms] [k] format_decode
> > 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2
> > 1.63% perf perf [.] evsel__parse_sample
> > 1.63% perf perf [.] map__put
> > 1.63% perf perf [.] sort__comm_cmp
> > 1.63% perf perf [.] thread__put
> > 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes
> > 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist
> > 1.22% perf ld64.so.1 [.] _dl_name_match_p
> > 1.22% perf perf [.] dump_printf
> > 1.22% perf perf [.] evlist__event2evsel
> > 1.22% perf perf [.] hashmap_find
> > 1.22% perf perf [.] perf_session__deliver_event
>
> Hmm.. for perf_env__arch we could probably cache better:
> ```
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index f1626d2032cd..2591f71c7c42 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
>
> const char *perf_env__arch(struct perf_env *env)
> {
> - char *arch_name;
> + if (env && env->arch && env->arch_name_normalized)
> + return env->arch;
>
> if (!env || !env->arch) { /* Assume local operation */
> static struct utsname uts = { .machine[0] = '\0', };
> if (uts.machine[0] == '\0' && uname(&uts) < 0)
> return NULL;
> - arch_name = uts.machine;
> - } else
> - arch_name = env->arch;
>
> - return normalize_arch(arch_name);
> + if (!env)
> + return normalize_arch(uts.machine);
> +
> + env->arch = strdup(normalize_arch(uts.machine));
> + env->arch_name_normalized = true;
> + } else {
> + char *normalized = strdup(normalize_arch(env->arch));
> +
> + free(env->arch);
> + env->arch = normalized;
> + env->arch_name_normalized = true;
> + }
> + return env->arch;
> }
>
> #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 9977b85523a8..da20bc3810bd 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -61,6 +61,7 @@ struct perf_env {
> char *os_release;
> char *version;
> char *arch;
> + bool arch_name_normalized;
> int nr_cpus_online;
> int nr_cpus_avail;
> char *cpu_desc;
> ```
> or just force the normalize before assignment.
>
> For reader__read_event and the perf_env these changes may be relevant:
> "perf sample: Remove arch notion of sample parsing"
> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
> "perf env: Remove global perf_env"
> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
>
> I wonder if for some reason the env is NULL in your cases, so we go
> slow path a lot.
>
> The histogram stuff, Namhyung may know more.
There only 2 entries for the hist code. The hists__findnew_entry() is
expected as it's the main routine to aggregate the samples. But the
perf_hpp__is_dynamic_entry() is strange. It should be very small and
fast.
But I guess the actual problem is when you use -D in perf report. And
it doesn't show hist related functions IIUC.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-29 16:47 ` Ian Rogers
2025-10-30 6:04 ` Namhyung Kim
@ 2025-10-30 8:59 ` Thomas Richter
2025-10-30 13:25 ` Thomas Richter
1 sibling, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-30 8:59 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On 10/29/25 17:47, Ian Rogers wrote:
> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> On 10/29/25 09:03, Thomas Richter wrote:
>>> Hi all,
>>>
>>> has something changed recently in perf report in linux-next repo?
>>>
>>> I run perf report -i <file> -D | grep something
>>> on s390 with 2 different kernels:
>>>
>>> master 6.18.0rc3:
>>> # uname -a
>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>>> # ll /tmp/perf2-eSso2q-2
>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>
>>> real 0m22.262s
>>> user 0m22.068s
>>> sys 0m0.200s
>>> #
>>>
>>> This runtime is ok for the perf.data file size.
>>>
>>> linux-next:
>>> # uname -a
>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>>> #ll /tmp/perf2-eSso2q-2
>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>
>>> real 2m53.601s
>>> user 2m52.938s
>>> sys 0m0.575s
>>> #
>>>
>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>>> for the same input!
>>>
>>> Has anybody similar observations?
>>>
>>> Thanks a lot
>>>
>>
>> Ian,
>>
>> sorry sent button pushed too early...
>>
>> here is a second try without option -D.
>>
>> This is much better:
>>
>> root in 🌐 s83lp47 in ~
>> ❯ ll perf1.data
>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>>
>> root in 🌐 s83lp47 in ~
>> ❯ time perf record -- perf report -i perf1.data > /dev/null
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>>
>> real 0m1.097s
>> user 0m0.082s
>> sys 0m0.167s
>>
>> root in 🌐 s83lp47 in ~
>> ❯ perf report --stdio
>> # To display the perf.data header info, please use --header/--header-only options.
>> #
>> #
>> # Total Lost Samples: 0
>> #
>> # Samples: 246 of event 'cpum_cf/cycles/P'
>> # Event count (approx.): 307500000
>> #
>> # Overhead Command Shared Object Symbol
>> # ........ ....... ................... ..................................................
>> #
>> 9.35% perf perf [.] reader__read_event
>> 4.47% perf perf [.] perf_env__arch
>> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro
>> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x
>> 2.44% perf ld64.so.1 [.] do_lookup_x
>> 2.44% perf perf [.] down_read
>> 2.03% perf perf [.] hists__findnew_entry
>> 2.03% perf perf [.] perf_hpp__is_dynamic_entry
>> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14
>> 1.63% perf [kernel.kallsyms] [k] format_decode
>> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2
>> 1.63% perf perf [.] evsel__parse_sample
>> 1.63% perf perf [.] map__put
>> 1.63% perf perf [.] sort__comm_cmp
>> 1.63% perf perf [.] thread__put
>> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes
>> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist
>> 1.22% perf ld64.so.1 [.] _dl_name_match_p
>> 1.22% perf perf [.] dump_printf
>> 1.22% perf perf [.] evlist__event2evsel
>> 1.22% perf perf [.] hashmap_find
>> 1.22% perf perf [.] perf_session__deliver_event
>
> Hmm.. for perf_env__arch we could probably cache better:
> ```
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index f1626d2032cd..2591f71c7c42 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
>
> const char *perf_env__arch(struct perf_env *env)
> {
> - char *arch_name;
> + if (env && env->arch && env->arch_name_normalized)
> + return env->arch;
>
> if (!env || !env->arch) { /* Assume local operation */
> static struct utsname uts = { .machine[0] = '\0', };
> if (uts.machine[0] == '\0' && uname(&uts) < 0)
> return NULL;
> - arch_name = uts.machine;
> - } else
> - arch_name = env->arch;
>
> - return normalize_arch(arch_name);
> + if (!env)
> + return normalize_arch(uts.machine);
> +
> + env->arch = strdup(normalize_arch(uts.machine));
> + env->arch_name_normalized = true;
> + } else {
> + char *normalized = strdup(normalize_arch(env->arch));
> +
> + free(env->arch);
> + env->arch = normalized;
> + env->arch_name_normalized = true;
> + }
> + return env->arch;
> }
>
> #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 9977b85523a8..da20bc3810bd 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -61,6 +61,7 @@ struct perf_env {
> char *os_release;
> char *version;
> char *arch;
> + bool arch_name_normalized;
> int nr_cpus_online;
> int nr_cpus_avail;
> char *cpu_desc;
> ```
> or just force the normalize before assignment.
>
> For reader__read_event and the perf_env these changes may be relevant:
> "perf sample: Remove arch notion of sample parsing"
> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
> "perf env: Remove global perf_env"
> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
>
> I wonder if for some reason the env is NULL in your cases, so we go
> slow path a lot.
>
> The histogram stuff, Namhyung may know more.
>
> Thanks!
> Ian
>
Thanks Ian,
I tested your patch, but there is no big difference:
root in 🌐 s83lp47 in ~
❯ ll ~/perf.data-test
-rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test
here is the output with your patch:
root in 🌐 s83lp47 in ~
❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null
real 46m26.350s
user 46m17.559s
sys 0m7.434s
Here is the output from the linux-next perf version:
root in 🌐 s83lp47 in ~ took 46m26s
❯ time perf report -D -i ~/perf.data-test > /dev/null
real 49m14.364s
user 49m4.872s
sys 0m8.037s
root in 🌐 s83lp47 in ~ took 49m14s
When I go back history and use:
# pwd
/root/mirror-linux-next/tools/perf
# git branch
* (HEAD detached at v6.10)
master
]# ll ~/perf.data-test
-rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test
# time ./perf report -D -i ~/perf.data-test > /dev/null
real 8m49.089s
user 8m47.898s
sys 0m0.848s
#
Thanks for the pointer. Since the issue is with s390 specific code
I will do a git bisect and we go from there.
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-30 8:59 ` Thomas Richter
@ 2025-10-30 13:25 ` Thomas Richter
2025-10-30 14:27 ` Thomas Richter
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-30 13:25 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On 10/30/25 09:59, Thomas Richter wrote:
> On 10/29/25 17:47, Ian Rogers wrote:
>> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>>
>>> On 10/29/25 09:03, Thomas Richter wrote:
>>>> Hi all,
>>>>
>>>> has something changed recently in perf report in linux-next repo?
>>>>
>>>> I run perf report -i <file> -D | grep something
>>>> on s390 with 2 different kernels:
>>>>
>>>> master 6.18.0rc3:
>>>> # uname -a
>>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>>>> # ll /tmp/perf2-eSso2q-2
>>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>
>>>> real 0m22.262s
>>>> user 0m22.068s
>>>> sys 0m0.200s
>>>> #
>>>>
>>>> This runtime is ok for the perf.data file size.
>>>>
>>>> linux-next:
>>>> # uname -a
>>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>>>> #ll /tmp/perf2-eSso2q-2
>>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>
>>>> real 2m53.601s
>>>> user 2m52.938s
>>>> sys 0m0.575s
>>>> #
>>>>
>>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>>>> for the same input!
>>>>
>>>> Has anybody similar observations?
>>>>
>>>> Thanks a lot
>>>>
>>>
>>> Ian,
>>>
>>> sorry sent button pushed too early...
>>>
>>> here is a second try without option -D.
>>>
>>> This is much better:
>>>
>>> root in 🌐 s83lp47 in ~
>>> ❯ ll perf1.data
>>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>>>
>>> root in 🌐 s83lp47 in ~
>>> ❯ time perf record -- perf report -i perf1.data > /dev/null
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>>>
>>> real 0m1.097s
>>> user 0m0.082s
>>> sys 0m0.167s
>>>
>>> root in 🌐 s83lp47 in ~
>>> ❯ perf report --stdio
>>> # To display the perf.data header info, please use --header/--header-only options.
>>> #
>>> #
>>> # Total Lost Samples: 0
>>> #
>>> # Samples: 246 of event 'cpum_cf/cycles/P'
>>> # Event count (approx.): 307500000
>>> #
>>> # Overhead Command Shared Object Symbol
>>> # ........ ....... ................... ..................................................
>>> #
>>> 9.35% perf perf [.] reader__read_event
>>> 4.47% perf perf [.] perf_env__arch
>>> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro
>>> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x
>>> 2.44% perf ld64.so.1 [.] do_lookup_x
>>> 2.44% perf perf [.] down_read
>>> 2.03% perf perf [.] hists__findnew_entry
>>> 2.03% perf perf [.] perf_hpp__is_dynamic_entry
>>> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14
>>> 1.63% perf [kernel.kallsyms] [k] format_decode
>>> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2
>>> 1.63% perf perf [.] evsel__parse_sample
>>> 1.63% perf perf [.] map__put
>>> 1.63% perf perf [.] sort__comm_cmp
>>> 1.63% perf perf [.] thread__put
>>> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes
>>> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist
>>> 1.22% perf ld64.so.1 [.] _dl_name_match_p
>>> 1.22% perf perf [.] dump_printf
>>> 1.22% perf perf [.] evlist__event2evsel
>>> 1.22% perf perf [.] hashmap_find
>>> 1.22% perf perf [.] perf_session__deliver_event
>>
>> Hmm.. for perf_env__arch we could probably cache better:
>> ```
>> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
>> index f1626d2032cd..2591f71c7c42 100644
>> --- a/tools/perf/util/env.c
>> +++ b/tools/perf/util/env.c
>> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
>>
>> const char *perf_env__arch(struct perf_env *env)
>> {
>> - char *arch_name;
>> + if (env && env->arch && env->arch_name_normalized)
>> + return env->arch;
>>
>> if (!env || !env->arch) { /* Assume local operation */
>> static struct utsname uts = { .machine[0] = '\0', };
>> if (uts.machine[0] == '\0' && uname(&uts) < 0)
>> return NULL;
>> - arch_name = uts.machine;
>> - } else
>> - arch_name = env->arch;
>>
>> - return normalize_arch(arch_name);
>> + if (!env)
>> + return normalize_arch(uts.machine);
>> +
>> + env->arch = strdup(normalize_arch(uts.machine));
>> + env->arch_name_normalized = true;
>> + } else {
>> + char *normalized = strdup(normalize_arch(env->arch));
>> +
>> + free(env->arch);
>> + env->arch = normalized;
>> + env->arch_name_normalized = true;
>> + }
>> + return env->arch;
>> }
>>
>> #if defined(HAVE_LIBTRACEEVENT)
>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
>> index 9977b85523a8..da20bc3810bd 100644
>> --- a/tools/perf/util/env.h
>> +++ b/tools/perf/util/env.h
>> @@ -61,6 +61,7 @@ struct perf_env {
>> char *os_release;
>> char *version;
>> char *arch;
>> + bool arch_name_normalized;
>> int nr_cpus_online;
>> int nr_cpus_avail;
>> char *cpu_desc;
>> ```
>> or just force the normalize before assignment.
>>
>> For reader__read_event and the perf_env these changes may be relevant:
>> "perf sample: Remove arch notion of sample parsing"
>> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
>> "perf env: Remove global perf_env"
>> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
>>
>> I wonder if for some reason the env is NULL in your cases, so we go
>> slow path a lot.
>>
>> The histogram stuff, Namhyung may know more.
>>
>> Thanks!
>> Ian
>>
>
> Thanks Ian,
>
> I tested your patch, but there is no big difference:
> root in 🌐 s83lp47 in ~
> ❯ ll ~/perf.data-test
> -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test
>
> here is the output with your patch:
> root in 🌐 s83lp47 in ~
> ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null
>
> real 46m26.350s
> user 46m17.559s
> sys 0m7.434s
>
>
> Here is the output from the linux-next perf version:
>
> root in 🌐 s83lp47 in ~ took 46m26s
> ❯ time perf report -D -i ~/perf.data-test > /dev/null
>
> real 49m14.364s
> user 49m4.872s
> sys 0m8.037s
>
> root in 🌐 s83lp47 in ~ took 49m14s
>
> When I go back history and use:
>
> # pwd
> /root/mirror-linux-next/tools/perf
> # git branch
> * (HEAD detached at v6.10)
> master
> ]# ll ~/perf.data-test
> -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test
> # time ./perf report -D -i ~/perf.data-test > /dev/null
>
> real 8m49.089s
> user 8m47.898s
> sys 0m0.848s
> #
>
> Thanks for the pointer. Since the issue is with s390 specific code
> I will do a git bisect and we go from there.
>
This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
Author: Ian Rogers <irogers@google.com>
Date: Sun Oct 5 11:24:16 2025 -0700
perf jevents: Add legacy-hardware and legacy-cache json
The legacy-hardware.json is added containing hardware events similarly
to the software.json file. A difference is that for the software PMU
the name is known and matches sysfs. In the legacy-hardware.json no
Unit/PMU is specified for the events meaning default_core is used and
the events will appear for all core PMUs.
......
What puzzles me is that the above patch does not change anything in
util/s390-sample-raw.c which handles s390 specific sample interpretation.
Maybe that patch affects perf_pmus__find()?
Not sure how to proceed.
Thanks a lot
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-30 13:25 ` Thomas Richter
@ 2025-10-30 14:27 ` Thomas Richter
2025-10-30 15:57 ` Ian Rogers
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-30 14:27 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On 10/30/25 14:25, Thomas Richter wrote:
> On 10/30/25 09:59, Thomas Richter wrote:
>> On 10/29/25 17:47, Ian Rogers wrote:
>>> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>>>
>>>> On 10/29/25 09:03, Thomas Richter wrote:
>>>>> Hi all,
>>>>>
>>>>> has something changed recently in perf report in linux-next repo?
>>>>>
>>>>> I run perf report -i <file> -D | grep something
>>>>> on s390 with 2 different kernels:
>>>>>
>>>>> master 6.18.0rc3:
>>>>> # uname -a
>>>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>>>>> # ll /tmp/perf2-eSso2q-2
>>>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>>
>>>>> real 0m22.262s
>>>>> user 0m22.068s
>>>>> sys 0m0.200s
>>>>> #
>>>>>
>>>>> This runtime is ok for the perf.data file size.
>>>>>
>>>>> linux-next:
>>>>> # uname -a
>>>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>>>>> #ll /tmp/perf2-eSso2q-2
>>>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>>
>>>>> real 2m53.601s
>>>>> user 2m52.938s
>>>>> sys 0m0.575s
>>>>> #
>>>>>
>>>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>>>>> for the same input!
>>>>>
>>>>> Has anybody similar observations?
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>
>>>> Ian,
>>>>
>>>> sorry sent button pushed too early...
>>>>
>>>> here is a second try without option -D.
>>>>
>>>> This is much better:
>>>>
>>>> root in 🌐 s83lp47 in ~
>>>> ❯ ll perf1.data
>>>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>>>>
>>>> root in 🌐 s83lp47 in ~
>>>> ❯ time perf record -- perf report -i perf1.data > /dev/null
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>>>>
>>>> real 0m1.097s
>>>> user 0m0.082s
>>>> sys 0m0.167s
>>>>
>>>> root in 🌐 s83lp47 in ~
>>>> ❯ perf report --stdio
>>>> # To display the perf.data header info, please use --header/--header-only options.
>>>> #
>>>> #
>>>> # Total Lost Samples: 0
>>>> #
>>>> # Samples: 246 of event 'cpum_cf/cycles/P'
>>>> # Event count (approx.): 307500000
>>>> #
>>>> # Overhead Command Shared Object Symbol
>>>> # ........ ....... ................... ..................................................
>>>> #
>>>> 9.35% perf perf [.] reader__read_event
>>>> 4.47% perf perf [.] perf_env__arch
>>>> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro
>>>> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x
>>>> 2.44% perf ld64.so.1 [.] do_lookup_x
>>>> 2.44% perf perf [.] down_read
>>>> 2.03% perf perf [.] hists__findnew_entry
>>>> 2.03% perf perf [.] perf_hpp__is_dynamic_entry
>>>> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14
>>>> 1.63% perf [kernel.kallsyms] [k] format_decode
>>>> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2
>>>> 1.63% perf perf [.] evsel__parse_sample
>>>> 1.63% perf perf [.] map__put
>>>> 1.63% perf perf [.] sort__comm_cmp
>>>> 1.63% perf perf [.] thread__put
>>>> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes
>>>> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist
>>>> 1.22% perf ld64.so.1 [.] _dl_name_match_p
>>>> 1.22% perf perf [.] dump_printf
>>>> 1.22% perf perf [.] evlist__event2evsel
>>>> 1.22% perf perf [.] hashmap_find
>>>> 1.22% perf perf [.] perf_session__deliver_event
>>>
>>> Hmm.. for perf_env__arch we could probably cache better:
>>> ```
>>> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
>>> index f1626d2032cd..2591f71c7c42 100644
>>> --- a/tools/perf/util/env.c
>>> +++ b/tools/perf/util/env.c
>>> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
>>>
>>> const char *perf_env__arch(struct perf_env *env)
>>> {
>>> - char *arch_name;
>>> + if (env && env->arch && env->arch_name_normalized)
>>> + return env->arch;
>>>
>>> if (!env || !env->arch) { /* Assume local operation */
>>> static struct utsname uts = { .machine[0] = '\0', };
>>> if (uts.machine[0] == '\0' && uname(&uts) < 0)
>>> return NULL;
>>> - arch_name = uts.machine;
>>> - } else
>>> - arch_name = env->arch;
>>>
>>> - return normalize_arch(arch_name);
>>> + if (!env)
>>> + return normalize_arch(uts.machine);
>>> +
>>> + env->arch = strdup(normalize_arch(uts.machine));
>>> + env->arch_name_normalized = true;
>>> + } else {
>>> + char *normalized = strdup(normalize_arch(env->arch));
>>> +
>>> + free(env->arch);
>>> + env->arch = normalized;
>>> + env->arch_name_normalized = true;
>>> + }
>>> + return env->arch;
>>> }
>>>
>>> #if defined(HAVE_LIBTRACEEVENT)
>>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
>>> index 9977b85523a8..da20bc3810bd 100644
>>> --- a/tools/perf/util/env.h
>>> +++ b/tools/perf/util/env.h
>>> @@ -61,6 +61,7 @@ struct perf_env {
>>> char *os_release;
>>> char *version;
>>> char *arch;
>>> + bool arch_name_normalized;
>>> int nr_cpus_online;
>>> int nr_cpus_avail;
>>> char *cpu_desc;
>>> ```
>>> or just force the normalize before assignment.
>>>
>>> For reader__read_event and the perf_env these changes may be relevant:
>>> "perf sample: Remove arch notion of sample parsing"
>>> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
>>> "perf env: Remove global perf_env"
>>> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
>>>
>>> I wonder if for some reason the env is NULL in your cases, so we go
>>> slow path a lot.
>>>
>>> The histogram stuff, Namhyung may know more.
>>>
>>> Thanks!
>>> Ian
>>>
>>
>> Thanks Ian,
>>
>> I tested your patch, but there is no big difference:
>> root in 🌐 s83lp47 in ~
>> ❯ ll ~/perf.data-test
>> -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test
>>
>> here is the output with your patch:
>> root in 🌐 s83lp47 in ~
>> ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null
>>
>> real 46m26.350s
>> user 46m17.559s
>> sys 0m7.434s
>>
>>
>> Here is the output from the linux-next perf version:
>>
>> root in 🌐 s83lp47 in ~ took 46m26s
>> ❯ time perf report -D -i ~/perf.data-test > /dev/null
>>
>> real 49m14.364s
>> user 49m4.872s
>> sys 0m8.037s
>>
>> root in 🌐 s83lp47 in ~ took 49m14s
>>
>> When I go back history and use:
>>
>> # pwd
>> /root/mirror-linux-next/tools/perf
>> # git branch
>> * (HEAD detached at v6.10)
>> master
>> ]# ll ~/perf.data-test
>> -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test
>> # time ./perf report -D -i ~/perf.data-test > /dev/null
>>
>> real 8m49.089s
>> user 8m47.898s
>> sys 0m0.848s
>> #
>>
>> Thanks for the pointer. Since the issue is with s390 specific code
>> I will do a git bisect and we go from there.
>>
>
> This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
>
> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
> Author: Ian Rogers <irogers@google.com>
> Date: Sun Oct 5 11:24:16 2025 -0700
>
> perf jevents: Add legacy-hardware and legacy-cache json
>
> The legacy-hardware.json is added containing hardware events similarly
> to the software.json file. A difference is that for the software PMU
> the name is known and matches sysfs. In the legacy-hardware.json no
> Unit/PMU is specified for the events meaning default_core is used and
> the events will appear for all core PMUs.
> ......
>
> What puzzles me is that the above patch does not change anything in
> util/s390-sample-raw.c which handles s390 specific sample interpretation.
>
> Maybe that patch affects perf_pmus__find()?
> Not sure how to proceed.
>
> Thanks a lot
Ian,
the issue is within function perf_pmu__for_each_event().
Without calling this function the output is as fast as before.
(Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
I have looked at this code, but I did not understand a thing....
Strange, perf_pmu__for_each_event() did not change in this commit,
but the behavior changed a lot.
What we need for s390 is some way of faster event name lookup,
the PMU is known, its "cpum_cf". Is there no easy way to
find the events on this PMU (given the event number).
Maybe a some sort of table, because this lookup is done
millions of times on a large file with samples including raw data
Ideas?
Thanks
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-30 14:27 ` Thomas Richter
@ 2025-10-30 15:57 ` Ian Rogers
2025-10-30 17:14 ` Ian Rogers
2025-10-31 7:06 ` Thomas Richter
0 siblings, 2 replies; 14+ messages in thread
From: Ian Rogers @ 2025-10-30 15:57 UTC (permalink / raw)
To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> On 10/30/25 14:25, Thomas Richter wrote:
> > On 10/30/25 09:59, Thomas Richter wrote:
> > This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
> >
> > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
> > Author: Ian Rogers <irogers@google.com>
> > Date: Sun Oct 5 11:24:16 2025 -0700
> >
> > perf jevents: Add legacy-hardware and legacy-cache json
> >
> > The legacy-hardware.json is added containing hardware events similarly
> > to the software.json file. A difference is that for the software PMU
> > the name is known and matches sysfs. In the legacy-hardware.json no
> > Unit/PMU is specified for the events meaning default_core is used and
> > the events will appear for all core PMUs.
> > ......
> >
> > What puzzles me is that the above patch does not change anything in
> > util/s390-sample-raw.c which handles s390 specific sample interpretation.
> >
> > Maybe that patch affects perf_pmus__find()?
> > Not sure how to proceed.
> >
> > Thanks a lot
>
> Ian,
>
> the issue is within function perf_pmu__for_each_event().
> Without calling this function the output is as fast as before.
> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
>
> I have looked at this code, but I did not understand a thing....
> Strange, perf_pmu__for_each_event() did not change in this commit,
> but the behavior changed a lot.
>
> What we need for s390 is some way of faster event name lookup,
> the PMU is known, its "cpum_cf". Is there no easy way to
> find the events on this PMU (given the event number).
>
> Maybe a some sort of table, because this lookup is done
> millions of times on a large file with samples including raw data
>
> Ideas?
Thanks Thomas!
The bisect and the behavior make sense now. Previously the legacy
events weren't part of perf_pmu__for_each_event and now with the
legacy-hardware.json they are. The events were parsed by matching
explicitly in the parse-events.l lex parser, but for lots of reasons
the json is better. Anyway, there is common functionality for
reversing a perf_event_attr.config to an event name in
perf_pmu__name_from_config:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
The normal case for a PMU is that we want to see if it has an event by
name, so we have a hashmap from name to the "alias" information (I
don't really like the name "alias", it is information about an event
we want to encode):
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
The use-case for perf_pmu__name_from_config is generally debug output,
so there isn't a fast path config to name lookup. Is config to name
lookup sufficient in s390-sample-raw.c? I notice that the code is
matching the "event=" part of the encoding, which is a sub-part of the
config (and may actually be a value in config2 or config3).
I wonder if it would be easier to add a cache of matched names in
s390-sample-raw.c? So a global/static hashmap in that C file with a
key of set and nr as per get_counter_name (it seems the PMU can be
implied to always only be the core PMU) and a value of a strdup-ed
counter name.
Let me know what you think and I can send you a patch if you like.
Thanks,
Ian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-30 15:57 ` Ian Rogers
@ 2025-10-30 17:14 ` Ian Rogers
2025-10-31 8:20 ` Thomas Richter
2025-10-31 7:06 ` Thomas Richter
1 sibling, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-10-30 17:14 UTC (permalink / raw)
To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On Thu, Oct 30, 2025 at 8:57 AM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
> >
> > On 10/30/25 14:25, Thomas Richter wrote:
> > > On 10/30/25 09:59, Thomas Richter wrote:
> > > This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
> > >
> > > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
> > > Author: Ian Rogers <irogers@google.com>
> > > Date: Sun Oct 5 11:24:16 2025 -0700
> > >
> > > perf jevents: Add legacy-hardware and legacy-cache json
> > >
> > > The legacy-hardware.json is added containing hardware events similarly
> > > to the software.json file. A difference is that for the software PMU
> > > the name is known and matches sysfs. In the legacy-hardware.json no
> > > Unit/PMU is specified for the events meaning default_core is used and
> > > the events will appear for all core PMUs.
> > > ......
> > >
> > > What puzzles me is that the above patch does not change anything in
> > > util/s390-sample-raw.c which handles s390 specific sample interpretation.
> > >
> > > Maybe that patch affects perf_pmus__find()?
> > > Not sure how to proceed.
> > >
> > > Thanks a lot
> >
> > Ian,
> >
> > the issue is within function perf_pmu__for_each_event().
> > Without calling this function the output is as fast as before.
> > (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
> >
> > I have looked at this code, but I did not understand a thing....
> > Strange, perf_pmu__for_each_event() did not change in this commit,
> > but the behavior changed a lot.
> >
> > What we need for s390 is some way of faster event name lookup,
> > the PMU is known, its "cpum_cf". Is there no easy way to
> > find the events on this PMU (given the event number).
> >
> > Maybe a some sort of table, because this lookup is done
> > millions of times on a large file with samples including raw data
> >
> > Ideas?
>
> Thanks Thomas!
>
> The bisect and the behavior make sense now. Previously the legacy
> events weren't part of perf_pmu__for_each_event and now with the
> legacy-hardware.json they are. The events were parsed by matching
> explicitly in the parse-events.l lex parser, but for lots of reasons
> the json is better. Anyway, there is common functionality for
> reversing a perf_event_attr.config to an event name in
> perf_pmu__name_from_config:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
>
> The normal case for a PMU is that we want to see if it has an event by
> name, so we have a hashmap from name to the "alias" information (I
> don't really like the name "alias", it is information about an event
> we want to encode):
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
>
> The use-case for perf_pmu__name_from_config is generally debug output,
> so there isn't a fast path config to name lookup. Is config to name
> lookup sufficient in s390-sample-raw.c? I notice that the code is
> matching the "event=" part of the encoding, which is a sub-part of the
> config (and may actually be a value in config2 or config3).
>
> I wonder if it would be easier to add a cache of matched names in
> s390-sample-raw.c? So a global/static hashmap in that C file with a
> key of set and nr as per get_counter_name (it seems the PMU can be
> implied to always only be the core PMU) and a value of a strdup-ed
> counter name.
>
> Let me know what you think and I can send you a patch if you like.
So:
https://lore.kernel.org/lkml/20251030171211.1109480-1-irogers@google.com/
should cover it. Let me know what you think.
Thanks,
Ian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-30 15:57 ` Ian Rogers
2025-10-30 17:14 ` Ian Rogers
@ 2025-10-31 7:06 ` Thomas Richter
1 sibling, 0 replies; 14+ messages in thread
From: Thomas Richter @ 2025-10-31 7:06 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On 10/30/25 16:57, Ian Rogers wrote:
> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> On 10/30/25 14:25, Thomas Richter wrote:
>>> On 10/30/25 09:59, Thomas Richter wrote:
>>> This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
>>>
>>> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
>>> Author: Ian Rogers <irogers@google.com>
>>> Date: Sun Oct 5 11:24:16 2025 -0700
>>>
>>> perf jevents: Add legacy-hardware and legacy-cache json
>>>
>>> The legacy-hardware.json is added containing hardware events similarly
>>> to the software.json file. A difference is that for the software PMU
>>> the name is known and matches sysfs. In the legacy-hardware.json no
>>> Unit/PMU is specified for the events meaning default_core is used and
>>> the events will appear for all core PMUs.
>>> ......
>>>
>>> What puzzles me is that the above patch does not change anything in
>>> util/s390-sample-raw.c which handles s390 specific sample interpretation.
>>>
>>> Maybe that patch affects perf_pmus__find()?
>>> Not sure how to proceed.
>>>
>>> Thanks a lot
>>
>> Ian,
>>
>> the issue is within function perf_pmu__for_each_event().
>> Without calling this function the output is as fast as before.
>> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
>>
>> I have looked at this code, but I did not understand a thing....
>> Strange, perf_pmu__for_each_event() did not change in this commit,
>> but the behavior changed a lot.
>>
>> What we need for s390 is some way of faster event name lookup,
>> the PMU is known, its "cpum_cf". Is there no easy way to
>> find the events on this PMU (given the event number).
>>
>> Maybe a some sort of table, because this lookup is done
>> millions of times on a large file with samples including raw data
>>
>> Ideas?
>
> Thanks Thomas!
>
> The bisect and the behavior make sense now. Previously the legacy
> events weren't part of perf_pmu__for_each_event and now with the
> legacy-hardware.json they are. The events were parsed by matching
> explicitly in the parse-events.l lex parser, but for lots of reasons
> the json is better. Anyway, there is common functionality for
> reversing a perf_event_attr.config to an event name in
> perf_pmu__name_from_config:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
>
> The normal case for a PMU is that we want to see if it has an event by
> name, so we have a hashmap from name to the "alias" information (I
> don't really like the name "alias", it is information about an event
> we want to encode):
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
>
> The use-case for perf_pmu__name_from_config is generally debug output,
> so there isn't a fast path config to name lookup. Is config to name
> lookup sufficient in s390-sample-raw.c? I notice that the code is
> matching the "event=" part of the encoding, which is a sub-part of the
> config (and may actually be a value in config2 or config3).
>
> I wonder if it would be easier to add a cache of matched names in
> s390-sample-raw.c? So a global/static hashmap in that C file with a
> key of set and nr as per get_counter_name (it seems the PMU can be
> implied to always only be the core PMU) and a value of a strdup-ed
> counter name.
I do confirm your assessment, here is the proof. A very poor man's caching
in util/s390-sample-raw.c:
# git diff
diff --git a/tools/perf/util/s390-sample-raw.c b/tools/perf/util/s390-sample-raw.c
index 335217bb532b..b04d3b1a8750 100644
--- a/tools/perf/util/s390-sample-raw.c
+++ b/tools/perf/util/s390-sample-raw.c
@@ -162,6 +162,7 @@ static int get_counter_name_callback(void *vdata, struct pmu_event_info *info)
* number and use this as key. If they match return the name of this counter.
* If no match is found a NULL pointer is returned.
*/
+static char *ctrname[500];
static char *get_counter_name(int set, int nr, struct perf_pmu *pmu)
{
struct get_counter_name_data data = {
@@ -171,9 +172,14 @@ static char *get_counter_name(int set, int nr, struct perf_pmu *pmu)
if (!pmu)
return NULL;
+if (ctrname[data.wanted]) return strdup(ctrname[data.wanted]);
perf_pmu__for_each_event(pmu, /*skip_duplicate_pmus=*/ true,
&data, get_counter_name_callback);
+if(data.result)
+ ctrname[data.wanted] = strdup(data.result);
+else
+ ctrname[data.wanted] = strdup("unknown");
return data.result;
}
# cd
# time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null
real 0m25.908s
user 0m25.548s
sys 0m0.182s
#
>
> Let me know what you think and I can send you a patch if you like.
>
> Thanks,
> Ian
Of course!!
A hash list is more flexible, but an array is fine too. There are very few holes
in the counter set numbers, They are listed as "unknown". That does not mean these
counter do not exist! Most of them have values, but their name was not made public
in the documentation SA23-2261-09.
And there are more counters from the PAI crypto (172) and PAI NNPA PMUs (36), not much
but in a different range. So I guess this qualifies for the hash solution.
Thanks a lot
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long
2025-10-30 17:14 ` Ian Rogers
@ 2025-10-31 8:20 ` Thomas Richter
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Richter @ 2025-10-31 8:20 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev
On 10/30/25 18:14, Ian Rogers wrote:
> On Thu, Oct 30, 2025 at 8:57 AM Ian Rogers <irogers@google.com> wrote:
>>
>> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>>
>>> On 10/30/25 14:25, Thomas Richter wrote:
>>>> On 10/30/25 09:59, Thomas Richter wrote:
>>>> This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
>>>>
>>>> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
>>>> Author: Ian Rogers <irogers@google.com>
>>>> Date: Sun Oct 5 11:24:16 2025 -0700
>>>>
>>>> perf jevents: Add legacy-hardware and legacy-cache json
>>>>
>>>> The legacy-hardware.json is added containing hardware events similarly
>>>> to the software.json file. A difference is that for the software PMU
>>>> the name is known and matches sysfs. In the legacy-hardware.json no
>>>> Unit/PMU is specified for the events meaning default_core is used and
>>>> the events will appear for all core PMUs.
>>>> ......
>>>>
>>>> What puzzles me is that the above patch does not change anything in
>>>> util/s390-sample-raw.c which handles s390 specific sample interpretation.
>>>>
>>>> Maybe that patch affects perf_pmus__find()?
>>>> Not sure how to proceed.
>>>>
>>>> Thanks a lot
>>>
>>> Ian,
>>>
>>> the issue is within function perf_pmu__for_each_event().
>>> Without calling this function the output is as fast as before.
>>> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
>>>
>>> I have looked at this code, but I did not understand a thing....
>>> Strange, perf_pmu__for_each_event() did not change in this commit,
>>> but the behavior changed a lot.
>>>
>>> What we need for s390 is some way of faster event name lookup,
>>> the PMU is known, its "cpum_cf". Is there no easy way to
>>> find the events on this PMU (given the event number).
>>>
>>> Maybe a some sort of table, because this lookup is done
>>> millions of times on a large file with samples including raw data
>>>
>>> Ideas?
>>
>> Thanks Thomas!
>>
>> The bisect and the behavior make sense now. Previously the legacy
>> events weren't part of perf_pmu__for_each_event and now with the
>> legacy-hardware.json they are. The events were parsed by matching
>> explicitly in the parse-events.l lex parser, but for lots of reasons
>> the json is better. Anyway, there is common functionality for
>> reversing a perf_event_attr.config to an event name in
>> perf_pmu__name_from_config:
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
>>
>> The normal case for a PMU is that we want to see if it has an event by
>> name, so we have a hashmap from name to the "alias" information (I
>> don't really like the name "alias", it is information about an event
>> we want to encode):
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
>>
>> The use-case for perf_pmu__name_from_config is generally debug output,
>> so there isn't a fast path config to name lookup. Is config to name
>> lookup sufficient in s390-sample-raw.c? I notice that the code is
>> matching the "event=" part of the encoding, which is a sub-part of the
>> config (and may actually be a value in config2 or config3).
>>
>> I wonder if it would be easier to add a cache of matched names in
>> s390-sample-raw.c? So a global/static hashmap in that C file with a
>> key of set and nr as per get_counter_name (it seems the PMU can be
>> implied to always only be the core PMU) and a value of a strdup-ed
>> counter name.
>>
>> Let me know what you think and I can send you a patch if you like.
>
> So:
> https://lore.kernel.org/lkml/20251030171211.1109480-1-irogers@google.com/
> should cover it. Let me know what you think.
>
> Thanks,
> Ian
Great work, I downloaded and tested your patch
# time perf report -D -i ~/perf.data-test > /dev/null
real 9m16.125s
user 9m14.838s
sys 0m0.912s
[root@b83lp65 perf]# ll ~/perf.data-test
-rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-390raw
#
This is back to what is was before. Much better than the ~50 minutes.
If there is need for further improvement, we can go for an array solution.
Please submit this patch to the mailing list.
Thank you very much for your help.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-10-31 8:20 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-29 8:03 [linux-next] perf report runtime extremely long Thomas Richter
2025-10-29 15:13 ` Ian Rogers
2025-10-29 15:52 ` Thomas Richter
2025-10-29 16:25 ` Ian Rogers
2025-10-29 15:56 ` Thomas Richter
2025-10-29 16:47 ` Ian Rogers
2025-10-30 6:04 ` Namhyung Kim
2025-10-30 8:59 ` Thomas Richter
2025-10-30 13:25 ` Thomas Richter
2025-10-30 14:27 ` Thomas Richter
2025-10-30 15:57 ` Ian Rogers
2025-10-30 17:14 ` Ian Rogers
2025-10-31 8:20 ` Thomas Richter
2025-10-31 7:06 ` Thomas Richter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).