* [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
@ 2024-07-10 4:59 kernel test robot
2024-07-10 13:15 ` Liang, Kan
2024-07-15 20:05 ` Liang, Kan
0 siblings, 2 replies; 9+ messages in thread
From: kernel test robot @ 2024-07-10 4:59 UTC (permalink / raw)
To: Ian Rogers
Cc: oe-lkp, lkp, Linux Memory Management List, Namhyung Kim,
Weilin Wang, Caleb Biggers, Kan Liang, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel, oliver.sang
Hello,
kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
in testcase: perf-sanity-tests
version:
with following parameters:
perf_compiler: gcc
compiler: gcc-13
test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
we also observed two cases which also failed on parent can pass on this commit.
FYI.
caccae3ce7b988b6 e2641db83f18782f57a0e107c50
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
:6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
:6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
105: perf all metricgroups test : Ok
2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
106: perf all metrics test : Ok
2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
107: perf all libpfm4 events test : Ok
2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
108: perf all PMU test : FAILED!
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240710/202407101021.2c8baddb-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-10 4:59 [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail kernel test robot
@ 2024-07-10 13:15 ` Liang, Kan
2024-07-11 8:04 ` Oliver Sang
2024-07-15 20:05 ` Liang, Kan
1 sibling, 1 reply; 9+ messages in thread
From: Liang, Kan @ 2024-07-10 13:15 UTC (permalink / raw)
To: kernel test robot, Ian Rogers
Cc: oe-lkp, lkp, Linux Memory Management List, Namhyung Kim,
Weilin Wang, Caleb Biggers, Alexandre Torgue, Maxime Coquelin,
linux-perf-users, linux-kernel
On 2024-07-10 12:59 a.m., kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
>
> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
>
> in testcase: perf-sanity-tests
> version:
> with following parameters:
>
> perf_compiler: gcc
>
>
>
> compiler: gcc-13
> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we also observed two cases which also failed on parent can pass on this commit.
> FYI.
>
>
> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
>
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
>
>
>
> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> 105: perf all metricgroups test : Ok
> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> 106: perf all metrics test : Ok
> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> 107: perf all libpfm4 events test : Ok
> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> 108: perf all PMU test : FAILED!
Can you please try the -vvv option, which should tell the failed event?
perf test -vvv "perf all PMU test"
Thanks,
Kan
>
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240710/202407101021.2c8baddb-oliver.sang@intel.com
>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-10 13:15 ` Liang, Kan
@ 2024-07-11 8:04 ` Oliver Sang
2024-07-11 13:07 ` Liang, Kan
0 siblings, 1 reply; 9+ messages in thread
From: Oliver Sang @ 2024-07-11 8:04 UTC (permalink / raw)
To: Liang, Kan
Cc: Ian Rogers, oe-lkp, lkp, Linux Memory Management List,
Namhyung Kim, Weilin Wang, Caleb Biggers, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel, oliver.sang
hi, Kan,
On Wed, Jul 10, 2024 at 09:15:05AM -0400, Liang, Kan wrote:
>
>
> On 2024-07-10 12:59 a.m., kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
> >
> > commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
> >
> > in testcase: perf-sanity-tests
> > version:
> > with following parameters:
> >
> > perf_compiler: gcc
> >
> >
> >
> > compiler: gcc-13
> > test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > we also observed two cases which also failed on parent can pass on this commit.
> > FYI.
> >
> >
> > caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
> > :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
> > :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
> >
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
> >
> >
> >
> > 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> > 105: perf all metricgroups test : Ok
> > 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> > 106: perf all metrics test : Ok
> > 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> > 107: perf all libpfm4 events test : Ok
> > 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> > 108: perf all PMU test : FAILED!
>
> Can you please try the -vvv option, which should tell the failed event?
> perf test -vvv "perf all PMU test"
after add -vvv:
2024-07-11 07:45:20 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test -v 108
108: perf all PMU test:
--- start ---
test child forked, pid 18989
Testing (null)
---- end(-1) ----
108: perf all PMU test : FAILED!
>
> Thanks,
> Kan
> >
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240710/202407101021.2c8baddb-oliver.sang@intel.com
> >
> >
> >
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-11 8:04 ` Oliver Sang
@ 2024-07-11 13:07 ` Liang, Kan
0 siblings, 0 replies; 9+ messages in thread
From: Liang, Kan @ 2024-07-11 13:07 UTC (permalink / raw)
To: Oliver Sang
Cc: Ian Rogers, oe-lkp, lkp, Linux Memory Management List,
Namhyung Kim, Weilin Wang, Caleb Biggers, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel
On 2024-07-11 4:04 a.m., Oliver Sang wrote:
> hi, Kan,
>
> On Wed, Jul 10, 2024 at 09:15:05AM -0400, Liang, Kan wrote:
>>
>>
>> On 2024-07-10 12:59 a.m., kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
>>>
>>> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
>>>
>>> in testcase: perf-sanity-tests
>>> version:
>>> with following parameters:
>>>
>>> perf_compiler: gcc
>>>
>>>
>>>
>>> compiler: gcc-13
>>> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
>>>
>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>
>>>
>>> we also observed two cases which also failed on parent can pass on this commit.
>>> FYI.
>>>
>>>
>>> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
>>> ---------------- ---------------------------
>>> fail:runs %reproduction fail:runs
>>> | | |
>>> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
>>> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
>>> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
>>>
>>>
>>>
>>>
>>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
>>>
>>>
>>>
>>> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
>>> 105: perf all metricgroups test : Ok
>>> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
>>> 106: perf all metrics test : Ok
>>> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
>>> 107: perf all libpfm4 events test : Ok
>>> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
>>> 108: perf all PMU test : FAILED!
>>
>> Can you please try the -vvv option, which should tell the failed event?
>> perf test -vvv "perf all PMU test"
>
> after add -vvv:
>
> 2024-07-11 07:45:20 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test -v 108
> 108: perf all PMU test:
> --- start ---
> test child forked, pid 18989
> Testing (null)
The test cases use the "perf list --raw-dump pmu" to get a complete
event list. It seems the perf list command got nothing. That's weird.
The main change of e2641db83f18 is to add the "Counter" field and a few
events. It should not impact the perf list command.
What's the output of "perf list --raw-dump pmu" on your machine?
Thanks,
Kan
> ---- end(-1) ----
> 108: perf all PMU test : FAILED!
>
>
>>
>> Thanks,
>> Kan
>>>
>>>
>>>
>>> The kernel config and materials to reproduce are available at:
>>> https://download.01.org/0day-ci/archive/20240710/202407101021.2c8baddb-oliver.sang@intel.com
>>>
>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-10 4:59 [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail kernel test robot
2024-07-10 13:15 ` Liang, Kan
@ 2024-07-15 20:05 ` Liang, Kan
2024-07-15 20:11 ` Ian Rogers
1 sibling, 1 reply; 9+ messages in thread
From: Liang, Kan @ 2024-07-15 20:05 UTC (permalink / raw)
To: kernel test robot, Ian Rogers
Cc: oe-lkp, lkp, Linux Memory Management List, Namhyung Kim,
Weilin Wang, Caleb Biggers, Alexandre Torgue, Maxime Coquelin,
linux-perf-users, linux-kernel
Hi Ian,
On 2024-07-10 12:59 a.m., kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
>
> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
>
> in testcase: perf-sanity-tests
> version:
> with following parameters:
>
> perf_compiler: gcc
>
>
>
> compiler: gcc-13
> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we also observed two cases which also failed on parent can pass on this commit.
> FYI.
>
>
> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
>
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
>
>
>
> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> 105: perf all metricgroups test : Ok
> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> 106: perf all metrics test : Ok
> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> 107: perf all libpfm4 events test : Ok
> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> 108: perf all PMU test : FAILED!
>
The failure is caused by the below change in the e2641db83f18.
+ {
+ "BriefDescription": "This 48-bit fixed counter counts the UCLK
cycles",
+ "Counter": "FIXED",
+ "EventCode": "0xff",
+ "EventName": "UNC_CLOCK.SOCKET",
+ "PerPkg": "1",
+ "PublicDescription": "This 48-bit fixed counter counts the UCLK
cycles.",
+ "Unit": "cbox_0"
}
The other cbox events have the unit name "CBOX", while the fixed counter
has a unit name "cbox_0". So the events_table will maintain separate
entries for cbox and cbox_0.
The perf_pmus__print_pmu_events() calculate the total number of events,
allocate an aliases buffer, store all the events into the buffer, sort,
and print all the aliases one by one.
The problem is that the calculated total number of events doesn't match
the stored events on the SKL machine.
The perf_pmu__num_events() is used to calculate the number of events. It
invokes the pmu_events_table__num_events() to go through the entire
events_table to find all events. Because of the
pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
the events for cbox and cbox_0 are all counted.
When storing events into the aliases buffer, the
perf_pmu__for_each_event() only process the events for cbox.
Since a bigger buffer was allocated, the last entry are all 0.
When printing all the aliases, null will be outputed.
$ perf list pmu
List of pre-defined events (to be used in -e or -M):
(null) [Kernel PMU event]
branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
branch-misses OR cpu/branch-misses/ [Kernel PMU event]
I'm thinking of two ways to address it.
One is to only print all the stored events. The below patch can fix it.
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 3fcabfd8fca1..2b2f5117ff84 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct
print_callbacks *print_cb, void *p
perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
perf_pmus__print_pmu_events__callback);
}
+ len = state.index;
qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
for (int j = 0; j < len; j++) {
/* Skip duplicates */
The only drawback is that perf list will not show the new cbox_0 event.
(But the event name still works. Users can still apply perf stat -e
unc_clock.socket.)
Since the cbox_0 event is only available on old machines (SKL and
earlier), people should already use the equivalent kernel event. It
doesn't sounds a big issue for me. I prefer this simple fix.
I think the other way would be to modify the perf_pmu__for_each_event()
to go through all the possible PMUs.
It seems complicated and may impact others ARCHs (e.g., S390). I haven't
tried it yet.
What do you think?
Do you see any other ways to address the issue?
Thanks,
Kan
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-15 20:05 ` Liang, Kan
@ 2024-07-15 20:11 ` Ian Rogers
2024-07-15 21:41 ` Liang, Kan
0 siblings, 1 reply; 9+ messages in thread
From: Ian Rogers @ 2024-07-15 20:11 UTC (permalink / raw)
To: Liang, Kan
Cc: kernel test robot, oe-lkp, lkp, Linux Memory Management List,
Namhyung Kim, Weilin Wang, Caleb Biggers, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel
On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> Hi Ian,
>
> On 2024-07-10 12:59 a.m., kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
> >
> > commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
> >
> > in testcase: perf-sanity-tests
> > version:
> > with following parameters:
> >
> > perf_compiler: gcc
> >
> >
> >
> > compiler: gcc-13
> > test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > we also observed two cases which also failed on parent can pass on this commit.
> > FYI.
> >
> >
> > caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
> > :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
> > :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
> >
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
> >
> >
> >
> > 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> > 105: perf all metricgroups test : Ok
> > 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> > 106: perf all metrics test : Ok
> > 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> > 107: perf all libpfm4 events test : Ok
> > 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> > 108: perf all PMU test : FAILED!
> >
>
> The failure is caused by the below change in the e2641db83f18.
>
> + {
> + "BriefDescription": "This 48-bit fixed counter counts the UCLK
> cycles",
> + "Counter": "FIXED",
> + "EventCode": "0xff",
> + "EventName": "UNC_CLOCK.SOCKET",
> + "PerPkg": "1",
> + "PublicDescription": "This 48-bit fixed counter counts the UCLK
> cycles.",
> + "Unit": "cbox_0"
> }
>
> The other cbox events have the unit name "CBOX", while the fixed counter
> has a unit name "cbox_0". So the events_table will maintain separate
> entries for cbox and cbox_0.
>
> The perf_pmus__print_pmu_events() calculate the total number of events,
> allocate an aliases buffer, store all the events into the buffer, sort,
> and print all the aliases one by one.
>
> The problem is that the calculated total number of events doesn't match
> the stored events on the SKL machine.
>
> The perf_pmu__num_events() is used to calculate the number of events. It
> invokes the pmu_events_table__num_events() to go through the entire
> events_table to find all events. Because of the
> pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
> the events for cbox and cbox_0 are all counted.
>
> When storing events into the aliases buffer, the
> perf_pmu__for_each_event() only process the events for cbox.
>
> Since a bigger buffer was allocated, the last entry are all 0.
> When printing all the aliases, null will be outputed.
>
> $ perf list pmu
>
> List of pre-defined events (to be used in -e or -M):
>
> (null) [Kernel PMU event]
> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
>
>
> I'm thinking of two ways to address it.
> One is to only print all the stored events. The below patch can fix it.
>
> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> index 3fcabfd8fca1..2b2f5117ff84 100644
> --- a/tools/perf/util/pmus.c
> +++ b/tools/perf/util/pmus.c
> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct
> print_callbacks *print_cb, void *p
> perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
> perf_pmus__print_pmu_events__callback);
> }
> + len = state.index;
> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
> for (int j = 0; j < len; j++) {
> /* Skip duplicates */
>
> The only drawback is that perf list will not show the new cbox_0 event.
> (But the event name still works. Users can still apply perf stat -e
> unc_clock.socket.)
>
> Since the cbox_0 event is only available on old machines (SKL and
> earlier), people should already use the equivalent kernel event. It
> doesn't sounds a big issue for me. I prefer this simple fix.
>
> I think the other way would be to modify the perf_pmu__for_each_event()
> to go through all the possible PMUs.
> It seems complicated and may impact others ARCHs (e.g., S390). I haven't
> tried it yet.
>
> What do you think?
> Do you see any other ways to address the issue?
Ugh. It seems the sizing and then iterating approach is just prone to
keep breaking. Perhaps we can switch to realloc-ed arrays to avoid the
need for perf_pmu__num_events, which seems to be the source of the
problems.
Thanks,
Ian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-15 20:11 ` Ian Rogers
@ 2024-07-15 21:41 ` Liang, Kan
2024-07-15 21:48 ` Ian Rogers
0 siblings, 1 reply; 9+ messages in thread
From: Liang, Kan @ 2024-07-15 21:41 UTC (permalink / raw)
To: Ian Rogers
Cc: kernel test robot, oe-lkp, lkp, Linux Memory Management List,
Namhyung Kim, Weilin Wang, Caleb Biggers, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel
On 2024-07-15 4:11 p.m., Ian Rogers wrote:
> On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>> Hi Ian,
>>
>> On 2024-07-10 12:59 a.m., kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
>>>
>>> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
>>>
>>> in testcase: perf-sanity-tests
>>> version:
>>> with following parameters:
>>>
>>> perf_compiler: gcc
>>>
>>>
>>>
>>> compiler: gcc-13
>>> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
>>>
>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>
>>>
>>> we also observed two cases which also failed on parent can pass on this commit.
>>> FYI.
>>>
>>>
>>> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
>>> ---------------- ---------------------------
>>> fail:runs %reproduction fail:runs
>>> | | |
>>> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
>>> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
>>> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
>>>
>>>
>>>
>>>
>>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
>>>
>>>
>>>
>>> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
>>> 105: perf all metricgroups test : Ok
>>> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
>>> 106: perf all metrics test : Ok
>>> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
>>> 107: perf all libpfm4 events test : Ok
>>> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
>>> 108: perf all PMU test : FAILED!
>>>
>>
>> The failure is caused by the below change in the e2641db83f18.
>>
>> + {
>> + "BriefDescription": "This 48-bit fixed counter counts the UCLK
>> cycles",
>> + "Counter": "FIXED",
>> + "EventCode": "0xff",
>> + "EventName": "UNC_CLOCK.SOCKET",
>> + "PerPkg": "1",
>> + "PublicDescription": "This 48-bit fixed counter counts the UCLK
>> cycles.",
>> + "Unit": "cbox_0"
>> }
>>
>> The other cbox events have the unit name "CBOX", while the fixed counter
>> has a unit name "cbox_0". So the events_table will maintain separate
>> entries for cbox and cbox_0.
>>
>> The perf_pmus__print_pmu_events() calculate the total number of events,
>> allocate an aliases buffer, store all the events into the buffer, sort,
>> and print all the aliases one by one.
>>
>> The problem is that the calculated total number of events doesn't match
>> the stored events on the SKL machine.
>>
>> The perf_pmu__num_events() is used to calculate the number of events. It
>> invokes the pmu_events_table__num_events() to go through the entire
>> events_table to find all events. Because of the
>> pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
>> the events for cbox and cbox_0 are all counted.
>>
>> When storing events into the aliases buffer, the
>> perf_pmu__for_each_event() only process the events for cbox.
>>
>> Since a bigger buffer was allocated, the last entry are all 0.
>> When printing all the aliases, null will be outputed.
>>
>> $ perf list pmu
>>
>> List of pre-defined events (to be used in -e or -M):
>>
>> (null) [Kernel PMU event]
>> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
>> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
>>
>>
>> I'm thinking of two ways to address it.
>> One is to only print all the stored events. The below patch can fix it.
>>
>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>> index 3fcabfd8fca1..2b2f5117ff84 100644
>> --- a/tools/perf/util/pmus.c
>> +++ b/tools/perf/util/pmus.c
>> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct
>> print_callbacks *print_cb, void *p
>> perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
>> perf_pmus__print_pmu_events__callback);
>> }
>> + len = state.index;
>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>> for (int j = 0; j < len; j++) {
>> /* Skip duplicates */
>>
>> The only drawback is that perf list will not show the new cbox_0 event.
>> (But the event name still works. Users can still apply perf stat -e
>> unc_clock.socket.)
>>
>> Since the cbox_0 event is only available on old machines (SKL and
>> earlier), people should already use the equivalent kernel event. It
>> doesn't sounds a big issue for me. I prefer this simple fix.
>>
>> I think the other way would be to modify the perf_pmu__for_each_event()
>> to go through all the possible PMUs.
>> It seems complicated and may impact others ARCHs (e.g., S390). I haven't
>> tried it yet.
>>
>> What do you think?
>> Do you see any other ways to address the issue?
>
> Ugh. It seems the sizing and then iterating approach is just prone to
> keep breaking. Perhaps we can switch to realloc-ed arrays to avoid the
> need for perf_pmu__num_events, which seems to be the source of the
> problems.
>
I think a realloc-ed array should have the same drawback as the first
way, but bad performance.
Because the pmu_add_cpu_aliases() in the perf_pmu__for_each_event() only
add the events from the first matched PMU. If we don't fix it, the
UNC_CLOCK.SOCKET of cbox_0 will never be displayed.
Thanks,
Kan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-15 21:41 ` Liang, Kan
@ 2024-07-15 21:48 ` Ian Rogers
2024-07-16 12:52 ` Liang, Kan
0 siblings, 1 reply; 9+ messages in thread
From: Ian Rogers @ 2024-07-15 21:48 UTC (permalink / raw)
To: Liang, Kan
Cc: kernel test robot, oe-lkp, lkp, Linux Memory Management List,
Namhyung Kim, Weilin Wang, Caleb Biggers, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel
On Mon, Jul 15, 2024 at 2:41 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-07-15 4:11 p.m., Ian Rogers wrote:
> > On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >> Hi Ian,
> >>
> >> On 2024-07-10 12:59 a.m., kernel test robot wrote:
> >>>
> >>>
> >>> Hello,
> >>>
> >>> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
> >>>
> >>> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> >>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>
> >>> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
> >>>
> >>> in testcase: perf-sanity-tests
> >>> version:
> >>> with following parameters:
> >>>
> >>> perf_compiler: gcc
> >>>
> >>>
> >>>
> >>> compiler: gcc-13
> >>> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
> >>>
> >>> (please refer to attached dmesg/kmsg for entire log/backtrace)
> >>>
> >>>
> >>> we also observed two cases which also failed on parent can pass on this commit.
> >>> FYI.
> >>>
> >>>
> >>> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> >>> ---------------- ---------------------------
> >>> fail:runs %reproduction fail:runs
> >>> | | |
> >>> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
> >>> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
> >>> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >>> the same patch/commit), kindly add following tags
> >>> | Reported-by: kernel test robot <oliver.sang@intel.com>
> >>> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
> >>>
> >>>
> >>>
> >>> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> >>> 105: perf all metricgroups test : Ok
> >>> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> >>> 106: perf all metrics test : Ok
> >>> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> >>> 107: perf all libpfm4 events test : Ok
> >>> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> >>> 108: perf all PMU test : FAILED!
> >>>
> >>
> >> The failure is caused by the below change in the e2641db83f18.
> >>
> >> + {
> >> + "BriefDescription": "This 48-bit fixed counter counts the UCLK
> >> cycles",
> >> + "Counter": "FIXED",
> >> + "EventCode": "0xff",
> >> + "EventName": "UNC_CLOCK.SOCKET",
> >> + "PerPkg": "1",
> >> + "PublicDescription": "This 48-bit fixed counter counts the UCLK
> >> cycles.",
> >> + "Unit": "cbox_0"
> >> }
> >>
> >> The other cbox events have the unit name "CBOX", while the fixed counter
> >> has a unit name "cbox_0". So the events_table will maintain separate
> >> entries for cbox and cbox_0.
> >>
> >> The perf_pmus__print_pmu_events() calculate the total number of events,
> >> allocate an aliases buffer, store all the events into the buffer, sort,
> >> and print all the aliases one by one.
> >>
> >> The problem is that the calculated total number of events doesn't match
> >> the stored events on the SKL machine.
> >>
> >> The perf_pmu__num_events() is used to calculate the number of events. It
> >> invokes the pmu_events_table__num_events() to go through the entire
> >> events_table to find all events. Because of the
> >> pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
> >> the events for cbox and cbox_0 are all counted.
> >>
> >> When storing events into the aliases buffer, the
> >> perf_pmu__for_each_event() only process the events for cbox.
> >>
> >> Since a bigger buffer was allocated, the last entry are all 0.
> >> When printing all the aliases, null will be outputed.
> >>
> >> $ perf list pmu
> >>
> >> List of pre-defined events (to be used in -e or -M):
> >>
> >> (null) [Kernel PMU event]
> >> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
> >> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
> >>
> >>
> >> I'm thinking of two ways to address it.
> >> One is to only print all the stored events. The below patch can fix it.
> >>
> >> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> >> index 3fcabfd8fca1..2b2f5117ff84 100644
> >> --- a/tools/perf/util/pmus.c
> >> +++ b/tools/perf/util/pmus.c
> >> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct
> >> print_callbacks *print_cb, void *p
> >> perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
> >> perf_pmus__print_pmu_events__callback);
> >> }
> >> + len = state.index;
> >> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
> >> for (int j = 0; j < len; j++) {
> >> /* Skip duplicates */
> >>
> >> The only drawback is that perf list will not show the new cbox_0 event.
> >> (But the event name still works. Users can still apply perf stat -e
> >> unc_clock.socket.)
> >>
> >> Since the cbox_0 event is only available on old machines (SKL and
> >> earlier), people should already use the equivalent kernel event. It
> >> doesn't sounds a big issue for me. I prefer this simple fix.
> >>
> >> I think the other way would be to modify the perf_pmu__for_each_event()
> >> to go through all the possible PMUs.
> >> It seems complicated and may impact others ARCHs (e.g., S390). I haven't
> >> tried it yet.
> >>
> >> What do you think?
> >> Do you see any other ways to address the issue?
> >
> > Ugh. It seems the sizing and then iterating approach is just prone to
> > keep breaking. Perhaps we can switch to realloc-ed arrays to avoid the
> > need for perf_pmu__num_events, which seems to be the source of the
> > problems.
> >
>
> I think a realloc-ed array should have the same drawback as the first
> way, but bad performance.
> Because the pmu_add_cpu_aliases() in the perf_pmu__for_each_event() only
> add the events from the first matched PMU. If we don't fix it, the
> UNC_CLOCK.SOCKET of cbox_0 will never be displayed.
Ok, but I don't think we need to optimize `perf list` for speed. Fwiw,
I think this was the fix for the last bug in this code:
https://lore.kernel.org/r/20240511003601.2666907-1-irogers@google.com
Thanks,
Ian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
2024-07-15 21:48 ` Ian Rogers
@ 2024-07-16 12:52 ` Liang, Kan
0 siblings, 0 replies; 9+ messages in thread
From: Liang, Kan @ 2024-07-16 12:52 UTC (permalink / raw)
To: Ian Rogers
Cc: kernel test robot, oe-lkp, lkp, Linux Memory Management List,
Namhyung Kim, Weilin Wang, Caleb Biggers, Alexandre Torgue,
Maxime Coquelin, linux-perf-users, linux-kernel
On 2024-07-15 5:48 p.m., Ian Rogers wrote:
> On Mon, Jul 15, 2024 at 2:41 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2024-07-15 4:11 p.m., Ian Rogers wrote:
>>> On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>> Hi Ian,
>>>>
>>>> On 2024-07-10 12:59 a.m., kernel test robot wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
>>>>>
>>>>> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>
>>>>> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
>>>>>
>>>>> in testcase: perf-sanity-tests
>>>>> version:
>>>>> with following parameters:
>>>>>
>>>>> perf_compiler: gcc
>>>>>
>>>>>
>>>>>
>>>>> compiler: gcc-13
>>>>> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
>>>>>
>>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>>>
>>>>>
>>>>> we also observed two cases which also failed on parent can pass on this commit.
>>>>> FYI.
>>>>>
>>>>>
>>>>> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
>>>>> ---------------- ---------------------------
>>>>> fail:runs %reproduction fail:runs
>>>>> | | |
>>>>> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail
>>>>> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass
>>>>> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>>> the same patch/commit), kindly add following tags
>>>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>>>> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
>>>>>
>>>>>
>>>>>
>>>>> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
>>>>> 105: perf all metricgroups test : Ok
>>>>> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
>>>>> 106: perf all metrics test : Ok
>>>>> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
>>>>> 107: perf all libpfm4 events test : Ok
>>>>> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
>>>>> 108: perf all PMU test : FAILED!
>>>>>
>>>>
>>>> The failure is caused by the below change in the e2641db83f18.
>>>>
>>>> + {
>>>> + "BriefDescription": "This 48-bit fixed counter counts the UCLK
>>>> cycles",
>>>> + "Counter": "FIXED",
>>>> + "EventCode": "0xff",
>>>> + "EventName": "UNC_CLOCK.SOCKET",
>>>> + "PerPkg": "1",
>>>> + "PublicDescription": "This 48-bit fixed counter counts the UCLK
>>>> cycles.",
>>>> + "Unit": "cbox_0"
>>>> }
>>>>
>>>> The other cbox events have the unit name "CBOX", while the fixed counter
>>>> has a unit name "cbox_0". So the events_table will maintain separate
>>>> entries for cbox and cbox_0.
>>>>
>>>> The perf_pmus__print_pmu_events() calculate the total number of events,
>>>> allocate an aliases buffer, store all the events into the buffer, sort,
>>>> and print all the aliases one by one.
>>>>
>>>> The problem is that the calculated total number of events doesn't match
>>>> the stored events on the SKL machine.
>>>>
>>>> The perf_pmu__num_events() is used to calculate the number of events. It
>>>> invokes the pmu_events_table__num_events() to go through the entire
>>>> events_table to find all events. Because of the
>>>> pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
>>>> the events for cbox and cbox_0 are all counted.
>>>>
>>>> When storing events into the aliases buffer, the
>>>> perf_pmu__for_each_event() only process the events for cbox.
>>>>
>>>> Since a bigger buffer was allocated, the last entry are all 0.
>>>> When printing all the aliases, null will be outputed.
>>>>
>>>> $ perf list pmu
>>>>
>>>> List of pre-defined events (to be used in -e or -M):
>>>>
>>>> (null) [Kernel PMU event]
>>>> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
>>>> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
>>>>
>>>>
>>>> I'm thinking of two ways to address it.
>>>> One is to only print all the stored events. The below patch can fix it.
>>>>
>>>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>>>> index 3fcabfd8fca1..2b2f5117ff84 100644
>>>> --- a/tools/perf/util/pmus.c
>>>> +++ b/tools/perf/util/pmus.c
>>>> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct
>>>> print_callbacks *print_cb, void *p
>>>> perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
>>>> perf_pmus__print_pmu_events__callback);
>>>> }
>>>> + len = state.index;
>>>> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>>>> for (int j = 0; j < len; j++) {
>>>> /* Skip duplicates */
>>>>
>>>> The only drawback is that perf list will not show the new cbox_0 event.
>>>> (But the event name still works. Users can still apply perf stat -e
>>>> unc_clock.socket.)
>>>>
>>>> Since the cbox_0 event is only available on old machines (SKL and
>>>> earlier), people should already use the equivalent kernel event. It
>>>> doesn't sounds a big issue for me. I prefer this simple fix.
>>>>
>>>> I think the other way would be to modify the perf_pmu__for_each_event()
>>>> to go through all the possible PMUs.
>>>> It seems complicated and may impact others ARCHs (e.g., S390). I haven't
>>>> tried it yet.
>>>>
>>>> What do you think?
>>>> Do you see any other ways to address the issue?
>>>
>>> Ugh. It seems the sizing and then iterating approach is just prone to
>>> keep breaking. Perhaps we can switch to realloc-ed arrays to avoid the
>>> need for perf_pmu__num_events, which seems to be the source of the
>>> problems.
>>>
>>
>> I think a realloc-ed array should have the same drawback as the first
>> way, but bad performance.
>> Because the pmu_add_cpu_aliases() in the perf_pmu__for_each_event() only
>> add the events from the first matched PMU. If we don't fix it, the
>> UNC_CLOCK.SOCKET of cbox_0 will never be displayed.
>
> Ok, but I don't think we need to optimize `perf list` for speed. Fwiw,
> I think this was the fix for the last bug in this code:
> https://lore.kernel.org/r/20240511003601.2666907-1-irogers@google.com
That seems a different issue.
The root cause of this issue should be that different methods are used
between calculating the total number of events and adding the events.
A realloc-ed array should just hide the problem. There will still be an
issue if anyone use the pair of perf_pmu__num_events() and
pmu_add_cpu_aliases() somewhere else later.
It looks like the difference is introduced from the commit e3edd6cf6399
("perf pmu-events: Reduce processed events by passing PMU").
The pmu_events_table__for_each_event() stops immediately once a pmu is
set. But for uncore, especially this case, the method is wrong and
mismatch what perf does in the perf_pmu__num_events().
The below patch can fix it.
diff --git a/tools/perf/pmu-events/jevents.py
b/tools/perf/pmu-events/jevents.py
index ac9b7ca41856..97a3b018f865 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -923,7 +923,7 @@ int pmu_events_table__for_each_event(const struct
pmu_events_table *table,
continue;
ret = pmu_events_table__for_each_event_pmu(table, table_pmu, fn, data);
- if (pmu || ret)
+ if (ret)
return ret;
}
return 0;
The event can be displayed by the perf list pmu with the patch.
$ perf list pmu | grep -A 1 clock.socket
unc_clock.socket
[This 48-bit fixed counter counts the UCLK cycles. Unit:
uncore_cbox_0
& perf test "perf all PMU test"
107: perf all PMU test : Ok
I will send the patch shortly.
Thanks,
Kan
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-07-16 12:52 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-10 4:59 [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail kernel test robot
2024-07-10 13:15 ` Liang, Kan
2024-07-11 8:04 ` Oliver Sang
2024-07-11 13:07 ` Liang, Kan
2024-07-15 20:05 ` Liang, Kan
2024-07-15 20:11 ` Ian Rogers
2024-07-15 21:41 ` Liang, Kan
2024-07-15 21:48 ` Ian Rogers
2024-07-16 12:52 ` Liang, Kan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).