From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Ian Rogers <irogers@google.com>
Cc: kernel test robot <oliver.sang@intel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
Linux Memory Management List <linux-mm@kvack.org>,
Namhyung Kim <namhyung@kernel.org>,
Weilin Wang <weilin.wang@intel.com>,
Caleb Biggers <caleb.biggers@intel.com>,
Alexandre Torgue <alexandre.torgue@foss.st.com>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
Date: Mon, 15 Jul 2024 17:41:01 -0400
Message-ID: <48b4bfb0-d0c7-4d1f-9e52-06e873646366@linux.intel.com>
In-Reply-To: <CAP-5=fUqGcnGvB71jHHTecLqcky6+TrFo+hWb=eBxZjxfe_m-g@mail.gmail.com>
On 2024-07-15 4:11 p.m., Ian Rogers wrote:
> On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>> Hi Ian,
>>
>> On 2024-07-10 12:59 a.m., kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
>>>
>>> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
>>>
>>> in testcase: perf-sanity-tests
>>> version:
>>> with following parameters:
>>>
>>> perf_compiler: gcc
>>>
>>>
>>>
>>> compiler: gcc-13
>>> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
>>>
>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>
>>>
>>> We also observed two cases that failed on the parent commit but pass on this commit.
>>> FYI.
>>>
>>>
>>> caccae3ce7b988b6 e2641db83f18782f57a0e107c50
>>> ---------------- ---------------------------
>>>        fail:runs  %reproduction    fail:runs
>>>            |             |             |
>>>            :6          100%           6:6     perf-sanity-tests.perf_all_PMU_test.fail
>>>            :6          100%           6:6     perf-sanity-tests.perf_all_metricgroups_test.pass
>>>            :6          100%           6:6     perf-sanity-tests.perf_all_metrics_test.pass
>>>
>>>
>>>
>>>
>>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>> the same patch/commit), kindly add the following tags
>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
>>>
>>>
>>>
>>> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
>>> 105: perf all metricgroups test : Ok
>>> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
>>> 106: perf all metrics test : Ok
>>> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
>>> 107: perf all libpfm4 events test : Ok
>>> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
>>> 108: perf all PMU test : FAILED!
>>>
>>
>> The failure is caused by the following change in e2641db83f18.
>>
>> + {
>> + "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
>> + "Counter": "FIXED",
>> + "EventCode": "0xff",
>> + "EventName": "UNC_CLOCK.SOCKET",
>> + "PerPkg": "1",
>> + "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
>> + "Unit": "cbox_0"
>> }
>>
>> The other cbox events have the unit name "CBOX", while the fixed counter
>> has the unit name "cbox_0". So the events_table maintains separate
>> entries for cbox and cbox_0.
>>
>> perf_pmus__print_pmu_events() calculates the total number of events,
>> allocates an aliases buffer, stores all the events into the buffer, sorts
>> them, and prints the aliases one by one.
>>
>> The problem is that the calculated total number of events doesn't match
>> the number of events actually stored on the SKL machine.
>>
>> perf_pmu__num_events() is used to calculate the number of events. It
>> invokes pmu_events_table__num_events() to walk the entire events_table
>> and find all matching events. Because pmu_uncore_alias_match() ignores
>> the suffix of the uncore PMU name, the events for both cbox and cbox_0
>> are counted.
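
To illustrate the counting side, here is a minimal, simplified sketch. It is not the real pmu_uncore_alias_match(); it only models the "ignore a trailing _<digits> suffix" behavior described above. struct table_entry, unit_matches(), count_matching() and fill_first_match() are hypothetical names, and the event counts in any usage are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of @name with a trailing "_<digits>" suffix ignored, if present. */
static size_t unit_base_len(const char *name)
{
	size_t len = strlen(name);
	size_t i = len;

	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;
	/* Only strip when at least one digit is preceded by '_'. */
	if (i < len && i > 0 && name[i - 1] == '_')
		return i - 1;
	return len;
}

/* Hypothetical: case-insensitive unit comparison ignoring a numeric suffix. */
static bool unit_matches(const char *a, const char *b)
{
	size_t la, lb;

	assert(a && b);
	la = unit_base_len(a);
	lb = unit_base_len(b);
	if (la != lb)
		return false;
	for (size_t i = 0; i < la; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return false;
	return true;
}

/* Hypothetical table entry: a unit name plus how many events it holds. */
struct table_entry {
	const char *unit;
	size_t nr_events;
};

/* Sizing pass: sum events over EVERY entry whose unit matches. */
static size_t count_matching(const struct table_entry *table, size_t n,
			     const char *pmu_unit)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (unit_matches(table[i].unit, pmu_unit))
			total += table[i].nr_events;
	return total;
}

/* Filling pass: only the FIRST matching entry contributes events. */
static size_t fill_first_match(const struct table_entry *table, size_t n,
			       const char *pmu_unit)
{
	for (size_t i = 0; i < n; i++)
		if (unit_matches(table[i].unit, pmu_unit))
			return table[i].nr_events;
	return 0;
}
```

With one "CBOX" entry and one "cbox_0" entry, count_matching() includes both while fill_first_match() only sees the first, which is the kind of mismatch described here.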
>>
>> When storing events into the aliases buffer,
>> perf_pmu__for_each_event() only processes the events for cbox.
>>
>> Since the buffer is larger than the number of stored events, the
>> trailing entries are all 0. When the aliases are printed, "(null)" is output.
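
A minimal sketch of why an oversized, zeroed buffer yields a NULL entry, and why clamping len to the number of entries actually stored (what "len = state.index;" does in the patch) avoids it. struct sketch_event, fill_events() and null_entries_printed() are hypothetical simplifications, not perf code; sorting is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a per-event record. */
struct sketch_event {
	const char *name;
};

/*
 * Store the first @nr_visible names into @buf (capacity @cap), mimicking
 * an iterator that yields fewer events than were counted.
 * Returns the number of entries actually stored.
 */
static size_t fill_events(struct sketch_event *buf, size_t cap,
			  const char *const *names, size_t nr_visible)
{
	size_t idx = 0;

	for (size_t i = 0; i < nr_visible && idx < cap; i++)
		buf[idx++].name = names[i];
	assert(idx <= cap);
	return idx;
}

/*
 * Allocate @counted zeroed entries, fill only @nr_visible of them, then
 * count the NULL names a print loop over @len entries would hit.
 * With @clamp_len, @len is the number actually stored (the proposed fix);
 * otherwise it is the pre-computed count (the buggy behavior).
 * Returns (size_t)-1 on allocation failure.
 */
static size_t null_entries_printed(const char *const *names, size_t counted,
				   size_t nr_visible, bool clamp_len)
{
	/* calloc() zero-fills, so unfilled names read as NULL on common platforms. */
	struct sketch_event *aliases = calloc(counted, sizeof(*aliases));
	size_t len = counted, stored, nulls = 0;

	if (!aliases)
		return (size_t)-1;
	stored = fill_events(aliases, counted, names, nr_visible);
	if (clamp_len)
		len = stored;
	for (size_t i = 0; i < len; i++)
		if (aliases[i].name == NULL)
			nulls++;
	free(aliases);
	return nulls;
}
```

Counting 4 events but storing 3 leaves one NULL entry in the print loop unless len is clamped to the stored count.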
>>
>> $ perf list pmu
>>
>> List of pre-defined events (to be used in -e or -M):
>>
>> (null) [Kernel PMU event]
>> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
>> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
>>
>>
>> I'm thinking of two ways to address it.
>> One is to print only the events that were actually stored. The patch below fixes it.
>>
>> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
>> index 3fcabfd8fca1..2b2f5117ff84 100644
>> --- a/tools/perf/util/pmus.c
>> +++ b/tools/perf/util/pmus.c
>> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>>  		perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
>>  					 perf_pmus__print_pmu_events__callback);
>>  	}
>> +	len = state.index;
>>  	qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>>  	for (int j = 0; j < len; j++) {
>>  		/* Skip duplicates */
>>
>> The only drawback is that perf list will not show the new cbox_0 event.
>> (The event name still works, though; users can still run perf stat -e
>> unc_clock.socket.)
>>
>> Since the cbox_0 event is only available on old machines (SKL and
>> earlier), people should already be using the equivalent kernel event. It
>> doesn't sound like a big issue to me, so I prefer this simple fix.
>>
>> The other way would be to modify perf_pmu__for_each_event()
>> to go through all the possible PMUs.
>> It seems complicated and may impact other ARCHs (e.g., S390). I haven't
>> tried it yet.
>>
>> What do you think?
>> Do you see any other ways to address the issue?
>
> Ugh. It seems the sizing and then iterating approach is just prone to
> breaking. Perhaps we can switch to realloc-ed arrays to avoid the
> need for perf_pmu__num_events, which seems to be the source of the
> problems.
>
I think a realloc-ed array would have the same drawback as the first
approach, plus worse performance. The pmu_add_cpu_aliases() in
perf_pmu__for_each_event() only adds the events from the first matched
PMU, so unless we fix that, UNC_CLOCK.SOCKET of cbox_0 will never be
displayed.
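
For comparison, a realloc-based collector could look roughly like the minimal sketch below (struct event_vec, event_vec_push() and event_vec_free() are hypothetical, not existing perf helpers). It removes the separate sizing pass, so the array length always matches what was stored. But it only ever holds what the iterator hands it; with first-match iteration, the cbox_0 entry would still never be pushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of event names (no up-front count needed). */
struct event_vec {
	const char **names;
	size_t len;
	size_t cap;
};

/* Append @name, doubling capacity when full. Returns 0, or -1 on ENOMEM. */
static int event_vec_push(struct event_vec *v, const char *name)
{
	if (v->len == v->cap) {
		size_t new_cap = v->cap ? v->cap * 2 : 16;
		const char **tmp = realloc(v->names, new_cap * sizeof(*tmp));

		if (!tmp)
			return -1; /* v->names is untouched and still valid */
		v->names = tmp;
		v->cap = new_cap;
	}
	v->names[v->len++] = name;
	assert(v->len <= v->cap);
	return 0;
}

/* Release the array and reset @v to the empty state. */
static void event_vec_free(struct event_vec *v)
{
	free(v->names);
	v->names = NULL;
	v->len = 0;
	v->cap = 0;
}
```

Doubling keeps appends amortized O(1); the drawback is about which events get collected, not the container.
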
Thanks,
Kan