* Perf test failures for 10.2 PMU event map aliases
@ 2024-08-20 2:06 Jon Kohler
2024-08-20 5:41 ` Ian Rogers
0 siblings, 1 reply; 7+ messages in thread
From: Jon Kohler @ 2024-08-20 2:06 UTC (permalink / raw)
To: irogers@google.com, adrian.hunter@intel.com,
linux-perf-users@vger.kernel.org, LKML, Kan Liang,
alexander.shishkin@linux.intel.com
Reaching out to the perf community for feedback on the following
observed test failure. On 6.6.y, I see persistent failures with test
10.2 PMU event map aliases, complaining about testing aliases uncore
PMU mismatches. I've included two outputs below, one with a bit of
hacky print debugging.
Using Intel(R) Xeon(R) Gold 6154 CPU:
10.2: PMU event map aliases :
--- start ---
test child forked, pid 962901
Using CPUID GenuineIntel-6-55-4
testing core PMU cpu aliases: pass
JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
(1) vs found (4)
test child finished with -1
---- end ----
PMU events subtest 2: FAILED!
Using Intel(R) Xeon(R) Platinum 8352Y:
10.2: PMU event map aliases :
--- start ---
test child forked, pid 1765070
Using CPUID GenuineIntel-6-6A-6
testing core PMU cpu aliases: pass
testing aliases uncore PMU uncore_imc_free_running_0: mismatch
expected aliases (1) vs found (6)
test child finished with -1
---- end ----
PMU events subtest 2: FAILED!
Digging in more, looking at pmu_aliases_parse, I see that we'll discard
scale and unit files in pmu_alias_info_file, which leaves us with 3
aliases for uncore_imc_0 in the first case and 5 aliases for
uncore_imc_free_running_0 in the second case.
# From 6154-based system:
ls -lhat /sys/devices/uncore_imc_0/events
total 0
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
-r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
-r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
-r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
# From the 8352Y-based system:
ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
total 0
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
-r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
-r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
-r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
-r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
-r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
-r--r--r--. 1 root root 4.0K Aug 19 21:33 read
-r--r--r--. 1 root root 4.0K Aug 19 21:33 write
drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
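The counts above can be reproduced with a small sketch. This is not the perf implementation: `has_suffix_sketch`, `is_info_file_sketch` and `count_event_files_sketch` are hypothetical names, and the sketch assumes that only the two suffixes mentioned above (`.scale` and `.unit`) are filtered out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if name ends with suffix and is longer than it. */
static bool has_suffix_sketch(const char *name, const char *suffix)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	return nlen > slen && strcmp(name + nlen - slen, suffix) == 0;
}

/* Assumption: only ".scale" and ".unit" count as info files in this sketch. */
static bool is_info_file_sketch(const char *name)
{
	return has_suffix_sketch(name, ".scale") ||
	       has_suffix_sketch(name, ".unit");
}

/* Count the directory entries that remain as events after filtering. */
static size_t count_event_files_sketch(const char *const *names, size_t n)
{
	size_t i, count = 0;

	assert(names != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!is_info_file_sketch(names[i]))
			count++;
	return count;
}
```

Feeding in the seven file names from the 6154 listing leaves 3 events, and the thirteen names from the 8352Y listing leave 5.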
Looking at the structure of __test_uncore_pmu_event_aliases, however,
I'm not quite sure how this is supposed to work. I've annotated a walk
through below to highlight where things are going off the rails.
static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
{
	...
	/* Count how many aliases we generated */
	alias_count = perf_pmu__num_events(pmu);
	// alias_count == 4 in the 6154-based system
	// alias_count == 6 in the 8352Y-based system

	/* Count how many aliases we expect from the known table */
	for (table = &test_pmu->aliases[0]; *table; table++)
		to_match_count++;
	// this is looking at aliases in struct perf_pmu_test_pmu
	// table, which for uncore_imc_0 is a single entry for
	// &uncore_imc_cache_hits.
	//
	// for the 8352Y case, likewise, we only have a single alias
	// in the table for &uncore_imc_free_running_cache_miss.
	//
	// in both cases, to_match_count == 1

	// Compare 4 vs 1 or 6 vs 1
	if (alias_count != to_match_count) {
		pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
			 pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
		return -1;
		// we seemed doomed to hit this conditional always, no?
	}
	...
}
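The arithmetic behind the mismatch in the walkthrough above can be sketched in isolation. The struct and function names are hypothetical stand-ins, not perf code. The 3 + 1 split for the 6154 system comes from the JKDBG lines; the 5 + 1 split for the 8352Y system is an inference from the sysfs listing and the reported total of 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters printed by the JKDBG lines. */
struct pmu_counts_sketch {
	size_t sysfs_events;	/* events read from the sysfs events directory */
	size_t json_events;	/* events added from the JSON tables */
};

/* Total number of events the PMU reports in this simplified model. */
static size_t found_events_sketch(const struct pmu_counts_sketch *c)
{
	assert(c != NULL);
	return c->sysfs_events + c->json_events;
}

/* Mirrors the alias_count != to_match_count comparison in the excerpt. */
static bool counts_match_sketch(const struct pmu_counts_sketch *c,
				size_t expected)
{
	return found_events_sketch(c) == expected;
}
```

With one expected table entry, `{3, 1}` and `{5, 1}` both fail the comparison, while a PMU with no sysfs events (`{0, 1}`) would pass it.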
I did a walkthrough of the latest mainline code and don't see a marked
difference that jumps off the page as something that'd correct this
behavior. I would love a helping hand to point me in the right direction.
What am I missing here?
Thanks all,
Jon
* Re: Perf test failures for 10.2 PMU event map aliases
2024-08-20 2:06 Perf test failures for 10.2 PMU event map aliases Jon Kohler
@ 2024-08-20 5:41 ` Ian Rogers
2024-08-20 13:54 ` Jon Kohler
0 siblings, 1 reply; 7+ messages in thread
From: Ian Rogers @ 2024-08-20 5:41 UTC (permalink / raw)
To: Jon Kohler
Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML,
Kan Liang, alexander.shishkin@linux.intel.com
On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote:
>
> Reaching out to the perf community for feedback on the following
> observed test failure. On 6.6.y, I see persistent failures with test
> 10.2 PMU event map aliases, complaining about testing aliases uncore
> PMU mismatches. I've included two outputs below, one with a bit of
> hacky print debugging.
>
> Using Intel(R) Xeon(R) Gold 6154 CPU:
> 10.2: PMU event map aliases :
> --- start ---
> test child forked, pid 962901
> Using CPUID GenuineIntel-6-55-4
Hi Jon,
Sorry for the brief reply but I thought some quick hints might unblock
you on this. The CPUID lines up with a SkylakeX:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33
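As a rough illustration of how a CPUID string can be matched against mapfile-style patterns, here is a minimal sketch using POSIX `regcomp`/`regexec`. The table, its two patterns and the function name are assumptions made for this example; they are not copied from mapfile.csv, and the real lookup in perf may differ.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a pattern and the model directory it selects. */
struct cpuid_map_sketch {
	const char *pattern;	/* POSIX extended regex, matched at offset 0 */
	const char *model;
};

/* Illustrative patterns only; assumed, not taken from mapfile.csv. */
static const struct cpuid_map_sketch cpuid_table_sketch[] = {
	{ "GenuineIntel-6-55-[01234]", "skylakex" },
	{ "GenuineIntel-6-6[AC]", "icelakex" },
};

/* Return the model for cpuid, or NULL if no pattern matches from the start. */
static const char *cpuid_to_model_sketch(const char *cpuid)
{
	size_t i, n = sizeof(cpuid_table_sketch) / sizeof(cpuid_table_sketch[0]);

	assert(cpuid != NULL);
	for (i = 0; i < n; i++) {
		regex_t re;
		regmatch_t m;
		int hit;

		if (regcomp(&re, cpuid_table_sketch[i].pattern, REG_EXTENDED) != 0)
			continue;
		hit = regexec(&re, cpuid, 1, &m, 0) == 0 && m.rm_so == 0;
		regfree(&re);
		if (hit)
			return cpuid_table_sketch[i].model;
	}
	return NULL;
}
```

Under these assumed patterns, the two CPUID strings from the report resolve to the SkylakeX and IcelakeX models respectively.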
> testing core PMU cpu aliases: pass
> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
> (1) vs found (4)
> test child finished with -1
> ---- end ----
> PMU events subtest 2: FAILED!
>
> Using Intel(R) Xeon(R) Platinum 8352Y:
> 10.2: PMU event map aliases :
> --- start ---
> test child forked, pid 1765070
> Using CPUID GenuineIntel-6-6A-6
This is an IcelakeX:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18
> testing core PMU cpu aliases: pass
> testing aliases uncore PMU uncore_imc_free_running_0: mismatch
> expected aliases (1) vs found (6)
> test child finished with -1
> ---- end ----
> PMU events subtest 2: FAILED!
>
> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
> scale and unit files in pmu_alias_info_file, which leaves us with 3
> aliases for uncore_imc_0 in the first case and 5 aliases for
> uncore_imc_free_running_0 in the second case.
>
> # From 6154-based system:
> ls -lhat /sys/devices/uncore_imc_0/events
The "uncore_" prefix and the "_0" suffix are optional, and the name
matching is case insensitive. In the event JSON, the events are listed
here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
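The matching rules just described (optional "uncore_" prefix, optional "_0"-style suffix, case-insensitive comparison) can be sketched as follows. This only encodes those three rules; the function names are hypothetical and the real matching in perf is likely more involved.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of exactly n bytes of a and b. */
static bool equal_nocase_sketch(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return false;
	return true;
}

/* Does a sysfs PMU name match a JSON "Unit" under the three rules above? */
static bool pmu_name_matches_sketch(const char *pmu_name, const char *unit)
{
	static const char prefix[] = "uncore_";
	const size_t plen = sizeof(prefix) - 1;
	size_t len, end;

	assert(pmu_name != NULL && unit != NULL);

	/* Drop an optional "uncore_" prefix. */
	if (strlen(pmu_name) >= plen && equal_nocase_sketch(pmu_name, prefix, plen))
		pmu_name += plen;

	/* Drop an optional trailing "_<digits>" suffix such as "_0". */
	len = strlen(pmu_name);
	end = len;
	while (end > 0 && isdigit((unsigned char)pmu_name[end - 1]))
		end--;
	if (end < len && end > 0 && pmu_name[end - 1] == '_')
		len = end - 1;

	return strlen(unit) == len && equal_nocase_sketch(pmu_name, unit, len);
}
```

For example, "uncore_imc_0" would match a unit spelled "iMC", and "uncore_imc_free_running_0" would match "imc_free_running" but not "iMC". The unit spellings here are illustrative.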
> total 0
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
This should be 3 sysfs events (I don't like the term alias), note that
we load the sysfs and json events lazily to avoid overhead.
> drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
> drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
>
> # From the 8352Y-based system:
> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
> total 0
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
This is 5 sysfs events, the json events are here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
Note that the "Unit", meaning the PMU, should be imc_free_running to
match this device.
> drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
> drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
>
> Looking at the structure of __test_uncore_pmu_event_aliases, however,
> I'm not quite sure how this is supposed to work. I've annotated a walk
> through below to highlight where things are going off the rails.
>
> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
> {
> ...
> /* Count how many aliases we generated */
> alias_count = perf_pmu__num_events(pmu);
> // alias_count == 4 in the 6154-based system
> // alias_count == 6 in the 8352Y-based system
>
> /* Count how many aliases we expect from the known table */
> for (table = &test_pmu->aliases[0]; *table; table++)
> to_match_count++;
> // this is looking at aliases in struct perf_pmu_test_pmu
> // table, which for uncore_imc_0 is a single entry for
> // &uncore_imc_cache_hits.
> //
> // for the 8352Y case, likewise, we only have a single alias
> // in the table for &uncore_imc_free_running_cache_miss.
> //
> // in both cases, to_match_count == 1
>
> // Compare 4 vs 1 or 6 vs 1
> if (alias_count != to_match_count) {
> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
> pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
> return -1;
> // we seemed doomed to hit this conditional always, no?
> }
> ...
> }
>
> I did a walkthrough of the latest mainline code and don't see a marked
> difference that jumps off the page as something that'd correct this
> behavior. I would love a helping hand to point me in the right direction.
>
> What am I missing here?
I'll need some more time to dig into this. Hopefully the pointers above help.
Thanks,
Ian
* Re: Perf test failures for 10.2 PMU event map aliases
2024-08-20 5:41 ` Ian Rogers
@ 2024-08-20 13:54 ` Jon Kohler
2024-08-20 15:17 ` Ian Rogers
0 siblings, 1 reply; 7+ messages in thread
From: Jon Kohler @ 2024-08-20 13:54 UTC (permalink / raw)
To: Ian Rogers
Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML,
Kan Liang, alexander.shishkin@linux.intel.com
> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote:
>>
>> Reaching out to the perf community for feedback on the following
>> observed test failure. On 6.6.y, I see persistent failures with test
>> 10.2 PMU event map aliases, complaining about testing aliases uncore
>> PMU mismatches. I've included two outputs below, one with a bit of
>> hacky print debugging.
>>
>> Using Intel(R) Xeon(R) Gold 6154 CPU:
>> 10.2: PMU event map aliases :
>> --- start ---
>> test child forked, pid 962901
>> Using CPUID GenuineIntel-6-55-4
>
> Hi Jon,
>
> Sorry for the brief reply but I thought some quick hints might unblock
> you on this. The CPUID lines up with a SkylakeX:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33
>
>> testing core PMU cpu aliases: pass
>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
>> (1) vs found (4)
>> test child finished with -1
>> ---- end ----
>> PMU events subtest 2: FAILED!
>>
>> Using Intel(R) Xeon(R) Platinum 8352Y:
>> 10.2: PMU event map aliases :
>> --- start ---
>> test child forked, pid 1765070
>> Using CPUID GenuineIntel-6-6A-6
>
> This is an IcelakeX:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18
>
>> testing core PMU cpu aliases: pass
>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch
>> expected aliases (1) vs found (6)
>> test child finished with -1
>> ---- end ----
>> PMU events subtest 2: FAILED!
>>
>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
>> scale and unit files in pmu_alias_info_file, which leaves us with 3
>> aliases for uncore_imc_0 in the first case and 5 aliases for
>> uncore_imc_free_running_0 in the second case.
>>
>> # From 6154-based system:
>> ls -lhat /sys/devices/uncore_imc_0/events
>
> The "uncore_" prefix and the "_0" suffix are optional, and the name
> matching is case insensitive. In the event JSON, the events are listed
> here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
>
>> total 0
>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
>
> This should be 3 sysfs events (I don't like the term alias), note that
> we load the sysfs and json events lazily to avoid overhead.
>
>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
>>
>> # From the 8352Y-based system:
>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
>> total 0
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
>
> This is 5 sysfs events, the json events are here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
> Note that the "Unit", meaning the PMU, should be imc_free_running to
> match this device.
>
>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
>>
>> Looking at the structure of __test_uncore_pmu_event_aliases, however,
>> I'm not quite sure how this is supposed to work. I've annotated a walk
>> through below to highlight where things are going off the rails.
>>
>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>> {
>> ...
>> /* Count how many aliases we generated */
>> alias_count = perf_pmu__num_events(pmu);
>> // alias_count == 4 in the 6154-based system
>> // alias_count == 6 in the 8352Y-based system
>>
>> /* Count how many aliases we expect from the known table */
>> for (table = &test_pmu->aliases[0]; *table; table++)
>> to_match_count++;
>> // this is looking at aliases in struct perf_pmu_test_pmu
>> // table, which for uncore_imc_0 is a single entry for
>> // &uncore_imc_cache_hits.
>> //
>> // for the 8352Y case, likewise, we only have a single alias
>> // in the table for &uncore_imc_free_running_cache_miss.
>> //
>> // in both cases, to_match_count == 1
>>
>> // Compare 4 vs 1 or 6 vs 1
>> if (alias_count != to_match_count) {
>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
>> return -1;
>> // we seemed doomed to hit this conditional always, no?
>> }
>> ...
>> }
>>
>> I did a walkthrough of the latest mainline code and don't see a marked
>> difference that jumps off the page as something that'd correct this
>> behavior. I would love a helping hand to point me in the right direction.
>>
>> What am I missing here?
>
> I'll need some more time to dig into this. Hopefully the pointers above help.
Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
remains, as the code I posted above seems to depend solely on the
info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
dots connect from this test to the other events in sysfs and the JSON files.
>
> Thanks,
> Ian
* Re: Perf test failures for 10.2 PMU event map aliases
2024-08-20 13:54 ` Jon Kohler
@ 2024-08-20 15:17 ` Ian Rogers
2024-08-22 15:01 ` Jon Kohler
0 siblings, 1 reply; 7+ messages in thread
From: Ian Rogers @ 2024-08-20 15:17 UTC (permalink / raw)
To: Jon Kohler
Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML,
Kan Liang, alexander.shishkin@linux.intel.com
On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote:
>
>
>
> > On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote:
> >
> > On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote:
> >>
> >> Reaching out to the perf community for feedback on the following
> >> observed test failure. On 6.6.y, I see persistent failures with test
> >> 10.2 PMU event map aliases, complaining about testing aliases uncore
> >> PMU mismatches. I've included two outputs below, one with a bit of
> >> hacky print debugging.
> >>
> >> Using Intel(R) Xeon(R) Gold 6154 CPU:
> >> 10.2: PMU event map aliases :
> >> --- start ---
> >> test child forked, pid 962901
> >> Using CPUID GenuineIntel-6-55-4
> >
> > Hi Jon,
> >
> > Sorry for the brief reply but I thought some quick hints might unblock
> > you on this. The CPUID lines up with a SkylakeX:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33
> >
> >> testing core PMU cpu aliases: pass
> >> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
> >> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
> >> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
> >> (1) vs found (4)
> >> test child finished with -1
> >> ---- end ----
> >> PMU events subtest 2: FAILED!
> >>
> >> Using Intel(R) Xeon(R) Platinum 8352Y:
> >> 10.2: PMU event map aliases :
> >> --- start ---
> >> test child forked, pid 1765070
> >> Using CPUID GenuineIntel-6-6A-6
> >
> > This is an IcelakeX:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18
> >
> >> testing core PMU cpu aliases: pass
> >> testing aliases uncore PMU uncore_imc_free_running_0: mismatch
> >> expected aliases (1) vs found (6)
> >> test child finished with -1
> >> ---- end ----
> >> PMU events subtest 2: FAILED!
> >>
> >> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
> >> scale and unit files in pmu_alias_info_file, which leaves us with 3
> >> aliases for uncore_imc_0 in the first case and 5 aliases for
> >> uncore_imc_free_running_0 in the second case.
> >>
> >> # From 6154-based system:
> >> ls -lhat /sys/devices/uncore_imc_0/events
> >
> > The "uncore_" prefix and the "_0" suffix are optional, and the name
> > matching is case insensitive. In the event JSON, the events are listed
> > here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
> >
> >> total 0
> >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
> >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
> >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
> >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
> >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
> >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
> >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
> >
> > This should be 3 sysfs events (I don't like the term alias), note that
> > we load the sysfs and json events lazily to avoid overhead.
> >
> >> drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
> >> drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
> >>
> >> # From the 8352Y-based system:
> >> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
> >> total 0
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
> >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
> >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
> >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
> >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
> >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
> >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
> >
> > This is 5 sysfs events, the json events are here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
> > Note that the "Unit", meaning the PMU, should be imc_free_running to
> > match this device.
> >
> >> drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
> >> drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
> >>
> >> Looking at the structure of __test_uncore_pmu_event_aliases, however,
> >> I'm not quite sure how this is supposed to work. I've annotated a walk
> >> through below to highlight where things are going off the rails.
> >>
> >> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
> >> {
> >> ...
> >> /* Count how many aliases we generated */
> >> alias_count = perf_pmu__num_events(pmu);
> >> // alias_count == 4 in the 6154-based system
> >> // alias_count == 6 in the 8352Y-based system
> >>
> >> /* Count how many aliases we expect from the known table */
> >> for (table = &test_pmu->aliases[0]; *table; table++)
> >> to_match_count++;
> >> // this is looking at aliases in struct perf_pmu_test_pmu
> >> // table, which for uncore_imc_0 is a single entry for
> >> // &uncore_imc_cache_hits.
> >> //
> >> // for the 8352Y case, likewise, we only have a single alias
> >> // in the table for &uncore_imc_free_running_cache_miss.
> >> //
> >> // in both cases, to_match_count == 1
> >>
> >> // Compare 4 vs 1 or 6 vs 1
> >> if (alias_count != to_match_count) {
> >> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
> >> pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
> >> return -1;
> >> // we seemed doomed to hit this conditional always, no?
> >> }
> >> ...
> >> }
> >>
> >> I did a walkthrough of the latest mainline code and don't see a marked
> >> difference that jumps off the page as something that'd correct this
> >> behavior. I would love a helping hand to point me in the right direction.
> >>
> >> What am I missing here?
> >
> > I'll need some more time to dig into this. Hopefully the pointers above help.
>
> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
> remains, as the code I posted above seems to depend solely on the
> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
> dots connect from this test to the other events in sysfs and the JSON files.
So looking at the test it is using the testcpu:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu-events.c?h=perf-tools-next#n602
the json for that is here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu?h=perf-tools-next
The names in the test are based on ones seen on real CPUs, so this may
be leading to the confusion.
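One way to check whether a PMU name used by the test also exists as a real device on the running machine is to look for its events directory in sysfs. The sketch below assumes the `/sys/bus/event_source/devices/<name>/events` layout seen in the listing earlier in this thread; the function and macro names are hypothetical, and this is not part of the perf test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumed sysfs location, taken from the directory listing in this thread. */
#define SYSFS_PMU_DIR_SKETCH "/sys/bus/event_source/devices"

/* Build "<sysfs>/<pmu_name>/events" into buf; false if it does not fit. */
static bool pmu_events_path_sketch(char *buf, size_t size, const char *pmu_name)
{
	int n;

	assert(buf != NULL && pmu_name != NULL);
	assert(strlen(pmu_name) > 0);
	n = snprintf(buf, size, "%s/%s/events", SYSFS_PMU_DIR_SKETCH, pmu_name);
	return n >= 0 && (size_t)n < size;
}

/* True if the running machine exposes an events directory for pmu_name. */
static bool real_pmu_has_events_sketch(const char *pmu_name)
{
	char path[256];

	if (!pmu_events_path_sketch(path, sizeof(path), pmu_name))
		return false;
	return access(path, F_OK) == 0;
}
```

Calling `real_pmu_has_events_sketch("uncore_imc_0")` on the 6154 system would presumably return true, given that the events directory for that PMU was listed earlier in this thread.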
Thanks,
Ian
* Re: Perf test failures for 10.2 PMU event map aliases
2024-08-20 15:17 ` Ian Rogers
@ 2024-08-22 15:01 ` Jon Kohler
2024-08-22 15:15 ` Ian Rogers
0 siblings, 1 reply; 7+ messages in thread
From: Jon Kohler @ 2024-08-22 15:01 UTC (permalink / raw)
To: Ian Rogers
Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML,
Kan Liang, alexander.shishkin@linux.intel.com
> On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote:
>>
>>
>>
>>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote:
>>>
>>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote:
>>>>
>>>> Reaching out to the perf community for feedback on the following
>>>> observed test failure. On 6.6.y, I see persistent failures with test
>>>> 10.2 PMU event map aliases, complaining about testing aliases uncore
>>>> PMU mismatches. I've included two outputs below, one with a bit of
>>>> hacky print debugging.
>>>>
>>>> Using Intel(R) Xeon(R) Gold 6154 CPU:
>>>> 10.2: PMU event map aliases :
>>>> --- start ---
>>>> test child forked, pid 962901
>>>> Using CPUID GenuineIntel-6-55-4
>>>
>>> Hi Jon,
>>>
>>> Sorry for the brief reply but I thought some quick hints might unblock
>>> you on this. The CPUID lines up with a SkylakeX:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33
>>>
>>>> testing core PMU cpu aliases: pass
>>>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
>>>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
>>>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
>>>> (1) vs found (4)
>>>> test child finished with -1
>>>> ---- end ----
>>>> PMU events subtest 2: FAILED!
>>>>
>>>> Using Intel(R) Xeon(R) Platinum 8352Y:
>>>> 10.2: PMU event map aliases :
>>>> --- start ---
>>>> test child forked, pid 1765070
>>>> Using CPUID GenuineIntel-6-6A-6
>>>
>>> This is an IcelakeX:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18
>>>
>>>> testing core PMU cpu aliases: pass
>>>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch
>>>> expected aliases (1) vs found (6)
>>>> test child finished with -1
>>>> ---- end ----
>>>> PMU events subtest 2: FAILED!
>>>>
>>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
>>>> scale and unit files in pmu_alias_info_file, which leaves us with 3
>>>> aliases for uncore_imc_0 in the first case and 5 aliases for
>>>> uncore_imc_free_running_0 in the second case.
>>>>
>>>> # From 6154-based system:
>>>> ls -lhat /sys/devices/uncore_imc_0/events
>>>
>>> The "uncore_" prefix and the "_0" suffix are optional, and the name
>>> matching is case insensitive. In the event JSON, the events are listed
>>> here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
>>>
>>>> total 0
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
>>>
>>> This should be 3 sysfs events (I don't like the term alias), note that
>>> we load the sysfs and json events lazily to avoid overhead.
>>>
>>>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
>>>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
>>>>
>>>> # From the 8352Y-based system:
>>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
>>>> total 0
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
>>>
>>> This is 5 sysfs events, the json events are here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
>>> Note, the "Unit", meaning the PMU should be imc_free_running to match
>>> this device.
>>>
>>>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
>>>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
>>>>
>>>> Looking at the structure of __test_uncore_pmu_event_aliases, however,
>>>> I'm not quite sure how this is supposed to work. I've annotated a walk
>>>> through below to highlight where things are going off the rails.
>>>>
>>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>>> {
>>>> ...
>>>> /* Count how many aliases we generated */
>>>> alias_count = perf_pmu__num_events(pmu);
>>>> // alias_count == 4 in the 6154-based system
>>>> // alias_count == 6 in the 8352Y-based system
>>>>
>>>> /* Count how many aliases we expect from the known table */
>>>> for (table = &test_pmu->aliases[0]; *table; table++)
>>>> to_match_count++;
>>>> // this is looking at aliases in struct perf_pmu_test_pmu
>>>> // table, which for uncore_imc_0 is a single entry for
>>>> // &uncore_imc_cache_hits.
>>>> //
>>>> // for the 8352Y case, likewise, we only have a single alias
>>>> // in the table for &uncore_imc_free_running_cache_miss.
>>>> //
>>>> // in both cases, to_match_count == 1
>>>>
>>>> // Compare 4 vs 1 or 6 vs 1
>>>> if (alias_count != to_match_count) {
>>>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
>>>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
>>>> return -1;
>>>> // we seemed doomed to hit this conditional always, no?
>>>> }
>>>> ...
>>>> }
>>>>
>>>> I did a walkthrough of the latest mainline code, and don't see a marked
>>>> difference that jump off the page to me that'd correct this behavior,
>>>> and would love a helping hand to point in the right direction on this.
>>>>
>>>> What am I missing here?
>>>
>>> I'll need some more time to dig into this. Hopefully the pointers above help.
>>
>> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
>> remains, as the code I posted to above seems to solely depend on the
>> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
>> dots connect from this test to the other events in sysfs/json’s.
>
> So looking at the test it is using the testcpu:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu-events.c?h=perf-tools-next#n602
> the json for that is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu?h=perf-tools-next
> The names in the test are based on ones seen on real CPUs, so this may
> be leading to the confusion.
Hey Ian,

I was able to debug this a bit more. The following diff fixes this test on my system.

Even though the test is supposed to use only the test data, the sysfs entries
from my systems, which happen to have similar PMU names, threw a wrench in
this test.

With this diff, we count only the JSON aliases that were added.

Happy to send this out as a formal patch, but I wanted to get the list’s two cents
first, as I feel like I’m missing something :)

Jon

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index f5321fbdee79..893dc7afee76 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 	const struct pmu_events_table *events_table;
 	int res = 0;

+	/* CPU events come from struct pmu_event pmu_events__test_soc_cpu
+	 * and sys events come from struct pmu_event pmu_events__test_soc_sys
+	 */
 	events_table = find_core_events_table("testarch", "testcpu");
 	if (!events_table)
 		return -1;
@@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 	pmu->sysfs_aliases_loaded = true;
 	pmu_add_sys_aliases(pmu);

-	/* Count how many aliases we generated */
-	alias_count = perf_pmu__num_events(pmu);
+	/* How many events we gathered for this PMU in test_soc.
+	 * Note: we specifically do not use perf_pmu__num_events as that may
+	 * include spurious system events from the system under test, which
+	 * may have similarly named PMUs.
+	 */
+	alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases;

-	/* Count how many aliases we expect from the known table */
+	/* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */
 	for (table = &test_pmu->aliases[0]; *table; table++)
 		to_match_count++;
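
For readers following the arithmetic, here is a minimal sketch of why the two counts differ on the 6154 system. It is a toy model, not perf code: struct toy_pmu and both helper functions are hypothetical, only the three counter names mirror the fields printed in the JKDBG output, and it assumes the total reported by perf_pmu__num_events is the plain sum of those counters, which is consistent with that output (3 sysfs + 1 CPU JSON = 4).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-source counters named in the debug output. */
struct toy_pmu {
	unsigned int sysfs_aliases;    /* events read from the running system's sysfs */
	unsigned int cpu_json_aliases; /* events added from the test CPU JSON table */
	unsigned int sys_json_aliases; /* events added from the test sys JSON table */
};

/* Models the old check: every event on the PMU, whatever its source. */
static unsigned int toy_count_all(const struct toy_pmu *pmu)
{
	assert(pmu != NULL);
	return pmu->sysfs_aliases + pmu->cpu_json_aliases + pmu->sys_json_aliases;
}

/* Models the proposed check: only events that came from the test JSON. */
static unsigned int toy_count_json_only(const struct toy_pmu *pmu)
{
	assert(pmu != NULL);
	return pmu->cpu_json_aliases + pmu->sys_json_aliases;
}
```

With the 6154 numbers (sysfs_aliases = 3, cpu_json_aliases = 1, sys_json_aliases = 0), toy_count_all() gives 4 and toy_count_json_only() gives 1, so only the second agrees with the single expected entry in the test table.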
>
> Thanks,
> Ian
* Re: Perf test failures for 10.2 PMU event map aliases
2024-08-22 15:01 ` Jon Kohler
@ 2024-08-22 15:15 ` Ian Rogers
2024-08-22 15:37 ` Jon Kohler
0 siblings, 1 reply; 7+ messages in thread
From: Ian Rogers @ 2024-08-22 15:15 UTC (permalink / raw)
To: Jon Kohler
Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML,
Kan Liang, alexander.shishkin@linux.intel.com
On Thu, Aug 22, 2024 at 8:01 AM Jon Kohler <jon@nutanix.com> wrote:
>
>
>
> > On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@google.com> wrote:
> >
> > On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote:
> >>
> >>
> >>
> >>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote:
> >>>
> >>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote:
> >>>>
> >>>> Reaching out to the perf community for feedback on the following
> >>>> observed test failure. On 6.6.y, I see persistent failures with test
> >>>> 10.2 PMU event map aliases, complaining about testing aliases uncore
> >>>> PMU mismatches. I've included two outputs below, one with a bit of
> >>>> hacky print debugging.
> >>>>
> >>>> Using Intel(R) Xeon(R) Gold 6154 CPU:
> >>>> 10.2: PMU event map aliases :
> >>>> --- start ---
> >>>> test child forked, pid 962901
> >>>> Using CPUID GenuineIntel-6-55-4
> >>>
> >>> Hi Jon,
> >>>
> >>> Sorry for the brief reply but I thought some quick hints might unblock
> >>> you on this. The CPUID lines up with a SkylakeX:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33
> >>>
> >>>> testing core PMU cpu aliases: pass
> >>>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
> >>>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
> >>>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
> >>>> (1) vs found (4)
> >>>> test child finished with -1
> >>>> ---- end ----
> >>>> PMU events subtest 2: FAILED!
> >>>>
> >>>> Using Intel(R) Xeon(R) Platinum 8352Y:
> >>>> 10.2: PMU event map aliases :
> >>>> --- start ---
> >>>> test child forked, pid 1765070
> >>>> Using CPUID GenuineIntel-6-6A-6
> >>>
> >>> This is an IcelakeX:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18
> >>>
> >>>> testing core PMU cpu aliases: pass
> >>>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch
> >>>> expected aliases (1) vs found (6)
> >>>> test child finished with -1
> >>>> ---- end ----
> >>>> PMU events subtest 2: FAILED!
> >>>>
> >>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
> >>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x
> >>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the
> >>>> uncore_imc_free_running_0 second case.
> >>>>
> >>>> # From 6154-based system:
> >>>> ls -lhat /sys/devices/uncore_imc_0/events
> >>>
> >>> The "uncore_" prefix and the "_0" suffix are optional, the naming
> >>> matching is case insensitive. In the event json the events are listed
> >>> here:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
> >>>
> >>>> total 0
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
> >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
> >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
> >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
> >>>
> >>> This should be 3 sysfs events (I don't like the term alias), note that
> >>> we load the sysfs and json events lazily to avoid overhead.
> >>>
> >>>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
> >>>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
> >>>>
> >>>> # From the 8352Y-based system:
> >>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
> >>>> total 0
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
> >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
> >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
> >>>
> >>> This is 5 sysfs events, the json events are here:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
> >>> Note, the "Unit", meaning the PMU should be imc_free_running to match
> >>> this device.
> >>>
> >>>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
> >>>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
> >>>>
> >>>> Looking at the structure of __test_uncore_pmu_event_aliases, however,
> >>>> I'm not quite sure how this is supposed to work. I've annotated a walk
> >>>> through below to highlight where things are going off the rails.
> >>>>
> >>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
> >>>> {
> >>>> ...
> >>>> /* Count how many aliases we generated */
> >>>> alias_count = perf_pmu__num_events(pmu);
> >>>> // alias_count == 4 in the 6154-based system
> >>>> // alias_count == 6 in the 8352Y-based system
> >>>>
> >>>> /* Count how many aliases we expect from the known table */
> >>>> for (table = &test_pmu->aliases[0]; *table; table++)
> >>>> to_match_count++;
> >>>> // this is looking at aliases in struct perf_pmu_test_pmu
> >>>> // table, which for uncore_imc_0 is a single entry for
> >>>> // &uncore_imc_cache_hits.
> >>>> //
> >>>> // for the 8352Y case, likewise, we only have a single alias
> >>>> // in the table for &uncore_imc_free_running_cache_miss.
> >>>> //
> >>>> // in both cases, to_match_count == 1
> >>>>
> >>>> // Compare 4 vs 1 or 6 vs 1
> >>>> if (alias_count != to_match_count) {
> >>>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
> >>>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
> >>>> return -1;
> >>>> // we seemed doomed to hit this conditional always, no?
> >>>> }
> >>>> ...
> >>>> }
> >>>>
> >>>> I did a walkthrough of the latest mainline code, and don't see a marked
> >>>> difference that jump off the page to me that'd correct this behavior,
> >>>> and would love a helping hand to point in the right direction on this.
> >>>>
> >>>> What am I missing here?
> >>>
> >>> I'll need some more time to dig into this. Hopefully the pointers above help.
> >>
> >> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
> >> remains, as the code I posted to above seems to solely depend on the
> >> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
> >> dots connect from this test to the other events in sysfs/json’s.
> >
> > So looking at the test it is using the testcpu:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu-events.c?h=perf-tools-next#n602
> > the json for that is here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu?h=perf-tools-next
> > The names in the test are based on ones seen on real CPUs, so this may
> > be leading to the confusion.
>
> Hey Ian,
> I was able to debug this a bit more. The following diff fixes this test on my system.
>
> Even though we were supposed to be using the test data only, the sysfs entries
> from my systems, which happened to have similar names, threw a wrench in
> this test.
>
> With this diff, we just use the JSON aliases that were added.
>
> Happy to send this out as a formal patch, but wanted to get the list’s 2cents
> first, as I feel like I’m missing something :)
>
> Jon
>
> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
> index f5321fbdee79..893dc7afee76 100644
> --- a/tools/perf/tests/pmu-events.c
> +++ b/tools/perf/tests/pmu-events.c
> @@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
> const struct pmu_events_table *events_table;
> int res = 0;
>
> + /* CPU events come from struct pmu_event pmu_events__test_soc_cpu
> + * and sys events come from struct pmu_event pmu_events__test_soc_sys
> + */
> events_table = find_core_events_table("testarch", "testcpu");
> if (!events_table)
> return -1;
> @@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
> pmu->sysfs_aliases_loaded = true;
> pmu_add_sys_aliases(pmu);
>
> - /* Count how many aliases we generated */
> - alias_count = perf_pmu__num_events(pmu);
> + /* How many events we gathered for this PMU in test_soc.
> + * Note: we specifically do not use perf_pmu__num_events as that may
> + * include spurious system events from the system under test, which
> + * may have similarly named PMUs.
Thanks Jon, should we just rename the PMUs in the test json files? For
example, rather than "CBO" here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu/uncore.json?h=perf-tools-next#n10
we can have "test_pmu1".
Thanks for investigating this!
Ian
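
To illustrate why a real machine's PMU can collide with a test name, and why something like "test_pmu1" would not, here is a rough sketch of the matching rule described earlier in the thread (optional "uncore_" prefix, optional "_0"-style suffix, case-insensitive compare). This is a simplified model under those stated assumptions, not perf's actual matching code: toy_pmu_name_match() is a hypothetical helper, it only normalizes the sysfs side, and the unit strings in the examples are illustrative inputs rather than values taken from the test JSON.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/*
 * Hypothetical helper: does a sysfs PMU name match a JSON "Unit" name once
 * an optional "uncore_" prefix and an optional "_<digits>" suffix are
 * stripped from the sysfs name and case is ignored?
 */
static bool toy_pmu_name_match(const char *sysfs_name, const char *json_unit)
{
	static const char prefix[] = "uncore_";
	size_t len, end;

	assert(sysfs_name != NULL && json_unit != NULL);

	/* Drop the optional "uncore_" prefix. */
	if (strncasecmp(sysfs_name, prefix, strlen(prefix)) == 0)
		sysfs_name += strlen(prefix);

	/* Drop an optional trailing "_<digits>" instance suffix. */
	len = strlen(sysfs_name);
	end = len;
	while (end > 0 && isdigit((unsigned char)sysfs_name[end - 1]))
		end--;
	if (end < len && end > 0 && sysfs_name[end - 1] == '_')
		len = end - 1;

	/* Compare what is left, ignoring case. */
	return strlen(json_unit) == len &&
	       strncasecmp(sysfs_name, json_unit, len) == 0;
}
```

Under this model "uncore_imc_0" matches a unit spelled "IMC" and "uncore_imc_free_running_0" matches "imc_free_running", while neither matches "test_pmu1", which is the effect the rename aims for.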
> + */
> + alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases;
>
> - /* Count how many aliases we expect from the known table */
> + /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */
> for (table = &test_pmu->aliases[0]; *table; table++)
> to_match_count++;
>
> >
> > Thanks,
> > Ian
>
* Re: Perf test failures for 10.2 PMU event map aliases
2024-08-22 15:15 ` Ian Rogers
@ 2024-08-22 15:37 ` Jon Kohler
0 siblings, 0 replies; 7+ messages in thread
From: Jon Kohler @ 2024-08-22 15:37 UTC (permalink / raw)
To: Ian Rogers
Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML,
Kan Liang, alexander.shishkin@linux.intel.com
> On Aug 22, 2024, at 11:15 AM, Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Aug 22, 2024 at 8:01 AM Jon Kohler <jon@nutanix.com> wrote:
>>
>>
>>
>>> On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@google.com> wrote:
>>>
>>> On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote:
>>>>>
>>>>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote:
>>>>>>
>>>>>> Reaching out to the perf community for feedback on the following
>>>>>> observed test failure. On 6.6.y, I see persistent failures with test
>>>>>> 10.2 PMU event map aliases, complaining about testing aliases uncore
>>>>>> PMU mismatches. I've included two outputs below, one with a bit of
>>>>>> hacky print debugging.
>>>>>>
>>>>>> Using Intel(R) Xeon(R) Gold 6154 CPU:
>>>>>> 10.2: PMU event map aliases :
>>>>>> --- start ---
>>>>>> test child forked, pid 962901
>>>>>> Using CPUID GenuineIntel-6-55-4
>>>>>
>>>>> Hi Jon,
>>>>>
>>>>> Sorry for the brief reply but I thought some quick hints might unblock
>>>>> you on this. The CPUID lines up with a SkylakeX:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33
>>>>>
>>>>>> testing core PMU cpu aliases: pass
>>>>>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
>>>>>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
>>>>>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
>>>>>> (1) vs found (4)
>>>>>> test child finished with -1
>>>>>> ---- end ----
>>>>>> PMU events subtest 2: FAILED!
>>>>>>
>>>>>> Using Intel(R) Xeon(R) Platinum 8352Y:
>>>>>> 10.2: PMU event map aliases :
>>>>>> --- start ---
>>>>>> test child forked, pid 1765070
>>>>>> Using CPUID GenuineIntel-6-6A-6
>>>>>
>>>>> This is an IcelakeX:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18
>>>>>
>>>>>> testing core PMU cpu aliases: pass
>>>>>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch
>>>>>> expected aliases (1) vs found (6)
>>>>>> test child finished with -1
>>>>>> ---- end ----
>>>>>> PMU events subtest 2: FAILED!
>>>>>>
>>>>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
>>>>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x
>>>>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the
>>>>>> uncore_imc_free_running_0 second case.
>>>>>>
>>>>>> # From 6154-based system:
>>>>>> ls -lhat /sys/devices/uncore_imc_0/events
>>>>>
>>>>> The "uncore_" prefix and the "_0" suffix are optional, the naming
>>>>> matching is case insensitive. In the event json the events are listed
>>>>> here:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
>>>>>
>>>>>> total 0
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
>>>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
>>>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
>>>>>
>>>>> This should be 3 sysfs events (I don't like the term alias), note that
>>>>> we load the sysfs and json events lazily to avoid overhead.
>>>>>
>>>>>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
>>>>>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
>>>>>>
>>>>>> # From the 8352Y-based system:
>>>>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
>>>>>> total 0
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
>>>>>
>>>>> This is 5 sysfs events, the json events are here:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
>>>>> Note, the "Unit", meaning the PMU should be imc_free_running to match
>>>>> this device.
>>>>>
>>>>>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
>>>>>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
>>>>>>
>>>>>> Looking at the structure of __test_uncore_pmu_event_aliases, however,
>>>>>> I'm not quite sure how this is supposed to work. I've annotated a walk
>>>>>> through below to highlight where things are going off the rails.
>>>>>>
>>>>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>>>>> {
>>>>>> ...
>>>>>> /* Count how many aliases we generated */
>>>>>> alias_count = perf_pmu__num_events(pmu);
>>>>>> // alias_count == 4 in the 6154-based system
>>>>>> // alias_count == 6 in the 8352Y-based system
>>>>>>
>>>>>> /* Count how many aliases we expect from the known table */
>>>>>> for (table = &test_pmu->aliases[0]; *table; table++)
>>>>>> to_match_count++;
>>>>>> // this is looking at aliases in struct perf_pmu_test_pmu
>>>>>> // table, which for uncore_imc_0 is a single entry for
>>>>>> // &uncore_imc_cache_hits.
>>>>>> //
>>>>>> // for the 8352Y case, likewise, we only have a single alias
>>>>>> // in the table for &uncore_imc_free_running_cache_miss.
>>>>>> //
>>>>>> // in both cases, to_match_count == 1
>>>>>>
>>>>>> // Compare 4 vs 1 or 6 vs 1
>>>>>> if (alias_count != to_match_count) {
>>>>>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
>>>>>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
>>>>>> return -1;
>>>>>> // we seemed doomed to hit this conditional always, no?
>>>>>> }
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> I did a walkthrough of the latest mainline code, and don't see a marked
>>>>>> difference that jump off the page to me that'd correct this behavior,
>>>>>> and would love a helping hand to point in the right direction on this.
>>>>>>
>>>>>> What am I missing here?
>>>>>
>>>>> I'll need some more time to dig into this. Hopefully the pointers above help.
>>>>
>>>> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
>>>> remains, as the code I posted to above seems to solely depend on the
>>>> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
>>>> dots connect from this test to the other events in sysfs/json’s.
>>>
>>> So looking at the test it is using the testcpu:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu-events.c?h=perf-tools-next#n602
>>> the json for that is here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu?h=perf-tools-next
>>> The names in the test are based on ones seen on real CPUs, so this may
>>> be leading to the confusion.
>>
>> Hey Ian,
>> I was able to debug this a bit more. The following diff fixes this test on my system.
>>
>> Even though we were supposed to be using the test data only, the sysfs entries
>> from my systems, which happened to have similar names, threw a wrench in
>> this test.
>>
>> With this diff, we just use the JSON aliases that were added.
>>
>> Happy to send this out as a formal patch, but wanted to get the list’s 2cents
>> first, as I feel like I’m missing something :)
>>
>> Jon
>>
>> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
>> index f5321fbdee79..893dc7afee76 100644
>> --- a/tools/perf/tests/pmu-events.c
>> +++ b/tools/perf/tests/pmu-events.c
>> @@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>> const struct pmu_events_table *events_table;
>> int res = 0;
>>
>> + /* CPU events come from struct pmu_event pmu_events__test_soc_cpu
>> + * and sys events come from struct pmu_event pmu_events__test_soc_sys
>> + */
>> events_table = find_core_events_table("testarch", "testcpu");
>> if (!events_table)
>> return -1;
>> @@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>> pmu->sysfs_aliases_loaded = true;
>> pmu_add_sys_aliases(pmu);
>>
>> - /* Count how many aliases we generated */
>> - alias_count = perf_pmu__num_events(pmu);
>> + /* How many events we gathered for this PMU in test_soc.
>> + * Note: we specifically do not use perf_pmu__num_events as that may
>> + * include spurious system events from the system under test, which
>> + * may have similarly named PMUs.
>
> Thanks Jon, should we just rename the PMUs in the test json files? For
> example, rather than "CBO" here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu/uncore.json?h=perf-tools-next#n10
> we can have "test_pmu1".

I think that could be a separate cleanup, sure, though I don’t
think it is super pressing. But having different names certainly
made this quite confusing to debug and grep through the
source. I had to resort to the tried-and-true “just printk everything”
to figure out how this all tied together :)

I’ll send out my diff as a separate thread to the list in the
meantime.

Jon

>
> Thanks for investigating this!
> Ian
>
>> + */
>> + alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases;
>>
>> - /* Count how many aliases we expect from the known table */
>> + /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */
>> for (table = &test_pmu->aliases[0]; *table; table++)
>> to_match_count++;
>>
>>>
>>> Thanks,
>>> Ian
>>
end of thread, other threads:[~2024-08-22 15:37 UTC | newest]
Thread overview: 7+ messages
2024-08-20 2:06 Perf test failures for 10.2 PMU event map aliases Jon Kohler
2024-08-20 5:41 ` Ian Rogers
2024-08-20 13:54 ` Jon Kohler
2024-08-20 15:17 ` Ian Rogers
2024-08-22 15:01 ` Jon Kohler
2024-08-22 15:15 ` Ian Rogers
2024-08-22 15:37 ` Jon Kohler