* Perf test failures for 10.2 PMU event map aliases @ 2024-08-20 2:06 Jon Kohler 2024-08-20 5:41 ` Ian Rogers 0 siblings, 1 reply; 7+ messages in thread From: Jon Kohler @ 2024-08-20 2:06 UTC (permalink / raw) To: irogers@google.com, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com Reaching out to the perf community for feedback on the following observed test failure. On 6.6.y, I see persistent failures with test 10.2 PMU event map aliases, complaining about testing aliases uncore PMU mismatches. I've included two outputs below, one with a bit of hacky print debugging. Using Intel(R) Xeon(R) Gold 6154 CPU: 10.2: PMU event map aliases : --- start --- test child forked, pid 962901 Using CPUID GenuineIntel-6-55-4 testing core PMU cpu aliases: pass JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 testing aliases uncore PMU uncore_imc_0: mismatch expected aliases (1) vs found (4) test child finished with -1 ---- end ---- PMU events subtest 2: FAILED! Using Intel(R) Xeon(R) Platinum 8352Y: 10.2: PMU event map aliases : --- start --- test child forked, pid 1765070 Using CPUID GenuineIntel-6-6A-6 testing core PMU cpu aliases: pass testing aliases uncore PMU uncore_imc_free_running_0: mismatch expected aliases (1) vs found (6) test child finished with -1 ---- end ---- PMU events subtest 2: FAILED! Digging in more, looking at pmu_aliases_parse, I see that we'll discard scale and unit files in pmu_alias_info_file, which leaves us with 3x aliases in the uncore_imc_0 in the first case and 5x aliases in the uncore_imc_free_running_0 second case. # From 6154-based system: ls -lhat /sys/devices/uncore_imc_0/events total 0 -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks drwxr-xr-x. 2 root root 0 Jul 17 03:40 . drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. # From the 8352Y-based system: ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events total 0 -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write -r--r--r--. 1 root root 4.0K Aug 19 21:33 read -r--r--r--. 1 root root 4.0K Aug 19 21:33 write drwxr-xr-x. 2 root root 0 Aug 15 03:15 . drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. Looking at the structure of __test_uncore_pmu_event_aliases, however, I'm not quite sure how this is supposed to work. I've annotated a walk through below to highlight where things are going off the rails. static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) { ... /* Count how many aliases we generated */ alias_count = perf_pmu__num_events(pmu); // alias_count == 4 in the 6154-based system // alias_count == 6 in the 8352Y-based system /* Count how many aliases we expect from the known table */ for (table = &test_pmu->aliases[0]; *table; table++) to_match_count++; // this is looking at aliases in struct perf_pmu_test_pmu // table, which for uncore_imc_0 is a single entry for // &uncore_imc_cache_hits. // // for the 8352Y case, likewise, we only have a single alias // in the table for &uncore_imc_free_running_cache_miss. // // in both cases, to_match_count == 1 // Compare 4 vs 1 or 6 vs 1 if (alias_count != to_match_count) { pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", pmu_name, to_match_count /* 1 */, alias_count /* 4 */); return -1; // we seemed doomed to hit this conditional always, no? } ... } I did a walkthrough of the latest mainline code, and don't see a marked difference that jump off the page to me that'd correct this behavior, and would love a helping hand to point in the right direction on this. What am I missing here? Thanks all, Jon ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Perf test failures for 10.2 PMU event map aliases 2024-08-20 2:06 Perf test failures for 10.2 PMU event map aliases Jon Kohler @ 2024-08-20 5:41 ` Ian Rogers 2024-08-20 13:54 ` Jon Kohler 0 siblings, 1 reply; 7+ messages in thread From: Ian Rogers @ 2024-08-20 5:41 UTC (permalink / raw) To: Jon Kohler Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote: > > Reaching out to the perf community for feedback on the following > observed test failure. On 6.6.y, I see persistent failures with test > 10.2 PMU event map aliases, complaining about testing aliases uncore > PMU mismatches. I've included two outputs below, one with a bit of > hacky print debugging. > > Using Intel(R) Xeon(R) Gold 6154 CPU: > 10.2: PMU event map aliases : > --- start --- > test child forked, pid 962901 > Using CPUID GenuineIntel-6-55-4 Hi Jon, Sorry for the brief reply but I thought some quick hints might unblock you on this. The CPUID lines up with a SkylakeX: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33 > testing core PMU cpu aliases: pass > JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 > JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 > testing aliases uncore PMU uncore_imc_0: mismatch expected aliases > (1) vs found (4) > test child finished with -1 > ---- end ---- > PMU events subtest 2: FAILED! > > Using Intel(R) Xeon(R) Platinum 8352Y: > 10.2: PMU event map aliases : > --- start --- > test child forked, pid 1765070 > Using CPUID GenuineIntel-6-6A-6 This is an IcelakeX: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18 > testing core PMU cpu aliases: pass > testing aliases uncore PMU uncore_imc_free_running_0: mismatch > expected aliases (1) vs found (6) > test child finished with -1 > ---- end ---- > PMU events subtest 2: FAILED! > > Digging in more, looking at pmu_aliases_parse, I see that we'll discard > scale and unit files in pmu_alias_info_file, which leaves us with 3x > aliases in the uncore_imc_0 in the first case and 5x aliases in the > uncore_imc_free_running_0 second case. > > # From 6154-based system: > ls -lhat /sys/devices/uncore_imc_0/events The "uncore_" prefix and the "_0" suffix are optional, the naming matching is case insensitive. In the event json the events are listed here: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next > total 0 > -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale > -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit > -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale > -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit > -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read > -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write > -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks This should be 3 sysfs events (I don't like the term alias), note that we load the sysfs and json events lazily to avoid overhead. > drwxr-xr-x. 2 root root 0 Jul 17 03:40 . > drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. > > # From the 8352Y-based system: > ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events > total 0 > -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale > -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit > -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale > -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit > -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale > -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit > -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale > -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit > -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk > -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read > -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write > -r--r--r--. 1 root root 4.0K Aug 19 21:33 read > -r--r--r--. 1 root root 4.0K Aug 19 21:33 write This is 5 sysfs events, the json events are here: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134 Note, the "Unit", meaning the PMU should be imc_free_running to match this device. > drwxr-xr-x. 2 root root 0 Aug 15 03:15 . > drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. > > Looking at the structure of __test_uncore_pmu_event_aliases, however, > I'm not quite sure how this is supposed to work. I've annotated a walk > through below to highlight where things are going off the rails. > > static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) > { > ... > /* Count how many aliases we generated */ > alias_count = perf_pmu__num_events(pmu); > // alias_count == 4 in the 6154-based system > // alias_count == 6 in the 8352Y-based system > > /* Count how many aliases we expect from the known table */ > for (table = &test_pmu->aliases[0]; *table; table++) > to_match_count++; > // this is looking at aliases in struct perf_pmu_test_pmu > // table, which for uncore_imc_0 is a single entry for > // &uncore_imc_cache_hits. > // > // for the 8352Y case, likewise, we only have a single alias > // in the table for &uncore_imc_free_running_cache_miss. > // > // in both cases, to_match_count == 1 > > // Compare 4 vs 1 or 6 vs 1 > if (alias_count != to_match_count) { > pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", > pmu_name, to_match_count /* 1 */, alias_count /* 4 */); > return -1; > // we seemed doomed to hit this conditional always, no? > } > ... > } > > I did a walkthrough of the latest mainline code, and don't see a marked > difference that jump off the page to me that'd correct this behavior, > and would love a helping hand to point in the right direction on this. > > What am I missing here? I'll need some more time to dig into this. Hopefully the pointers above help. Thanks, Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Perf test failures for 10.2 PMU event map aliases 2024-08-20 5:41 ` Ian Rogers @ 2024-08-20 13:54 ` Jon Kohler 2024-08-20 15:17 ` Ian Rogers 0 siblings, 1 reply; 7+ messages in thread From: Jon Kohler @ 2024-08-20 13:54 UTC (permalink / raw) To: Ian Rogers Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com > On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote: > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote: >> >> Reaching out to the perf community for feedback on the following >> observed test failure. On 6.6.y, I see persistent failures with test >> 10.2 PMU event map aliases, complaining about testing aliases uncore >> PMU mismatches. I've included two outputs below, one with a bit of >> hacky print debugging. >> >> Using Intel(R) Xeon(R) Gold 6154 CPU: >> 10.2: PMU event map aliases : >> --- start --- >> test child forked, pid 962901 >> Using CPUID GenuineIntel-6-55-4 > > Hi Jon, > > Sorry for the brief reply but I thought some quick hints might unblock > you on this. The CPUID lines up with a SkylakeX: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e= > >> testing core PMU cpu aliases: pass >> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 >> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 >> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases >> (1) vs found (4) >> test child finished with -1 >> ---- end ---- >> PMU events subtest 2: FAILED! >> >> Using Intel(R) Xeon(R) Platinum 8352Y: >> 10.2: PMU event map aliases : >> --- start --- >> test child forked, pid 1765070 >> Using CPUID GenuineIntel-6-6A-6 > > This is an IcelakeX: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e= > >> testing core PMU cpu aliases: pass >> testing aliases uncore PMU uncore_imc_free_running_0: mismatch >> expected aliases (1) vs found (6) >> test child finished with -1 >> ---- end ---- >> PMU events subtest 2: FAILED! >> >> Digging in more, looking at pmu_aliases_parse, I see that we'll discard >> scale and unit files in pmu_alias_info_file, which leaves us with 3x >> aliases in the uncore_imc_0 in the first case and 5x aliases in the >> uncore_imc_free_running_0 second case. >> >> # From 6154-based system: >> ls -lhat /sys/devices/uncore_imc_0/events > > The "uncore_" prefix and the "_0" suffix are optional, the naming > matching is case insensitive. In the event json the events are listed > here: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e= > >> total 0 >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks > > This should be 3 sysfs events (I don't like the term alias), note that > we load the sysfs and json events lazily to avoid overhead. > >> drwxr-xr-x. 2 root root 0 Jul 17 03:40 . >> drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. >> >> # From the 8352Y-based system: >> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events >> total 0 >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write > > This is 5 sysfs events, the json events are here: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e= > Note, the "Unit", meaning the PMU should be imc_free_running to match > this device. > >> drwxr-xr-x. 2 root root 0 Aug 15 03:15 . >> drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. >> >> Looking at the structure of __test_uncore_pmu_event_aliases, however, >> I'm not quite sure how this is supposed to work. I've annotated a walk >> through below to highlight where things are going off the rails. >> >> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) >> { >> ... >> /* Count how many aliases we generated */ >> alias_count = perf_pmu__num_events(pmu); >> // alias_count == 4 in the 6154-based system >> // alias_count == 6 in the 8352Y-based system >> >> /* Count how many aliases we expect from the known table */ >> for (table = &test_pmu->aliases[0]; *table; table++) >> to_match_count++; >> // this is looking at aliases in struct perf_pmu_test_pmu >> // table, which for uncore_imc_0 is a single entry for >> // &uncore_imc_cache_hits. >> // >> // for the 8352Y case, likewise, we only have a single alias >> // in the table for &uncore_imc_free_running_cache_miss. >> // >> // in both cases, to_match_count == 1 >> >> // Compare 4 vs 1 or 6 vs 1 >> if (alias_count != to_match_count) { >> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", >> pmu_name, to_match_count /* 1 */, alias_count /* 4 */); >> return -1; >> // we seemed doomed to hit this conditional always, no? >> } >> ... >> } >> >> I did a walkthrough of the latest mainline code, and don't see a marked >> difference that jump off the page to me that'd correct this behavior, >> and would love a helping hand to point in the right direction on this. >> >> What am I missing here? > > I'll need some more time to dig into this. Hopefully the pointers above help. Thanks for the quick reply and pointers, I appreciate it. The tricky bit still remains, as the code I posted to above seems to solely depend on the info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the dots connect from this test to the other events in sysfs/json’s. > > Thanks, > Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Perf test failures for 10.2 PMU event map aliases 2024-08-20 13:54 ` Jon Kohler @ 2024-08-20 15:17 ` Ian Rogers 2024-08-22 15:01 ` Jon Kohler 0 siblings, 1 reply; 7+ messages in thread From: Ian Rogers @ 2024-08-20 15:17 UTC (permalink / raw) To: Jon Kohler Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote: > > > > > On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote: > > > > !-------------------------------------------------------------------| > > CAUTION: External Email > > > > |-------------------------------------------------------------------! > > > > On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote: > >> > >> Reaching out to the perf community for feedback on the following > >> observed test failure. On 6.6.y, I see persistent failures with test > >> 10.2 PMU event map aliases, complaining about testing aliases uncore > >> PMU mismatches. I've included two outputs below, one with a bit of > >> hacky print debugging. > >> > >> Using Intel(R) Xeon(R) Gold 6154 CPU: > >> 10.2: PMU event map aliases : > >> --- start --- > >> test child forked, pid 962901 > >> Using CPUID GenuineIntel-6-55-4 > > > > Hi Jon, > > > > Sorry for the brief reply but I thought some quick hints might unblock > > you on this. The CPUID lines up with a SkylakeX: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e= > > > >> testing core PMU cpu aliases: pass > >> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 > >> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 > >> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases > >> (1) vs found (4) > >> test child finished with -1 > >> ---- end ---- > >> PMU events subtest 2: FAILED! > >> > >> Using Intel(R) Xeon(R) Platinum 8352Y: > >> 10.2: PMU event map aliases : > >> --- start --- > >> test child forked, pid 1765070 > >> Using CPUID GenuineIntel-6-6A-6 > > > > This is an IcelakeX: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e= > > > >> testing core PMU cpu aliases: pass > >> testing aliases uncore PMU uncore_imc_free_running_0: mismatch > >> expected aliases (1) vs found (6) > >> test child finished with -1 > >> ---- end ---- > >> PMU events subtest 2: FAILED! > >> > >> Digging in more, looking at pmu_aliases_parse, I see that we'll discard > >> scale and unit files in pmu_alias_info_file, which leaves us with 3x > >> aliases in the uncore_imc_0 in the first case and 5x aliases in the > >> uncore_imc_free_running_0 second case. > >> > >> # From 6154-based system: > >> ls -lhat /sys/devices/uncore_imc_0/events > > > > The "uncore_" prefix and the "_0" suffix are optional, the naming > > matching is case insensitive. In the event json the events are listed > > here: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e= > > > >> total 0 > >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale > >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit > >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale > >> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit > >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read > >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write > >> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks > > > > This should be 3 sysfs events (I don't like the term alias), note that > > we load the sysfs and json events lazily to avoid overhead. > > > >> drwxr-xr-x. 2 root root 0 Jul 17 03:40 . > >> drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. > >> > >> # From the 8352Y-based system: > >> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events > >> total 0 > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale > >> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit > >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk > >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read > >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write > >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read > >> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write > > > > This is 5 sysfs events, the json events are here: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e= > > Note, the "Unit", meaning the PMU should be imc_free_running to match > > this device. > > > >> drwxr-xr-x. 2 root root 0 Aug 15 03:15 . > >> drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. > >> > >> Looking at the structure of __test_uncore_pmu_event_aliases, however, > >> I'm not quite sure how this is supposed to work. I've annotated a walk > >> through below to highlight where things are going off the rails. > >> > >> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) > >> { > >> ... > >> /* Count how many aliases we generated */ > >> alias_count = perf_pmu__num_events(pmu); > >> // alias_count == 4 in the 6154-based system > >> // alias_count == 6 in the 8352Y-based system > >> > >> /* Count how many aliases we expect from the known table */ > >> for (table = &test_pmu->aliases[0]; *table; table++) > >> to_match_count++; > >> // this is looking at aliases in struct perf_pmu_test_pmu > >> // table, which for uncore_imc_0 is a single entry for > >> // &uncore_imc_cache_hits. > >> // > >> // for the 8352Y case, likewise, we only have a single alias > >> // in the table for &uncore_imc_free_running_cache_miss. > >> // > >> // in both cases, to_match_count == 1 > >> > >> // Compare 4 vs 1 or 6 vs 1 > >> if (alias_count != to_match_count) { > >> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", > >> pmu_name, to_match_count /* 1 */, alias_count /* 4 */); > >> return -1; > >> // we seemed doomed to hit this conditional always, no? > >> } > >> ... > >> } > >> > >> I did a walkthrough of the latest mainline code, and don't see a marked > >> difference that jump off the page to me that'd correct this behavior, > >> and would love a helping hand to point in the right direction on this. > >> > >> What am I missing here? > > > > I'll need some more time to dig into this. Hopefully the pointers above help. > > Thanks for the quick reply and pointers, I appreciate it. The tricky bit still > remains, as the code I posted to above seems to solely depend on the > info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the > dots connect from this test to the other events in sysfs/json’s. So looking at the test it is using the testcpu: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu-events.c?h=perf-tools-next#n602 the json for that is here: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu?h=perf-tools-next The names in the test are based on ones seen on real CPUs, so this may be leading to the confusion. Thanks, Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Perf test failures for 10.2 PMU event map aliases 2024-08-20 15:17 ` Ian Rogers @ 2024-08-22 15:01 ` Jon Kohler 2024-08-22 15:15 ` Ian Rogers 0 siblings, 1 reply; 7+ messages in thread From: Jon Kohler @ 2024-08-22 15:01 UTC (permalink / raw) To: Ian Rogers Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com > On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@google.com> wrote: > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote: >> >> >> >>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote: >>> >>> !-------------------------------------------------------------------| >>> CAUTION: External Email >>> >>> |-------------------------------------------------------------------! >>> >>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote: >>>> >>>> Reaching out to the perf community for feedback on the following >>>> observed test failure. On 6.6.y, I see persistent failures with test >>>> 10.2 PMU event map aliases, complaining about testing aliases uncore >>>> PMU mismatches. I've included two outputs below, one with a bit of >>>> hacky print debugging. >>>> >>>> Using Intel(R) Xeon(R) Gold 6154 CPU: >>>> 10.2: PMU event map aliases : >>>> --- start --- >>>> test child forked, pid 962901 >>>> Using CPUID GenuineIntel-6-55-4 >>> >>> Hi Jon, >>> >>> Sorry for the brief reply but I thought some quick hints might unblock >>> you on this. The CPUID lines up with a SkylakeX: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e= >>> >>>> testing core PMU cpu aliases: pass >>>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 >>>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 >>>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases >>>> (1) vs found (4) >>>> test child finished with -1 >>>> ---- end ---- >>>> PMU events subtest 2: FAILED! >>>> >>>> Using Intel(R) Xeon(R) Platinum 8352Y: >>>> 10.2: PMU event map aliases : >>>> --- start --- >>>> test child forked, pid 1765070 >>>> Using CPUID GenuineIntel-6-6A-6 >>> >>> This is an IcelakeX: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e= >>> >>>> testing core PMU cpu aliases: pass >>>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch >>>> expected aliases (1) vs found (6) >>>> test child finished with -1 >>>> ---- end ---- >>>> PMU events subtest 2: FAILED! >>>> >>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard >>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x >>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the >>>> uncore_imc_free_running_0 second case. >>>> >>>> # From 6154-based system: >>>> ls -lhat /sys/devices/uncore_imc_0/events >>> >>> The "uncore_" prefix and the "_0" suffix are optional, the naming >>> matching is case insensitive. In the event json the events are listed >>> here: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e= >>> >>>> total 0 >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks >>> >>> This should be 3 sysfs events (I don't like the term alias), note that >>> we load the sysfs and json events lazily to avoid overhead. >>> >>>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 . >>>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. >>>> >>>> # From the 8352Y-based system: >>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events >>>> total 0 >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write >>> >>> This is 5 sysfs events, the json events are here: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e= >>> Note, the "Unit", meaning the PMU should be imc_free_running to match >>> this device. >>> >>>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 . >>>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. >>>> >>>> Looking at the structure of __test_uncore_pmu_event_aliases, however, >>>> I'm not quite sure how this is supposed to work. I've annotated a walk >>>> through below to highlight where things are going off the rails. >>>> >>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) >>>> { >>>> ... >>>> /* Count how many aliases we generated */ >>>> alias_count = perf_pmu__num_events(pmu); >>>> // alias_count == 4 in the 6154-based system >>>> // alias_count == 6 in the 8352Y-based system >>>> >>>> /* Count how many aliases we expect from the known table */ >>>> for (table = &test_pmu->aliases[0]; *table; table++) >>>> to_match_count++; >>>> // this is looking at aliases in struct perf_pmu_test_pmu >>>> // table, which for uncore_imc_0 is a single entry for >>>> // &uncore_imc_cache_hits. >>>> // >>>> // for the 8352Y case, likewise, we only have a single alias >>>> // in the table for &uncore_imc_free_running_cache_miss. >>>> // >>>> // in both cases, to_match_count == 1 >>>> >>>> // Compare 4 vs 1 or 6 vs 1 >>>> if (alias_count != to_match_count) { >>>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", >>>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */); >>>> return -1; >>>> // we seemed doomed to hit this conditional always, no? >>>> } >>>> ... >>>> } >>>> >>>> I did a walkthrough of the latest mainline code, and don't see a marked >>>> difference that jump off the page to me that'd correct this behavior, >>>> and would love a helping hand to point in the right direction on this. >>>> >>>> What am I missing here? >>> >>> I'll need some more time to dig into this. Hopefully the pointers above help. >> >> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still >> remains, as the code I posted to above seems to solely depend on the >> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the >> dots connect from this test to the other events in sysfs/json’s. > > So looking at the test it is using the testcpu: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_tests_pmu-2Devents.c-3Fh-3Dperf-2Dtools-2Dnext-23n602&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=erGg8kUByjl_j5R0D0PxRZjTZhvazxwC9KW8rOT9Pp4&e= > the json for that is here: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=z535_TbF_oJLjEoRuhbbqzB9Xo5MwWWmOcP0pgMulWY&e= > The names in the test are based on ones seen on real CPUs, so this may > be leading to the confusion. Hey Ian, I was able to debug this a bit more. The following diff fixes this test on my system. Even though we were supposed to be using the test data only, the sysfs entries from my systems, which happened to have similar names, threw a wrench in this test. With this diff, we just use the JSON aliases that were added. Happy to send this out as a formal patch, but wanted to get the list’s 2cents first, as I feel like I’m missing something :) Jon diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c index f5321fbdee79..893dc7afee76 100644 --- a/tools/perf/tests/pmu-events.c +++ b/tools/perf/tests/pmu-events.c @@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) const struct pmu_events_table *events_table; int res = 0; + /* CPU events come from struct pmu_event pmu_events__test_soc_cpu + * and sys events come from struct pmu_event pmu_events__test_soc_sys + */ events_table = find_core_events_table("testarch", "testcpu"); if (!events_table) return -1; @@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) pmu->sysfs_aliases_loaded = true; pmu_add_sys_aliases(pmu); - /* Count how many aliases we generated */ - alias_count = perf_pmu__num_events(pmu); + /* How many events we gathered for this PMU in test_soc. + * Note: we specifically do not use perf_pmu__num_events as that may + * include spurious system events from the system under test, which + * may have similarly named PMUs. + */ + alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases; - /* Count how many aliases we expect from the known table */ + /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */ for (table = &test_pmu->aliases[0]; *table; table++) to_match_count++; > > Thanks, > Ian ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Perf test failures for 10.2 PMU event map aliases 2024-08-22 15:01 ` Jon Kohler @ 2024-08-22 15:15 ` Ian Rogers 2024-08-22 15:37 ` Jon Kohler 0 siblings, 1 reply; 7+ messages in thread From: Ian Rogers @ 2024-08-22 15:15 UTC (permalink / raw) To: Jon Kohler Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com On Thu, Aug 22, 2024 at 8:01 AM Jon Kohler <jon@nutanix.com> wrote: > > > > > On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@google.com> wrote: > > > > !-------------------------------------------------------------------| > > CAUTION: External Email > > > > |-------------------------------------------------------------------! > > > > On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote: > >> > >> > >> > >>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote: > >>> > >>> !-------------------------------------------------------------------| > >>> CAUTION: External Email > >>> > >>> |-------------------------------------------------------------------! > >>> > >>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote: > >>>> > >>>> Reaching out to the perf community for feedback on the following > >>>> observed test failure. On 6.6.y, I see persistent failures with test > >>>> 10.2 PMU event map aliases, complaining about testing aliases uncore > >>>> PMU mismatches. I've included two outputs below, one with a bit of > >>>> hacky print debugging. > >>>> > >>>> Using Intel(R) Xeon(R) Gold 6154 CPU: > >>>> 10.2: PMU event map aliases : > >>>> --- start --- > >>>> test child forked, pid 962901 > >>>> Using CPUID GenuineIntel-6-55-4 > >>> > >>> Hi Jon, > >>> > >>> Sorry for the brief reply but I thought some quick hints might unblock > >>> you on this. The CPUID lines up with a SkylakeX: > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e= > >>> > >>>> testing core PMU cpu aliases: pass > >>>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 > >>>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 > >>>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases > >>>> (1) vs found (4) > >>>> test child finished with -1 > >>>> ---- end ---- > >>>> PMU events subtest 2: FAILED! > >>>> > >>>> Using Intel(R) Xeon(R) Platinum 8352Y: > >>>> 10.2: PMU event map aliases : > >>>> --- start --- > >>>> test child forked, pid 1765070 > >>>> Using CPUID GenuineIntel-6-6A-6 > >>> > >>> This is an IcelakeX: > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e= > >>> > >>>> testing core PMU cpu aliases: pass > >>>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch > >>>> expected aliases (1) vs found (6) > >>>> test child finished with -1 > >>>> ---- end ---- > >>>> PMU events subtest 2: FAILED! > >>>> > >>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard > >>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x > >>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the > >>>> uncore_imc_free_running_0 second case. > >>>> > >>>> # From 6154-based system: > >>>> ls -lhat /sys/devices/uncore_imc_0/events > >>> > >>> The "uncore_" prefix and the "_0" suffix are optional, the naming > >>> matching is case insensitive. In the event json the events are listed > >>> here: > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e= > >>> > >>>> total 0 > >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale > >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit > >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale > >>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit > >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read > >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write > >>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks > >>> > >>> This should be 3 sysfs events (I don't like the term alias), note that > >>> we load the sysfs and json events lazily to avoid overhead. > >>> > >>>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 . > >>>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. > >>>> > >>>> # From the 8352Y-based system: > >>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events > >>>> total 0 > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale > >>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit > >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk > >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read > >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write > >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read > >>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write > >>> > >>> This is 5 sysfs events, the json events are here: > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e= > >>> Note, the "Unit", meaning the PMU should be imc_free_running to match > >>> this device. > >>> > >>>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 . > >>>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. > >>>> > >>>> Looking at the structure of __test_uncore_pmu_event_aliases, however, > >>>> I'm not quite sure how this is supposed to work. I've annotated a walk > >>>> through below to highlight where things are going off the rails. > >>>> > >>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) > >>>> { > >>>> ... > >>>> /* Count how many aliases we generated */ > >>>> alias_count = perf_pmu__num_events(pmu); > >>>> // alias_count == 4 in the 6154-based system > >>>> // alias_count == 6 in the 8352Y-based system > >>>> > >>>> /* Count how many aliases we expect from the known table */ > >>>> for (table = &test_pmu->aliases[0]; *table; table++) > >>>> to_match_count++; > >>>> // this is looking at aliases in struct perf_pmu_test_pmu > >>>> // table, which for uncore_imc_0 is a single entry for > >>>> // &uncore_imc_cache_hits. > >>>> // > >>>> // for the 8352Y case, likewise, we only have a single alias > >>>> // in the table for &uncore_imc_free_running_cache_miss. > >>>> // > >>>> // in both cases, to_match_count == 1 > >>>> > >>>> // Compare 4 vs 1 or 6 vs 1 > >>>> if (alias_count != to_match_count) { > >>>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", > >>>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */); > >>>> return -1; > >>>> // we seemed doomed to hit this conditional always, no? > >>>> } > >>>> ... > >>>> } > >>>> > >>>> I did a walkthrough of the latest mainline code, and don't see a marked > >>>> difference that jump off the page to me that'd correct this behavior, > >>>> and would love a helping hand to point in the right direction on this. > >>>> > >>>> What am I missing here? > >>> > >>> I'll need some more time to dig into this. Hopefully the pointers above help. > >> > >> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still > >> remains, as the code I posted to above seems to solely depend on the > >> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the > >> dots connect from this test to the other events in sysfs/json’s. > > > > So looking at the test it is using the testcpu: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_tests_pmu-2Devents.c-3Fh-3Dperf-2Dtools-2Dnext-23n602&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=erGg8kUByjl_j5R0D0PxRZjTZhvazxwC9KW8rOT9Pp4&e= > > the json for that is here: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=z535_TbF_oJLjEoRuhbbqzB9Xo5MwWWmOcP0pgMulWY&e= > > The names in the test are based on ones seen on real CPUs, so this may > > be leading to the confusion. > > Hey Ian, > I was able to debug this a bit more. The following diff fixes this test on my system. > > Even though we were supposed to be using the test data only, the sysfs entries > from my systems, which happened to have similar names, threw a wrench in > this test. > > With this diff, we just use the JSON aliases that were added. > > Happy to send this out as a formal patch, but wanted to get the list’s 2cents > first, as I feel like I’m missing something :) > > Jon > > diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c > index f5321fbdee79..893dc7afee76 100644 > --- a/tools/perf/tests/pmu-events.c > +++ b/tools/perf/tests/pmu-events.c > @@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) > const struct pmu_events_table *events_table; > int res = 0; > > + /* CPU events come from struct pmu_event pmu_events__test_soc_cpu > + * and sys events come from struct pmu_event pmu_events__test_soc_sys > + */ > events_table = find_core_events_table("testarch", "testcpu"); > if (!events_table) > return -1; > @@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) > pmu->sysfs_aliases_loaded = true; > pmu_add_sys_aliases(pmu); > > - /* Count how many aliases we generated */ > - alias_count = perf_pmu__num_events(pmu); > + /* How many events we gathered for this PMU in test_soc. > + * Note: we specifically do not use perf_pmu__num_events as that may > + * include spurious system events from the system under test, which > + * may have similarly named PMUs. Thanks Jon, should we just rename the PMUs in the test json files? For example, rather than "CBO" here: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/test/test_soc/cpu/uncore.json?h=perf-tools-next#n10 we can have "test_pmu1". Thanks for investigating this! Ian > + */ > + alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases; > > - /* Count how many aliases we expect from the known table */ > + /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */ > for (table = &test_pmu->aliases[0]; *table; table++) > to_match_count++; > > > > > Thanks, > > Ian > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Perf test failures for 10.2 PMU event map aliases 2024-08-22 15:15 ` Ian Rogers @ 2024-08-22 15:37 ` Jon Kohler 0 siblings, 0 replies; 7+ messages in thread From: Jon Kohler @ 2024-08-22 15:37 UTC (permalink / raw) To: Ian Rogers Cc: adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, LKML, Kan Liang, alexander.shishkin@linux.intel.com > On Aug 22, 2024, at 11:15 AM, Ian Rogers <irogers@google.com> wrote: > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On Thu, Aug 22, 2024 at 8:01 AM Jon Kohler <jon@nutanix.com> wrote: >> >> >> >>> On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@google.com> wrote: >>> >>> !-------------------------------------------------------------------| >>> CAUTION: External Email >>> >>> |-------------------------------------------------------------------! >>> >>> On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@nutanix.com> wrote: >>>> >>>> >>>> >>>>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@google.com> wrote: >>>>> >>>>> !-------------------------------------------------------------------| >>>>> CAUTION: External Email >>>>> >>>>> |-------------------------------------------------------------------! >>>>> >>>>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@nutanix.com> wrote: >>>>>> >>>>>> Reaching out to the perf community for feedback on the following >>>>>> observed test failure. On 6.6.y, I see persistent failures with test >>>>>> 10.2 PMU event map aliases, complaining about testing aliases uncore >>>>>> PMU mismatches. I've included two outputs below, one with a bit of >>>>>> hacky print debugging. >>>>>> >>>>>> Using Intel(R) Xeon(R) Gold 6154 CPU: >>>>>> 10.2: PMU event map aliases : >>>>>> --- start --- >>>>>> test child forked, pid 962901 >>>>>> Using CPUID GenuineIntel-6-55-4 >>>>> >>>>> Hi Jon, >>>>> >>>>> Sorry for the brief reply but I thought some quick hints might unblock >>>>> you on this. The CPUID lines up with a SkylakeX: >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e= >>>>> >>>>>> testing core PMU cpu aliases: pass >>>>>> JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0 >>>>>> JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1 >>>>>> testing aliases uncore PMU uncore_imc_0: mismatch expected aliases >>>>>> (1) vs found (4) >>>>>> test child finished with -1 >>>>>> ---- end ---- >>>>>> PMU events subtest 2: FAILED! >>>>>> >>>>>> Using Intel(R) Xeon(R) Platinum 8352Y: >>>>>> 10.2: PMU event map aliases : >>>>>> --- start --- >>>>>> test child forked, pid 1765070 >>>>>> Using CPUID GenuineIntel-6-6A-6 >>>>> >>>>> This is an IcelakeX: >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e= >>>>> >>>>>> testing core PMU cpu aliases: pass >>>>>> testing aliases uncore PMU uncore_imc_free_running_0: mismatch >>>>>> expected aliases (1) vs found (6) >>>>>> test child finished with -1 >>>>>> ---- end ---- >>>>>> PMU events subtest 2: FAILED! >>>>>> >>>>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard >>>>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x >>>>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the >>>>>> uncore_imc_free_running_0 second case. >>>>>> >>>>>> # From 6154-based system: >>>>>> ls -lhat /sys/devices/uncore_imc_0/events >>>>> >>>>> The "uncore_" prefix and the "_0" suffix are optional, the naming >>>>> matching is case insensitive. In the event json the events are listed >>>>> here: >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e= >>>>> >>>>>> total 0 >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit >>>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read >>>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write >>>>>> -r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks >>>>> >>>>> This should be 3 sysfs events (I don't like the term alias), note that >>>>> we load the sysfs and json events lazily to avoid overhead. >>>>> >>>>>> drwxr-xr-x. 2 root root 0 Jul 17 03:40 . >>>>>> drwxr-xr-x. 5 root root 0 Jul 17 02:52 .. >>>>>> >>>>>> # From the 8352Y-based system: >>>>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events >>>>>> total 0 >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale >>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read >>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write >>>>> >>>>> This is 5 sysfs events, the json events are here: >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e= >>>>> Note, the "Unit", meaning the PMU should be imc_free_running to match >>>>> this device. >>>>> >>>>>> drwxr-xr-x. 2 root root 0 Aug 15 03:15 . >>>>>> drwxr-xr-x. 5 root root 0 Aug 15 02:42 .. >>>>>> >>>>>> Looking at the structure of __test_uncore_pmu_event_aliases, however, >>>>>> I'm not quite sure how this is supposed to work. I've annotated a walk >>>>>> through below to highlight where things are going off the rails. >>>>>> >>>>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) >>>>>> { >>>>>> ... >>>>>> /* Count how many aliases we generated */ >>>>>> alias_count = perf_pmu__num_events(pmu); >>>>>> // alias_count == 4 in the 6154-based system >>>>>> // alias_count == 6 in the 8352Y-based system >>>>>> >>>>>> /* Count how many aliases we expect from the known table */ >>>>>> for (table = &test_pmu->aliases[0]; *table; table++) >>>>>> to_match_count++; >>>>>> // this is looking at aliases in struct perf_pmu_test_pmu >>>>>> // table, which for uncore_imc_0 is a single entry for >>>>>> // &uncore_imc_cache_hits. >>>>>> // >>>>>> // for the 8352Y case, likewise, we only have a single alias >>>>>> // in the table for &uncore_imc_free_running_cache_miss. >>>>>> // >>>>>> // in both cases, to_match_count == 1 >>>>>> >>>>>> // Compare 4 vs 1 or 6 vs 1 >>>>>> if (alias_count != to_match_count) { >>>>>> pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n", >>>>>> pmu_name, to_match_count /* 1 */, alias_count /* 4 */); >>>>>> return -1; >>>>>> // we seemed doomed to hit this conditional always, no? >>>>>> } >>>>>> ... >>>>>> } >>>>>> >>>>>> I did a walkthrough of the latest mainline code, and don't see a marked >>>>>> difference that jump off the page to me that'd correct this behavior, >>>>>> and would love a helping hand to point in the right direction on this. >>>>>> >>>>>> What am I missing here? >>>>> >>>>> I'll need some more time to dig into this. Hopefully the pointers above help. >>>> >>>> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still >>>> remains, as the code I posted to above seems to solely depend on the >>>> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the >>>> dots connect from this test to the other events in sysfs/json’s. >>> >>> So looking at the test it is using the testcpu: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_tests_pmu-2Devents.c-3Fh-3Dperf-2Dtools-2Dnext-23n602&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=erGg8kUByjl_j5R0D0PxRZjTZhvazxwC9KW8rOT9Pp4&e= >>> the json for that is here: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=z535_TbF_oJLjEoRuhbbqzB9Xo5MwWWmOcP0pgMulWY&e= >>> The names in the test are based on ones seen on real CPUs, so this may >>> be leading to the confusion. >> >> Hey Ian, >> I was able to debug this a bit more. The following diff fixes this test on my system. >> >> Even though we were supposed to be using the test data only, the sysfs entries >> from my systems, which happened to have similar names, threw a wrench in >> this test. >> >> With this diff, we just use the JSON aliases that were added. >> >> Happy to send this out as a formal patch, but wanted to get the list’s 2cents >> first, as I feel like I’m missing something :) >> >> Jon >> >> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c >> index f5321fbdee79..893dc7afee76 100644 >> --- a/tools/perf/tests/pmu-events.c >> +++ b/tools/perf/tests/pmu-events.c >> @@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) >> const struct pmu_events_table *events_table; >> int res = 0; >> >> + /* CPU events come from struct pmu_event pmu_events__test_soc_cpu >> + * and sys events come from struct pmu_event pmu_events__test_soc_sys >> + */ >> events_table = find_core_events_table("testarch", "testcpu"); >> if (!events_table) >> return -1; >> @@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu) >> pmu->sysfs_aliases_loaded = true; >> pmu_add_sys_aliases(pmu); >> >> - /* Count how many aliases we generated */ >> - alias_count = perf_pmu__num_events(pmu); >> + /* How many events we gathered for this PMU in test_soc. >> + * Note: we specifically do not use perf_pmu__num_events as that may >> + * include spurious system events from the system under test, which >> + * may have similarly named PMUs. > > Thanks Jon, should we just rename the PMUs in the test json files? For > example, rather than "CBO" here: > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu_uncore.json-3Fh-3Dperf-2Dtools-2Dnext-23n10&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=ColPGeQKTJvgSki3uEmVftry27ANS1v996w_qYNFC9oJe3CApdQ44in4Xn-DEJ-f&s=taxB1JgBs5_gfxc_Jo9emKkuiP70MY1hhm5KrSozfXQ&e= > we can have "test_pmu1". I think that could be a separate cleanup, sure, though I don’t think that is super pressing. But certainly having it be different names made this quite confusing to debug and grep thru the source. I had to resort to the tried-n-true “just printk everything” to figure out how this all tied together :) I’ll send out my diff as a separate thread to the list in the mean time. Jon > > Thanks for investigating this! > Ian > >> + */ >> + alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases; >> >> - /* Count how many aliases we expect from the known table */ >> + /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */ >> for (table = &test_pmu->aliases[0]; *table; table++) >> to_match_count++; >> >>> >>> Thanks, >>> Ian >> ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-08-22 15:37 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-20 2:06 Perf test failures for 10.2 PMU event map aliases Jon Kohler 2024-08-20 5:41 ` Ian Rogers 2024-08-20 13:54 ` Jon Kohler 2024-08-20 15:17 ` Ian Rogers 2024-08-22 15:01 ` Jon Kohler 2024-08-22 15:15 ` Ian Rogers 2024-08-22 15:37 ` Jon Kohler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).