* Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-22  1:39 UTC
To: namhyung, linux-perf-users, acme

Hi Namhyung,

I have a tool (toplev in pmu-tools) that parses the perf stat output
for generated command lines. It expects that the output contains the
events in the same order as specified with -e.

Unfortunately this broke for uncore events.
I bisected it down to the following commit.

I believe just arbitrarily hiding events will break any tool that
parses the perf stat output and expects the input order.

Can the original problem be fixed without hiding events?

Thanks,

-Andi

commit dd15480a3d67b9cf04a1f6f5d60f1c0dc018e22f
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Wed Jan 25 11:24:31 2023 -0800

    perf stat: Hide invalid uncore event output for aggr mode

    The current display code for perf stat iterates given cpus and
    builds the aggr map to collect the event data for the aggregation
    mode.

* Re: Perf stat regression from d15480a3d67
From: Ian Rogers @ 2023-06-22  6:10 UTC
To: Andi Kleen; +Cc: namhyung, linux-perf-users, acme

On Wed, Jun 21, 2023 at 6:39 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> Hi Namhyung,
>
> I have a tool (toplev in pmu-tools) that parses the perf stat output
> for generated command lines. It expects that the output contains the
> events in the same order as specified with -e.
>
> Unfortunately this broke for uncore events.
> I bisected it down to the following commit.
>
> I believe just arbitrarily hiding events will break any tool that
> parses the perf stat output and expects the input order.
>
> Can the original problem be fixed without hiding events?

Hi Andi,

Can you provide an example that is breaking?

Wrt event order, cases like:
```
$ sudo perf stat -e '{data_read,data_write}' -a --no-merge sleep 1

 Performance counter stats for 'system wide':

          1,960.81 MiB  data_read   [uncore_imc_free_running_1]
            438.48 MiB  data_write  [uncore_imc_free_running_1]
          1,961.85 MiB  data_read   [uncore_imc_free_running_0]
            438.79 MiB  data_write  [uncore_imc_free_running_0]

       1.001127356 seconds time elapsed
```
reorder events so that the PMUs match (ie the -e order would be 2x
data_read followed by 2x data_write). This covers more cases now so
that perf metric (aka topdown) events aren't broken when coming out of
metrics:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2139

Given the original commit message, there seems little point to give
the 0 counts, but perhaps it is something that could be done on the
more tool oriented CSV and JSON output formats.

Thanks,
Ian

> Thanks,
>
> -Andi
>
> commit dd15480a3d67b9cf04a1f6f5d60f1c0dc018e22f
> Author: Namhyung Kim <namhyung@kernel.org>
> Date:   Wed Jan 25 11:24:31 2023 -0800
>
>     perf stat: Hide invalid uncore event output for aggr mode
>
>     The current display code for perf stat iterates given cpus and
>     builds the aggr map to collect the event data for the aggregation
>     mode.

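As a rough illustration of the expansion Ian describes (an
illustrative Python sketch, not perf's parse-events code): each event
in a wildcard group is opened once per matching uncore PMU instance,
and the evlist keeps the -e insertion order as the primary key.
```
# Sketch only: models the described wildcard expansion of a group
# across uncore PMU instances; not the actual perf implementation.
def expand_group(group, pmu_instances):
    return [(event, pmu) for event in group for pmu in pmu_instances]

print(expand_group(["data_read", "data_write"],
                   ["uncore_imc_free_running_0",
                    "uncore_imc_free_running_1"]))
# [('data_read', 'uncore_imc_free_running_0'),
#  ('data_read', 'uncore_imc_free_running_1'),
#  ('data_write', 'uncore_imc_free_running_0'),
#  ('data_write', 'uncore_imc_free_running_1')]
```
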
* Re: Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-22  9:20 UTC
To: Ian Rogers; +Cc: namhyung, linux-perf-users, acme

> Can you provide an example that is breaking?

I only have a very long auto generated example[1], because it's not
easy to make a non-scheduling event and it's also dependent on the
system. The example works on a SKX.

> Wrt event order, cases like:
> ```
> $ sudo perf stat -e '{data_read,data_write}' -a --no-merge sleep 1
>
>  Performance counter stats for 'system wide':
>
>           1,960.81 MiB  data_read   [uncore_imc_free_running_1]
>             438.48 MiB  data_write  [uncore_imc_free_running_1]
>           1,961.85 MiB  data_read   [uncore_imc_free_running_0]
>             438.79 MiB  data_write  [uncore_imc_free_running_0]
>
>        1.001127356 seconds time elapsed
> ```
> reorder events so that the PMUs match (ie the -e order would be 2x
> data_read followed by 2x data_write). This covers more cases now so
> that perf metric (aka topdown) events aren't broken when coming out
> of metrics:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2139

How is any tool supposed to handle arbitrary output reordering like
that? Perf is not just for humans, but also for tools.

It cannot just look up the events by name because there might be
multiple instances of the same event in different groups.

The only reliable way to match is to match on the original order,
but that totally breaks down with these changes.

The same actually applies to humans reading the output too:
if it's ambiguous for the tool it will be ambiguous to the human too.
perf stat shouldn't reorder events ever.

> Given the original commit message, there seems little point to give
> the 0 counts, but perhaps it is something that could be done on the
> more tool oriented CSV and JSON output formats.

It's not 0 counts, it's <not supported>. But even 0 counts can
be important.

We should always output when something doesn't count so that
the user knows something is wrong. Hiding problems is always a bad
idea and not an improvement.

-Andi

[1] Long example for SKX:

perf stat -o p3 -x\; -e '{cpu/event=0xc0,umask=0x0/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x0,umask=0x3/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0xb1,umask=0x1/u,cpu/event=0xb1,umask=0x2,cmask=1/u},msr/tsc/,duration_time,dummy,uncore_imc/event=0x4,umask=0x3/,uncore_imc/event=0x4,umask=0xc/,cs:u,minor-faults:u,major-faults:u,migrations:u,{cycles:u,cpu/event=0x0,umask=0x3/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xc0,umask=0x0/u,cpu/event=0xc4,umask=0x40/u,cpu/event=0xb1,umask=0x1/u,cpu/event=0xe,umask=0x1/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0xc2,umask=0x2,cmask=1/u,cpu/event=0x3c,umask=0x0/ku,cpu/event=0xc0,umask=0x0/ku},{cpu/event=0x79,umask=0x8/u,cpu/event=0xa8,umask=0x1/u,cpu/event=0x79,umask=0x4/u,cpu/event=0x79,umask=0x30/u,cpu/event=0xc0,umask=0x0/u,cpu/event=0x3c,umask=0x0/u},{cpu/event=0x10,umask=0x20/u,cpu/event=0x10,umask=0x80/u,cpu/event=0x10,umask=0x10/u,cpu/event=0x10,umask=0x40/u},{cpu/event=0x11,umask=0x2/u,cpu/event=0x11,umask=0x1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u},{cpu/event=0x10,umask=0x20/u,cpu/event=0x10,umask=0x80/u,cpu/event=0x10,umask=0x10/u,cpu/event=0x10,umask=0x40/u,cpu/event=0x3c,umask=0x0/u},{cpu/event=0x3c,umask=0x0/ku,cpu/event=0x3c,umask=0x1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xd,umask=0x3,cmask=1/u},{cpu/event=0x9c,umask=0x1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0xc2,umask=0x2/u},{cpu/event=0xe,umask=0x1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xd,umask=0x3,cmask=1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u},{cpu/event=0x9c,umask=0x1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0xe,umask=0x1/u},{cpu/event=0x3c,umask=0x0/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0x9c,umask=0x1/u,cpu/event=0xc0,umask=0x0/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0xd,umask=0x3,cmask=1/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0xe,umask=0x1/u,cpu/event=0x79,umask=0x30/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u},{cpu/event=0xe,umask=0x1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xd,umask=0x3,cmask=1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u},{cpu/event=0xc5,umask=0x0/u,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/u,cpu/event=0x3c,umask=0x1/u},dummy,{cpu/event=0xe,umask=0x1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xd,umask=0x3,cmask=1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u},{cpu/event=0x9c,umask=0x1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0xe,umask=0x1/u},dummy,{cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0xa2,umask=0x8/u,cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0xb1,umask=0x1,cmask=1/u},{cpu/event=0xb1,umask=0x1,cmask=3/u,cpu/event=0xb1,umask=0x1,cmask=2/u,cpu/event=0xc0,umask=0x0/u,cpu/event=0x5e,umask=0x1/u,cpu/event=0x9c,umask=0x1,cmask=4/u},{cpu/event=0x9c,umask=0x1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0xe,umask=0x1/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0xd,umask=0x3,cmask=1/u},dummy,dummy,emulation-faults,dummy,{cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0xa2,umask=0x8/u,cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0xb1,umask=0x1,cmask=1/u},{cpu/event=0xb1,umask=0x1,cmask=3/u,cpu/event=0xb1,umask=0x1,cmask=2/u,cpu/event=0xc0,umask=0x0/u,cpu/event=0x5e,umask=0x1/u,cpu/event=0x9c,umask=0x1,cmask=4/u},{cpu/event=0x79,umask=0x30,edge=1,cmask=1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x87,umask=0x1/u,cpu/event=0xab,umask=0x2/u,cpu/event=0xa2,umask=0x8/u},{cpu/event=0x85,umask=0x10/u,cpu/event=0x85,umask=0x4/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x8,umask=0x10/u,cpu/event=0x8,umask=0x4/u},{cpu/event=0xc5,umask=0x0/u,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/u,cpu/event=0xe6,umask=0x1f/u,cpu/event=0x3c,umask=0x0/u},{cpu/event=0xd1,umask=0x4/u,cpu/event=0xd1,umask=0x20/u,cpu/event=0xa3,umask=0x5,cmask=5/u,cpu/event=0x3c,umask=0x0/u},{cpu/event=0x14,umask=0x1/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u,cpu/event=0x11,umask=0x2/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0xe,umask=0x1/u,cpu/event=0x79,umask=0x30/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x3c,umask=0x2/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0x10,umask=0x1/u,cpu/event=0xb1,umask=0x1/u,cpu/event=0x10,umask=0x20/u},{cpu/event=0x10,umask=0x80/u,cpu/event=0x10,umask=0x10/u,cpu/event=0x10,umask=0x40/u,cpu/event=0x11,umask=0x1/u},{cpu/event=0x3c,umask=0x0/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cpu/event=0x3c,umask=0x2/u,cpu/event=0x3c,umask=0x1/u},dummy,{cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0xb1,umask=0x1,cmask=1/u,cpu/event=0xb1,umask=0x1,cmask=3/u,cpu/event=0xb1,umask=0x1,cmask=2/u},{cpu/event=0xc0,umask=0x0/u,cpu/event=0x5e,umask=0x1/u,cpu/event=0xa2,umask=0x8/u,cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0x3c,umask=0x0/u,cpu/event=0x60,umask=0x8,cmask=6/u},{cpu/event=0x3c,umask=0x0/u,cpu/event=0x60,umask=0x8,cmask=1/u,cpu/event=0x60,umask=0x8,cmask=6/u},{cpu/event=0xc2,umask=0x2/u,cpu/event=0x10,umask=0x1/u,cpu/event=0xb1,umask=0x1/u},{cpu/event=0x10,umask=0x20/u,cpu/event=0x10,umask=0x80/u,cpu/event=0xb1,umask=0x1/u,cpu/event=0x10,umask=0x10/u},emulation-faults,{cpu/event=0x10,umask=0x10/u,cpu/event=0x10,umask=0x40/u,cpu/event=0x11,umask=0x1/u,cpu/event=0x11,umask=0x2/u},emulation-faults,{cpu/event=0x11,umask=0x2/u,cpu/event=0x11,umask=0x1/u,cpu/event=0xb1,umask=0x1/u}' ~/pmu/pmu-tools/workloads/BC1s
grep uncore p3

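The order-based matching a toplev-like tool is described as doing can
be sketched roughly as below. This is a hypothetical helper, not
pmu-tools source, and the `perf stat -x\;` field layout assumed here
(value;unit;event;...) should be verified against your perf version.
```
import csv

def match_counts(csv_path, requested_events):
    # Collect the data rows of a `perf stat -x\; -o <file>` run,
    # skipping blank and "#" comment lines.
    rows = []
    with open(csv_path) as f:
        for rec in csv.reader(f, delimiter=";"):
            if not rec or rec[0].startswith("#"):
                continue
            rows.append(rec)
    # The i-th data row is matched to the i-th event given with -e;
    # this positional matching is exactly what breaks if perf stat
    # hides or reorders output rows.
    return list(zip(requested_events, (r[0] for r in rows)))
```
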
* Re: Perf stat regression from d15480a3d67
From: Ian Rogers @ 2023-06-22 13:41 UTC
To: Andi Kleen; +Cc: namhyung, linux-perf-users, acme

On Thu, Jun 22, 2023 at 2:20 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > Can you provide an example that is breaking?
>
> I only have a very long auto generated example[1], because it's not
> easy to make a non-scheduling event and it's also dependent on the
> system. The example works on a SKX.

Thanks, I have an SKX so I can try to repro.

> > Wrt event order, cases like:
> > ```
> > $ sudo perf stat -e '{data_read,data_write}' -a --no-merge sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> >           1,960.81 MiB  data_read   [uncore_imc_free_running_1]
> >             438.48 MiB  data_write  [uncore_imc_free_running_1]
> >           1,961.85 MiB  data_read   [uncore_imc_free_running_0]
> >             438.79 MiB  data_write  [uncore_imc_free_running_0]
> >
> >        1.001127356 seconds time elapsed
> > ```
> > reorder events so that the PMUs match (ie the -e order would be 2x
> > data_read followed by 2x data_write). This covers more cases now
> > so that perf metric (aka topdown) events aren't broken when coming
> > out of metrics:
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2139
>
> How is any tool supposed to handle arbitrary output reordering like
> that? Perf is not just for humans, but also for tools.
>
> It cannot just look up the events by name because there might be
> multiple instances of the same event in different groups.
>
> The only reliable way to match is to match on the original order,
> but that totally breaks down with these changes.

Did you check the code/changes? The order that is preferred is the
original insertion order into the evlist. There is a warning triggered
when regrouping occurs. It shouldn't occur if a tool is careful,
avoids wildcard event matching within a group, etc.

Having different PMUs within a group would be broken, so this has
historically always been fixed, but previously just for uncore events.
With hybrid, core events can now open on more than one PMU and so we
need to fix all cases. This introduces a problem like
"perf stat -e '{cycles,faults}' ..." where cycles will open on 2 PMUs
but faults, being a software event, doesn't get wildcard expanded and
so is only grouped with one. There are still improvements needed in
this area, but failing the parsing and requiring every PMU to be
listed would be quite hostile to users.

Another issue is that kernel PMUs don't always fail perf_event_open
for a group, due to not having enough information in the fake PMU used
for testing. I've asked for kernel fixes but was told that the tool
should handle these. Because of this we need to force metrics not to
group events, but then that breaks the perf metric (aka topdown)
events that require a group. The perf metric events are also super
fussy, needing slots to be a leader and so on, so this is handled by
parse_events__sort_events_and_fix_groups.

> The same actually applies to humans reading the output too:
> if it's ambiguous for the tool it will be ambiguous to the human
> too. perf stat shouldn't reorder events ever.

I don't believe there is a deliberate attempt to hide information; the
change was made so that, when looking at say per-core information, an
uncore count doesn't get repeated, making a counter look broken:
https://lore.kernel.org/r/20230125192431.2929677-1-namhyung@kernel.org

> > Given the original commit message, there seems little point to
> > give the 0 counts, but perhaps it is something that could be done
> > on the more tool oriented CSV and JSON output formats.
>
> It's not 0 counts, it's <not supported>. But even 0 counts can
> be important.
>
> We should always output when something doesn't count so that
> the user knows something is wrong. Hiding problems is always a bad
> idea and not an improvement.

I don't think there's any disagreement on that. In fact the
perf-tools-next code will now always display a metric result even if a
divide by 0 happened during computation, and on hybrid, if an event
doesn't apply to one PMU, the other PMU will show
0/not counted/not supported; the previous behavior was to hide hybrid
events that only worked on 1 PMU.

The original patch series tried to clean up the aggregation logic, but
that regressed output, and so should_skip_zero_counter was added. The
check is pretty conservative, preferring to display 0s. I wonder if in
your case it is hiding the count because there are multiple sockets.
Getting a simpler test will help diagnose the problem and come up with
the appropriate fix.

Thanks,
Ian

> -Andi
>
> [1] Long example for SKX:
>
> perf stat -o p3 -x\; -e '[long event list from the previous message snipped]' ~/pmu/pmu-tools/workloads/BC1s
> grep uncore p3

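The regrouping Ian describes can be modeled roughly as below. This is
a sketch only, not the actual parse_events__sort_events_and_fix_groups()
implementation: a group whose members resolved to different PMUs is
split into one subgroup per PMU, and the sort is stable on the -e
insertion index so the original order is kept within each subgroup.
```
from itertools import groupby

def split_group_by_pmu(group):
    # Remember the -e insertion order, then sort stably by (pmu, index)
    # so same-PMU events stay together without losing their order.
    indexed = list(enumerate(group))
    indexed.sort(key=lambda item: (item[1]["pmu"], item[0]))
    return [[ev for _, ev in g]
            for _, g in groupby(indexed, key=lambda item: item[1]["pmu"])]

group = [{"name": "cycles", "pmu": "cpu_core"},
         {"name": "cycles", "pmu": "cpu_atom"},
         {"name": "faults", "pmu": "cpu_core"}]
print(split_group_by_pmu(group))
# [[{'name': 'cycles', 'pmu': 'cpu_atom'}],
#  [{'name': 'cycles', 'pmu': 'cpu_core'},
#   {'name': 'faults', 'pmu': 'cpu_core'}]]
```
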
* Re: Perf stat regression from d15480a3d67
From: Namhyung Kim @ 2023-06-22 19:09 UTC
To: Andi Kleen; +Cc: Ian Rogers, linux-perf-users, acme

Hi Andi,

Sorry about the breakage.

On Thu, Jun 22, 2023 at 2:20 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > Can you provide an example that is breaking?
>
> I only have a very long auto generated example[1], because it's not
> easy to make a non-scheduling event and it's also dependent on the
> system. The example works on a SKX.
>
> > Wrt event order, cases like:
> > [data_read/data_write example snipped]
> > reorder events so that the PMUs match (ie the -e order would be 2x
> > data_read followed by 2x data_write). This covers more cases now
> > so that perf metric (aka topdown) events aren't broken when coming
> > out of metrics:
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2139
>
> How is any tool supposed to handle arbitrary output reordering like
> that? Perf is not just for humans, but also for tools.

Is the problem of reordering the output or hiding something?
Do you mean both?

> It cannot just look up the events by name because there might be
> multiple instances of the same event in different groups.
>
> The only reliable way to match is to match on the original order,
> but that totally breaks down with these changes.
>
> The same actually applies to humans reading the output too:
> if it's ambiguous for the tool it will be ambiguous to the human
> too. perf stat shouldn't reorder events ever.
>
> > Given the original commit message, there seems little point to
> > give the 0 counts, but perhaps it is something that could be done
> > on the more tool oriented CSV and JSON output formats.
>
> It's not 0 counts, it's <not supported>. But even 0 counts can
> be important.
>
> We should always output when something doesn't count so that
> the user knows something is wrong. Hiding problems is always a bad
> idea and not an improvement.

Right, I wanted to skip (or hide) events that are not reasonable
like with redundant or invalid setups. It should display those
error results.

Didn't you say you only have problems in uncore events?

> -Andi
>
> [1] Long example for SKX:
>
> perf stat -o p3 -x\;

It seems you don't use any special aggregation option, which means
the global aggr mode is used. Then I think should_skip_zero_counter()
would return false because aggr_cpu_id__equal() is true, and the
result should be printed. Hmm...
> -e '[long event list from the first message snipped]' ~/pmu/pmu-tools/workloads/BC1s
> grep uncore p3

It's a very long list of events but seems mostly core events.
Can we have a smallest reproducer only with uncore events?

Thanks,
Namhyung

* Re: Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-22 20:15 UTC
To: Namhyung Kim; +Cc: Ian Rogers, linux-perf-users, acme

On Thu, Jun 22, 2023 at 12:09:14PM -0700, Namhyung Kim wrote:
> Hi Andi,
>
> Sorry about the breakage.
>
> On Thu, Jun 22, 2023 at 2:20 AM Andi Kleen <ak@linux.intel.com> wrote:
> > [...]
> > How is any tool supposed to handle arbitrary output reordering
> > like that? Perf is not just for humans, but also for tools.
>
> Is the problem of reordering the output or hiding something?
> Do you mean both?

For my tool it's both. In this particular case it's the hiding
that broke it.

If the reordering only happens on broken groups etc. as Ian wrote,
that's ok for me because I can avoid generating them. But I can't
tolerate hiding.

> > We should always output when something doesn't count so that
> > the user knows something is wrong. Hiding problems is always a bad
> > idea and not an improvement.
>
> Right, I wanted to skip (or hide) events that are not reasonable
> like with redundant or invalid setups. It should display those
> error results.

Okay so the skipping can just be removed?

> Didn't you say you only have problems in uncore events?

For this test case I only hit it with the uncore, but I would have a
general problem for any event if it was hidden.

Thanks,

-Andi

* Re: Perf stat regression from d15480a3d67
From: Namhyung Kim @ 2023-06-23  0:08 UTC
To: Andi Kleen; +Cc: Ian Rogers, linux-perf-users, acme

On Thu, Jun 22, 2023 at 1:15 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Thu, Jun 22, 2023 at 12:09:14PM -0700, Namhyung Kim wrote:
> > [...]
> > Is the problem of reordering the output or hiding something?
> > Do you mean both?
>
> For my tool it's both. In this particular case it's the hiding
> that broke it.
>
> If the reordering only happens on broken groups etc. as Ian wrote,
> that's ok for me because I can avoid generating them. But I can't
> tolerate hiding.

I see. I'm not aware of the reordering. But I agree that hiding
events can be a problem.

> > Right, I wanted to skip (or hide) events that are not reasonable
> > like with redundant or invalid setups. It should display those
> > error results.
>
> Okay so the skipping can just be removed?

The existing use case was system-wide --per-thread mode.
It would generate too much noise without it.

> > Didn't you say you only have problems in uncore events?
>
> For this test case I only hit it with the uncore, but I would have a
> general problem for any event if it was hidden.

stat-display.c::should_skip_zero_counter() specifically checks
uncore PMU events, so I don't think core events have problems.

Also, it only skips when the aggregation id doesn't match, which means
the (default) global aggregation mode should not skip any uncore
events.

Can you please help me to find a minimal repro?

Thanks,
Namhyung

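The check Namhyung describes can be modeled in a simplified sketch
like the one below; this is not the actual C code in stat-display.c,
and the field names are hypothetical.
```
# Simplified model of the described skip logic: only a zero count from
# an uncore PMU is a candidate for hiding, and it is hidden only when
# the output row's aggregation id differs from the event's own id.
# Under the default global aggregation the single id always matches,
# so nothing should be hidden there.
def should_skip_zero_counter(counter, row_aggr_id):
    if counter["value"] != 0 or counter["errored"]:
        return False                # real data or a real error: show it
    if not counter["pmu"].startswith("uncore_"):
        return False                # core events are never skipped
    return row_aggr_id != counter["aggr_id"]
```
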
* Re: Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-23 17:04 UTC
To: Namhyung Kim; +Cc: Ian Rogers, linux-perf-users, acme

> > Okay so the skipping can just be removed?
>
> The existing use case was system-wide --per-thread mode.
> It would generate too much noise without it.

Hmm, I guess we could make it the default for per-thread mode, but add
an option to turn it on again if it's needed (my tool currently
doesn't do per-thread, but I could imagine adding that at some point).

> > > Didn't you say you only have problems in uncore events?
> >
> > For this test case I only hit it with the uncore, but I would have
> > a general problem for any event if it was hidden.
>
> stat-display.c::should_skip_zero_counter() specifically checks
> uncore PMU events, so I don't think core events have problems.
>
> Also, it only skips when the aggregation id doesn't match, which
> means the (default) global aggregation mode should not skip any
> uncore events.
>
> Can you please help me to find a minimal repro?

I'll run delta[1] over it later today. But you may need a SKX.

-Andi

[1] https://github.com/dsw/delta

* Re: Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-23 18:35 UTC
To: Namhyung Kim; +Cc: Ian Rogers, linux-perf-users, acme

> Can you please help me to find a minimal repro?

Here's a smaller test case (works on SKX):

perf stat -a -e uncore_imc/event=0x4,umask=0x3/,uncore_imc/event=0x4,umask=0xc/,cycles <something that doesn't idle for 1s>

-Andi

* Re: Perf stat regression from d15480a3d67
From: Ian Rogers @ 2023-06-23 18:53 UTC
To: Andi Kleen; +Cc: Namhyung Kim, linux-perf-users, acme

On Fri, Jun 23, 2023 at 11:35 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > Can you please help me to find a minimal repro?
>
> Here's a smaller test case (works on SKX):
>
> perf stat -a -e uncore_imc/event=0x4,umask=0x3/,uncore_imc/event=0x4,umask=0xc/,cycles <something that doesn't idle for 1s>

Hmm.. not sure I'm seeing what you are seeing:

Perf 6.1:
```
$ perf stat -a -e uncore_imc/event=0x4,umask=0x3/,uncore_imc/event=0x4,umask=0xc/,cycles perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 7.915 [sec]

       7.915043 usecs/op
         126341 ops/sec

 Performance counter stats for 'system wide':

        24,396,429      uncore_imc/event=0x4,umask=0x3/
        17,621,319      uncore_imc/event=0x4,umask=0xc/
    74,773,564,295      cycles

       7.937755581 seconds time elapsed
```

perf-tools-next:
```
$ perf stat -a -e uncore_imc/event=0x4,umask=0x3/,uncore_imc/event=0x4,umask=0xc/,cycles perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 7.677 [sec]

       7.677456 usecs/op
         130251 ops/sec

 Performance counter stats for 'system wide':

         2,315,325      uncore_imc/event=0x4,umask=0x3/
         2,316,490      uncore_imc/event=0x4,umask=0x3/
         2,452,872      uncore_imc/event=0x4,umask=0x3/
         2,280,681      uncore_imc/event=0x4,umask=0x3/
         2,479,263      uncore_imc/event=0x4,umask=0x3/
         2,477,248      uncore_imc/event=0x4,umask=0x3/
         1,268,114      uncore_imc/event=0x4,umask=0xc/
         1,267,928      uncore_imc/event=0x4,umask=0xc/
         1,323,996      uncore_imc/event=0x4,umask=0xc/
         1,229,923      uncore_imc/event=0x4,umask=0xc/
         1,352,914      uncore_imc/event=0x4,umask=0xc/
         1,352,125      uncore_imc/event=0x4,umask=0xc/
    58,873,911,111      cycles

       7.687930565 seconds time elapsed
```

So with perf-tools-next it appears that we're not merging the uncore
events, but that's different than skipping events.

Thanks,
Ian

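As a quick sanity check on the perf-tools-next output above: the six
per-instance data_read (umask=0x3) counts sum to what a single merged
line would be expected to show for that run.
```
# Per-instance uncore_imc/event=0x4,umask=0x3/ counts from the
# perf-tools-next run quoted above; a merged line should be their sum.
data_read = [2_315_325, 2_316_490, 2_452_872,
             2_280_681, 2_479_263, 2_477_248]
print(f"{sum(data_read):,}")  # 14,321,879
```
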
* Re: Perf stat regression from d15480a3d67
From: Namhyung Kim @ 2023-06-23 23:04 UTC
To: Ian Rogers; +Cc: Andi Kleen, linux-perf-users, acme

On Fri, Jun 23, 2023 at 11:53 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Jun 23, 2023 at 11:35 AM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > > Can you please help me to find a minimal repro?
> >
> > Here's a smaller test case (works on SKX):
> >
> > perf stat -a -e uncore_imc/event=0x4,umask=0x3/,uncore_imc/event=0x4,umask=0xc/,cycles <something that doesn't idle for 1s>
>
> Hmm.. not sure I'm seeing what you are seeing:
>
> [Perf 6.1 and perf-tools-next outputs snipped]
>
> So with perf-tools-next it appears that we're not merging the uncore
> events, but that's different than skipping events.

Right, not sure what's changed though.

Andi, what's your expectation?

Thanks,
Namhyung

* Re: Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-24 17:59 UTC
To: Namhyung Kim; +Cc: Ian Rogers, linux-perf-users, acme

> > So with perf-tools-next it appears that we're not merging the
> > uncore events, but that's different than skipping events.

Isn't that a regression too? We're supposed to merge.

> Right, not sure what's changed though.
>
> Andi, what's your expectation?

The original test case passes with perf-next, but I still see
failures; those might be related to PMU reordering. I'll investigate.

However, if the original hiding is still in the code the problem could
still happen, right? I would still prefer to do what I suggested
earlier: only do the hiding with --per-thread, with an option to turn
it off.

-Andi

* Re: Perf stat regression from d15480a3d67
From: Ian Rogers @ 2023-06-26  5:33 UTC
To: Andi Kleen; +Cc: Namhyung Kim, linux-perf-users, acme

On Sat, Jun 24, 2023 at 10:59 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > > So with perf-tools-next it appears that we're not merging the
> > > uncore events, but that's different than skipping events.
>
> Isn't that a regression too? We're supposed to merge.

Right, bisected and sent a fix:
https://lore.kernel.org/lkml/20230626053048.257959-1-irogers@google.com/

Thanks,
Ian

> > Right, not sure what's changed though.
> >
> > Andi, what's your expectation?
>
> The original test case passes with perf-next, but I still see
> failures; those might be related to PMU reordering. I'll investigate.
>
> However, if the original hiding is still in the code the problem
> could still happen, right? I would still prefer to do what I
> suggested earlier: only do the hiding with --per-thread, with an
> option to turn it off.
>
> -Andi

* Re: Perf stat regression from d15480a3d67
From: Namhyung Kim @ 2023-06-26 23:09 UTC
To: Andi Kleen; +Cc: Ian Rogers, linux-perf-users, acme

On Sat, Jun 24, 2023 at 10:59 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > > So with perf-tools-next it appears that we're not merging the
> > > uncore events, but that's different than skipping events.
>
> Isn't that a regression too? We're supposed to merge.

Right, it should be fixed by Ian's patch.

> > Right, not sure what's changed though.
> >
> > Andi, what's your expectation?
>
> The original test case passes with perf-next, but I still see
> failures; those might be related to PMU reordering. I'll investigate.
>
> However, if the original hiding is still in the code the problem
> could still happen, right? I would still prefer to do what I
> suggested earlier: only do the hiding with --per-thread, with an
> option to turn it off.

I think the hiding should not happen in normal conditions besides
--per-thread. So I want to be clear about what makes it disappear.
Then we can decide what to do.

Thanks,
Namhyung

* Re: Perf stat regression from d15480a3d67
From: Andi Kleen @ 2023-06-26 23:52 UTC
To: Namhyung Kim; +Cc: Ian Rogers, linux-perf-users, acme

> > However, if the original hiding is still in the code the problem
> > could still happen, right? I would still prefer to do what I
> > suggested earlier: only do the hiding with --per-thread, with an
> > option to turn it off.
>
> I think the hiding should not happen in normal conditions besides
> --per-thread. So I want to be clear about what makes it disappear.
> Then we can decide what to do.

Okay, I'll watch it and see if it happens again on perf-tools-next.
However, if it's broken in some stable kernel we might also need to
double check that.

-Andi

Thread overview: 15+ messages (newest: 2023-06-26 23:52 UTC)

2023-06-22  1:39 Perf stat regression from d15480a3d67 Andi Kleen
2023-06-22  6:10 ` Ian Rogers
2023-06-22  9:20   ` Andi Kleen
2023-06-22 13:41     ` Ian Rogers
2023-06-22 19:09     ` Namhyung Kim
2023-06-22 20:15       ` Andi Kleen
2023-06-23  0:08         ` Namhyung Kim
2023-06-23 17:04           ` Andi Kleen
2023-06-23 18:35           ` Andi Kleen
2023-06-23 18:53             ` Ian Rogers
2023-06-23 23:04               ` Namhyung Kim
2023-06-24 17:59                 ` Andi Kleen
2023-06-26  5:33                   ` Ian Rogers
2023-06-26 23:09                   ` Namhyung Kim
2023-06-26 23:52                     ` Andi Kleen