perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
@ 2024-07-19  9:50 Michael Petlan
       [not found] ` <CA+JHD90TkDVHPw4jqxMX2guqsg-8xrqD2iiEfZ_akixvVYZKZg@mail.gmail.com>
  2024-07-20  0:30 ` Namhyung Kim
  0 siblings, 2 replies; 6+ messages in thread
From: Michael Petlan @ 2024-07-19  9:50 UTC (permalink / raw)
  To: namhyung; +Cc: acme, vmolnaro, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

Hello Namhyung,

we were investigating some test failures of the testcase mentioned
in $subj. We have narrowed it down to:

    # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop

    Performance counter stats for 'CPU(s) 0,1':
         <not counted>      cycles                           system.slice
         3,020,401,084      cycles                           user.slice                       

         1.009787097 seconds time elapsed

As seen, the system.slice is not counted properly in our case. It
happens even without bpf-counters being involved.

There were rumours that it might be caused due to too small system
load, but it apparently happens even when the load was replaced by
"thloop" workload from perf-test's workload library. However, even
so, if the load was insufficient, we'd see a value – 0 instead of
"not counted". The "<not counted>" result is printed if the counter
wasn't properly enabled and running.

Have you encountered this problem? What could cause it?

Thanks.
Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

[parent not found: <CA+JHD90TkDVHPw4jqxMX2guqsg-8xrqD2iiEfZ_akixvVYZKZg@mail.gmail.com>]

* Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
       [not found] ` <CA+JHD90TkDVHPw4jqxMX2guqsg-8xrqD2iiEfZ_akixvVYZKZg@mail.gmail.com>
@ 2024-07-19 11:05   ` Michael Petlan
  2024-11-01 10:15     ` Michael Petlan
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Petlan @ 2024-07-19 11:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Arnaldo de Melo, vmolnaro, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 5022 bytes --]

On Fri, 19 Jul 2024, Arnaldo Carvalho de Melo wrote:
> On Fri, Jul 19, 2024, 6:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
>       Hello Namhyung,
> 
>       we were investigating some test failures of the testcase mentioned
>       in $subj. We have narrowed it down to:
> 
>           # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
> 
>           Performance counter stats for 'CPU(s) 0,1':
>                <not counted>      cycles                           system.slice
>                3,020,401,084      cycles                           user.slice                       
> 
>                1.009787097 seconds time elapsed
> 
>       As seen, the system.slice is not counted properly in our case. It
>       happens even without bpf-counters being involved.
> 
>       There were rumours that it might be caused due to too small system
>       load, but it apparently happens even when the load was replaced by
>       "thloop" workload from perf-test's workload library. However, even
>       so, if the load was insufficient, we'd see a value – 0 instead of
>       "not counted". The "<not counted>" result is printed if the counter
>       wasn't properly enabled and running.
> 
>       Have you encountered this problem? What could cause it?
> 
> 
> What does running with -vvv says? Some inconclusive error coming from the kernel? 

Nothing obvious:

# perf stat -vvv -C 0,1 --for-each-cgroup system.slice,user.slice -e cpu-clock taskset -c 0 perf test -w thloop
Using CPUID GenuineIntel-6-6A-6
Control descriptor is not initialized
Opening: cpu-clock 
------------------------------------------------------------
perf_event_attr:   
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0 (PERF_COUNT_SW_CPU_CLOCK)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 3  cpu 0  group_fd -1  flags 0xc = 5
Opening: cpu-clock 
------------------------------------------------------------
perf_event_attr:   
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0 (PERF_COUNT_SW_CPU_CLOCK)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 4  cpu 0  group_fd -1  flags 0xc = 6
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0 (PERF_COUNT_SW_CPU_CLOCK)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 3  cpu 1  group_fd -1  flags 0xc = 7
Opening: cpu-clock
------------------------------------------------------------
perf_event_attr:   
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0 (PERF_COUNT_SW_CPU_CLOCK)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 4  cpu 1  group_fd -1  flags 0xc = 9
cpu-clock: 0: 0 0 0
cpu-clock: 0: 1004758163 1004761145 1004761145
cpu-clock: 1: 0 0 0
cpu-clock: 1: 60896 62271 62271
cpu-clock: 0 0 0   
cpu-clock: 1004819059 1004823416 1004823416

 Performance counter stats for 'CPU(s) 0,1':

     <not counted> msec cpu-clock                        system.slice
          1,004.82 msec cpu-clock                        user.slice       #    0.999 CPUs utilized

       1.005824026 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
        echo 0 > /proc/sys/kernel/nmi_watchdog
        perf stat ...
        echo 1 > /proc/sys/kernel/nmi_watchdog

....

The nmi_watchdog message is irrelevant, it does not work no matter what is set there.

> Maybe retsnoop can narrow it down? 

Will try. Thanks.
> 
> https://github.com/anakryiko/retsnoop
> 
> - Arnaldo 

Michael
> 
> 
> 
>       Thanks.
>       Michael
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
  2024-07-19 11:05   ` Michael Petlan
@ 2024-11-01 10:15     ` Michael Petlan
  2024-11-04 19:52       ` Namhyung Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Petlan @ 2024-11-01 10:15 UTC (permalink / raw)
  To: linux-perf-users
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Arnaldo de Melo, vmolnaro

[-- Attachment #1: Type: text/plain, Size: 4229 bytes --]

On Fri, 19 Jul 2024, Michael Petlan wrote:
> On Fri, 19 Jul 2024, Arnaldo Carvalho de Melo wrote:
> > On Fri, Jul 19, 2024, 6:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
> >       Hello Namhyung,
> > 
> >       we were investigating some test failures of the testcase mentioned
> >       in $subj. We have narrowed it down to:
> > 
> >           # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
> > 
> >           Performance counter stats for 'CPU(s) 0,1':
> >                <not counted>      cycles                           system.slice
> >                3,020,401,084      cycles                           user.slice                       
> > 
> >                1.009787097 seconds time elapsed
> > 
> >       As seen, the system.slice is not counted properly in our case. It
> >       happens even without bpf-counters being involved.
> > 
> >       There were rumours that it might be caused due to too small system
> >       load, but it apparently happens even when the load was replaced by
> >       "thloop" workload from perf-test's workload library. However, even
> >       so, if the load was insufficient, we'd see a value – 0 instead of
> >       "not counted". The "<not counted>" result is printed if the counter
> >       wasn't properly enabled and running.
> > 
> >       Have you encountered this problem? What could cause it?
> > 
> > 
> > What does running with -vvv says? Some inconclusive error coming from the kernel? 
> 
Hello!

We have been investigating this issue a bit more again and we have come
to conclusion that everything is probably OK, except of the testcase
which in short relies on the fact that taskset can force any system.slice
workload to happen on a particular CPU, which in my opinion does not
apply, being rather random and that's why the test sometimes fails.

To summarize the problem a bit:

1) The $subj testcase sometimes fails.

2) It consists of two parts, one performs counting system-wide and the
second limits the counting to CPUs 0 and 1. The second one sometimes
fails, while the first (systemwide) passes always.

3) The reason why the test fails is because system.slice may get
<not counted> result.

4) There is another problem with this testcase on single-cpu boxes, since
there is no "cpu 1", so we decided to try having "-C 0" and "taskset -c 0"
on such boxes. The problems with getting "<not counted>" disappeared!

---------------------------

So... The systemwide tracing test works:

# perf stat --for-each-cgroup system.slice,user.slice -e cycles -a -- sleep 3
 Performance counter stats for 'system wide':

     8,884,593      cycles            system.slice
     5,645,624      cycles            user.slice

   3.004137451 seconds time elapsed

When we pin the workload AND tracing to particular CPU, it might fail:

# perf stat -C 0 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 0 true

 Performance counter stats for 'CPU(s) 0':

     <not counted>  cycles            system.slice
     2,722,263      cycles            user.slice

   0.004184686 seconds time elapsed

Namhyung said that there might be not enough load, which finally appears
to be the problem. But not in the manner that replacing `true` by something
more "heavy" would help, but in the fact that system.slice didn't run on
CPU 0 at all during the `perf stat` counting.

Taskset can pin the process to some CPU, but even without it (or when we
pin it to CPU 3 for example), _some_ user.slice content is always run on
cpu 0, so we get values.

However, there is no guarantee that system.slice will run there. We would
probably need to load more the content that systemd decides to put under
"system.slice" and hope that it will get a chance to run on CPU 0 or 1 or
whatever we use in the testcase.

Of course, the more CPUs the machine has, the higher chance to get the
<not counted> result for system.slice is. That's why -a works and also
that's why it works on a single-CPU machine.

............

Thus, I think that we should simply remove the taskset part of the testcase
and leave only the systemwide part.

Thoughts?

Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
  2024-11-01 10:15     ` Michael Petlan
@ 2024-11-04 19:52       ` Namhyung Kim
  0 siblings, 0 replies; 6+ messages in thread
From: Namhyung Kim @ 2024-11-04 19:52 UTC (permalink / raw)
  To: Michael Petlan
  Cc: linux-perf-users, Arnaldo Carvalho de Melo, Arnaldo de Melo,
	vmolnaro

Hello Michael,

On Fri, Nov 01, 2024 at 11:15:39AM +0100, Michael Petlan wrote:
> 
> On Fri, 19 Jul 2024, Michael Petlan wrote:
> > On Fri, 19 Jul 2024, Arnaldo Carvalho de Melo wrote:
> > > On Fri, Jul 19, 2024, 6:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
> > >       Hello Namhyung,
> > > 
> > >       we were investigating some test failures of the testcase mentioned
> > >       in $subj. We have narrowed it down to:
> > > 
> > >           # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
> > > 
> > >           Performance counter stats for 'CPU(s) 0,1':
> > >                <not counted>      cycles                           system.slice
> > >                3,020,401,084      cycles                           user.slice                       
> > > 
> > >                1.009787097 seconds time elapsed
> > > 
> > >       As seen, the system.slice is not counted properly in our case. It
> > >       happens even without bpf-counters being involved.
> > > 
> > >       There were rumours that it might be caused due to too small system
> > >       load, but it apparently happens even when the load was replaced by
> > >       "thloop" workload from perf-test's workload library. However, even
> > >       so, if the load was insufficient, we'd see a value – 0 instead of
> > >       "not counted". The "<not counted>" result is printed if the counter
> > >       wasn't properly enabled and running.
> > > 
> > >       Have you encountered this problem? What could cause it?
> > > 
> > > 
> > > What does running with -vvv says? Some inconclusive error coming from the kernel? 
> > 
> Hello!
> 
> We have been investigating this issue a bit more again and we have come
> to conclusion that everything is probably OK, except of the testcase
> which in short relies on the fact that taskset can force any system.slice
> workload to happen on a particular CPU, which in my opinion does not
> apply, being rather random and that's why the test sometimes fails.
> 
> To summarize the problem a bit:
> 
> 1) The $subj testcase sometimes fails.
> 
> 2) It consists of two parts, one performs counting system-wide and the
> second limits the counting to CPUs 0 and 1. The second one sometimes
> fails, while the first (systemwide) passes always.
> 
> 3) The reason why the test fails is because system.slice may get
> <not counted> result.
> 
> 4) There is another problem with this testcase on single-cpu boxes, since
> there is no "cpu 1", so we decided to try having "-C 0" and "taskset -c 0"
> on such boxes. The problems with getting "<not counted>" disappeared!
> 
> ---------------------------
> 
> So... The systemwide tracing test works:
> 
> # perf stat --for-each-cgroup system.slice,user.slice -e cycles -a -- sleep 3
>  Performance counter stats for 'system wide':
> 
>      8,884,593      cycles            system.slice
>      5,645,624      cycles            user.slice
> 
>    3.004137451 seconds time elapsed
> 
> When we pin the workload AND tracing to particular CPU, it might fail:
> 
> # perf stat -C 0 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 0 true
> 
>  Performance counter stats for 'CPU(s) 0':
> 
>      <not counted>  cycles            system.slice
>      2,722,263      cycles            user.slice
> 
>    0.004184686 seconds time elapsed
> 
> Namhyung said that there might be not enough load, which finally appears
> to be the problem. But not in the manner that replacing `true` by something
> more "heavy" would help, but in the fact that system.slice didn't run on
> CPU 0 at all during the `perf stat` counting.
> 
> Taskset can pin the process to some CPU, but even without it (or when we
> pin it to CPU 3 for example), _some_ user.slice content is always run on
> cpu 0, so we get values.
> 
> However, there is no guarantee that system.slice will run there. We would
> probably need to load more the content that systemd decides to put under
> "system.slice" and hope that it will get a chance to run on CPU 0 or 1 or
> whatever we use in the testcase.
> 
> Of course, the more CPUs the machine has, the higher chance to get the
> <not counted> result for system.slice is. That's why -a works and also
> that's why it works on a single-CPU machine.
> 
> ............
> 
> Thus, I think that we should simply remove the taskset part of the testcase
> and leave only the systemwide part.
> 
> Thoughts?

Thanks for taking a look at this.  I admit that the CPU list can make
troubles.  We need a better cgroup testing infra.  Anyway let's get rid
of the problematic test for now.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
  2024-07-19  9:50 perf test fail :: "perf stat --bpf-counters --for-each-cgroup test" Michael Petlan
       [not found] ` <CA+JHD90TkDVHPw4jqxMX2guqsg-8xrqD2iiEfZ_akixvVYZKZg@mail.gmail.com>
@ 2024-07-20  0:30 ` Namhyung Kim
  2024-07-23  9:36   ` Michael Petlan
  1 sibling, 1 reply; 6+ messages in thread
From: Namhyung Kim @ 2024-07-20  0:30 UTC (permalink / raw)
  To: Michael Petlan; +Cc: acme, vmolnaro, linux-perf-users

Hello,

On Fri, Jul 19, 2024 at 2:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
>
> Hello Namhyung,
>
> we were investigating some test failures of the testcase mentioned
> in $subj. We have narrowed it down to:
>
>     # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
>
>     Performance counter stats for 'CPU(s) 0,1':
>          <not counted>      cycles                           system.slice
>          3,020,401,084      cycles                           user.slice
>
>          1.009787097 seconds time elapsed
>
> As seen, the system.slice is not counted properly in our case. It
> happens even without bpf-counters being involved.
>
> There were rumours that it might be caused due to too small system
> load, but it apparently happens even when the load was replaced by
> "thloop" workload from perf-test's workload library. However, even
> so, if the load was insufficient, we'd see a value – 0 instead of
> "not counted". The "<not counted>" result is printed if the counter
> wasn't properly enabled and running.
>
> Have you encountered this problem? What could cause it?

Sometimes, it depends on the system activity.
I assumed there would be some system operation running
on the given CPUs at any time.  If not, it can fail..

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
  2024-07-20  0:30 ` Namhyung Kim
@ 2024-07-23  9:36   ` Michael Petlan
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Petlan @ 2024-07-23  9:36 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: acme, vmolnaro, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 1884 bytes --]

On Fri, 19 Jul 2024, Namhyung Kim wrote:
> Hello,
> 
> On Fri, Jul 19, 2024 at 2:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
> >
> > Hello Namhyung,
> >
> > we were investigating some test failures of the testcase mentioned
> > in $subj. We have narrowed it down to:
> >
> >     # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
> >
> >     Performance counter stats for 'CPU(s) 0,1':
> >          <not counted>      cycles                           system.slice
> >          3,020,401,084      cycles                           user.slice
> >
> >          1.009787097 seconds time elapsed
> >
> > As seen, the system.slice is not counted properly in our case. It
> > happens even without bpf-counters being involved.
> >
> > There were rumours that it might be caused due to too small system
> > load, but it apparently happens even when the load was replaced by
> > "thloop" workload from perf-test's workload library. However, even
> > so, if the load was insufficient, we'd see a value – 0 instead of
> > "not counted". The "<not counted>" result is printed if the counter
> > wasn't properly enabled and running.
> >
> > Have you encountered this problem? What could cause it?
> 
> Sometimes, it depends on the system activity.
> I assumed there would be some system operation running
> on the given CPUs at any time.  If not, it can fail..

Thanks for response. However, in such case, I'd expect "0" value instead
of "<not counted>". As far as I understand, "<not supported>" occurs when
an event cannot be even opened, "<not counted>" appears when the event
can be opened, but for example there aren't enough free hardware counters,
so it does not run any counting actually. But for cases when there is not
enough load, we get simply low numerical values, or 0, don't we...

Michael
> 
> Thanks,
> Namhyung
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-11-04 19:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-19  9:50 perf test fail :: "perf stat --bpf-counters --for-each-cgroup test" Michael Petlan
     [not found] ` <CA+JHD90TkDVHPw4jqxMX2guqsg-8xrqD2iiEfZ_akixvVYZKZg@mail.gmail.com>
2024-07-19 11:05   ` Michael Petlan
2024-11-01 10:15     ` Michael Petlan
2024-11-04 19:52       ` Namhyung Kim
2024-07-20  0:30 ` Namhyung Kim
2024-07-23  9:36   ` Michael Petlan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).