From: Namhyung Kim <namhyung@kernel.org>
To: Michael Petlan <mpetlan@redhat.com>
Cc: linux-perf-users@vger.kernel.org,
Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
Arnaldo de Melo <acme@redhat.com>,
vmolnaro@redhat.com
Subject: Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
Date: Mon, 4 Nov 2024 11:52:05 -0800
Message-ID: <Zykl5Xk8u5nhe0Pm@google.com>
In-Reply-To: <536d5b91-f9ed-99b3-6c17-3d93bf451ffd@redhat.com>
Hello Michael,

On Fri, Nov 01, 2024 at 11:15:39AM +0100, Michael Petlan wrote:
>
> On Fri, 19 Jul 2024, Michael Petlan wrote:
> > On Fri, 19 Jul 2024, Arnaldo Carvalho de Melo wrote:
> > > On Fri, Jul 19, 2024, 6:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
> > > Hello Namhyung,
> > >
> > > we were investigating some test failures of the testcase mentioned
> > > in $subj. We have narrowed it down to:
> > >
> > > # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
> > >
> > > Performance counter stats for 'CPU(s) 0,1':
> > > <not counted> cycles system.slice
> > > 3,020,401,084 cycles user.slice
> > >
> > > 1.009787097 seconds time elapsed
> > >
> > > As seen, system.slice is not counted properly in our case. It
> > > happens even without bpf-counters being involved.
> > >
> > > There were rumours that it might be caused by too low system
> > > load, but it apparently happens even when the load is replaced by
> > > the "thloop" workload from perf test's workload library. Besides,
> > > if the load were insufficient, we would see a value of 0 rather
> > > than "<not counted>". The "<not counted>" result is printed only
> > > when the counter wasn't properly enabled and running.
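FWIW, one way to confirm that a counter was never scheduled in is the
CSV output: with -x, perf stat also emits the counter's run time and the
percentage of time it was actually running (the exact column layout may
differ between perf versions), so a "<not counted>" line corresponds to
zero enabled time:

  # perf stat -x, -C 0,1 --for-each-cgroup system.slice,user.slice \
        -e cycles -- taskset -c 1 perf test -w thloop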
> > >
> > > Have you encountered this problem? What could cause it?
> > >
> > >
> > > What does running with -vvv say? Is there some inconclusive error
> > > coming from the kernel?
> >
> Hello!
>
> We have been investigating this issue a bit more and we have come to
> the conclusion that everything is probably OK, except for the testcase
> itself. In short, it relies on the assumption that taskset can force
> some system.slice workload to run on a particular CPU. That assumption
> does not hold: whether system.slice runs there is rather random, and
> that's why the test sometimes fails.
>
> To summarize the problem a bit:
>
> 1) The $subj testcase sometimes fails.
>
> 2) It consists of two parts: one performs counting system-wide and the
> second limits the counting to CPUs 0 and 1. The second one sometimes
> fails, while the first (system-wide) one always passes.
>
> 3) The reason the test fails is that system.slice may get a
> <not counted> result.
>
> 4) There is another problem with this testcase on single-CPU boxes:
> since there is no "CPU 1", we decided to try using "-C 0" and
> "taskset -c 0" on such boxes. The problems with getting
> "<not counted>" disappeared!
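For 4), a single-CPU fallback could look roughly like this (just a
sketch; the variable name is made up):

  cpus="0,1"
  [ "$(nproc)" -eq 1 ] && cpus="0"
  perf stat -C "$cpus" --for-each-cgroup system.slice,user.slice \
      -e cycles -- taskset -c "${cpus##*,}" perf test -w thloop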
>
> ---------------------------
>
> So... the system-wide counting test works:
>
> # perf stat --for-each-cgroup system.slice,user.slice -e cycles -a -- sleep 3
> Performance counter stats for 'system wide':
>
> 8,884,593 cycles system.slice
> 5,645,624 cycles user.slice
>
> 3.004137451 seconds time elapsed
>
> When we pin both the workload AND the counting to a particular CPU, it
> might fail:
>
> # perf stat -C 0 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 0 true
>
> Performance counter stats for 'CPU(s) 0':
>
> <not counted> cycles system.slice
> 2,722,263 cycles user.slice
>
> 0.004184686 seconds time elapsed
>
> Namhyung said that there might not be enough load, which indeed turns
> out to be the problem. Not in the sense that replacing `true` with
> something "heavier" would help, but in the sense that system.slice did
> not run on CPU 0 at all during the `perf stat` counting.
>
> Taskset can pin the process to some CPU, but even without it (or when
> we pin it to CPU 3, for example), _some_ user.slice content always
> runs on CPU 0, so we get values.
>
> However, there is no guarantee that system.slice will run there. We
> would probably need to generate more load in whatever systemd decides
> to put under "system.slice" and hope that it gets a chance to run on
> CPU 0 or 1, or whichever CPUs the testcase uses.
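Right, and if we ever want to keep a CPU-pinned variant, the test itself
would probably have to generate the system.slice load, e.g. with a
transient unit like this (the unit name is made up and I haven't
verified this in the test environment):

  # systemd-run --unit=perf-cgrp-load -p CPUAffinity=0 \
        perf test -w thloop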
>
> Of course, the more CPUs the machine has, the higher the chance of
> getting the <not counted> result for system.slice. That's why -a
> works, and also why it works on a single-CPU machine.
>
> ............
>
> Thus, I think that we should simply remove the taskset part of the
> testcase and leave only the system-wide part.
>
> Thoughts?
Thanks for taking a look at this. I admit that the CPU list can cause
trouble. We need a better cgroup testing infrastructure. Anyway, let's
get rid of the problematic test for now.
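Only the system-wide check would remain, roughly like this (the function
name and event choice are illustrative, not necessarily what is in the
actual test script):

  check_system_wide_counted() {
      output=$(perf stat -a --bpf-counters --for-each-cgroup \
          system.slice,user.slice -e cpu-clock -x, sleep 1 2>&1)
      if echo "$output" | grep -q -F "<not counted>"; then
          echo "Some system-wide events are not counted"
          exit 1
      fi
  }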
Thanks,
Namhyung