From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Kees Cook <keescook@chromium.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Petr Mladek <pmladek@suse.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	Qian Cai <cai@lca.pw>, Joe Lawrence <joe.lawrence@redhat.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Sri Krishna chowdary <schowdary@nvidia.com>,
	"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Changbin Du <changbin.du@intel.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Gary Hook <Gary.Hook@amd.com>, Arnd Bergmann <arnd@arndb.de>,
	linux-kernel@vger.kernel.org,
	Stephane Eranian <eranian@google.com>,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH v3 10/10] perf/cgroup: Do not switch system-wide events in cgroup switch
Date: Thu, 14 Nov 2019 15:49:57 -0500
Message-ID: <94c8c876-f236-7052-24ef-536f6870a8d5@linux.intel.com>
In-Reply-To: <4bc51bf9-1d47-063a-e811-d05fb42c8838@linux.intel.com>



On 11/14/2019 10:24 AM, Liang, Kan wrote:
> 
> 
> On 11/14/2019 10:16 AM, Liang, Kan wrote:
>>
>>
>> On 11/14/2019 8:57 AM, Peter Zijlstra wrote:
>>> On Thu, Nov 14, 2019 at 08:46:51AM -0500, Liang, Kan wrote:
>>>>
>>>>
>>>> On 11/14/2019 5:43 AM, Peter Zijlstra wrote:
>>>>> On Wed, Nov 13, 2019 at 04:30:42PM -0800, Ian Rogers wrote:
>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>
>>>>>> When counting system-wide events and cgroup events simultaneously, the
>>>>>> system-wide events are always scheduled out then back in during cgroup
>>>>>> switches, bringing extra overhead and possibly missing events. Switching
>>>>>> out system wide flexible events may be necessary if the scheduled in
>>>>>> task's cgroups have pinned events that need to be scheduled in at a higher
>>>>>> priority than the system wide flexible events.
>>>>>
>>>>> I'm thinking this patch is actively broken. groups->index 'group' wide
>>>>> and therefore across cpu/cgroup boundaries.
>>>>>
>>>>> There is no !cgroup to cgroup hierarchy as this patch seems to assume,
>>>>> specifically look at how the merge sort in visit_groups_merge() allows
>>>>> cgroup events to be picked before !cgroup events.
>>>>
>>>>
>>>> No, the patch intends to avoid switch !cgroup during cgroup context switch.
>>>
>>> Which is wrong.
>>>
>> Why we want to switch !cgroup system-wide event in context switch?
>>
>> How should current perf handle this case?
> 

It is hard to find a simple case that shows why we should not switch
!cgroup events during a cgroup context switch.

Let me try to explain it using ftrace.

Case 1:
User A does system-wide monitoring for 1 second. No other users.
      #perf stat -e branches -a -- sleep 1

The counter counts between 765531.617703 and 765532.620184.
Everything is collected.

            <...>-59160 [027] d.h. 765531.617697: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
            <...>-59160 [027] d.h. 765531.617701: write_msr: MSR_IA32_PMC0(4c1), value 800000000001
            <...>-59160 [027] d.h. 765531.617702: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
            <...>-59160 [027] d.h. 765531.617703: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
           <idle>-0     [027] d.h. 765532.620184: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
           <idle>-0     [027] d.h. 765532.620185: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
           <idle>-0     [027] d.h. 765532.620186: rdpmc: 0, value 80000b3e87a4
           <idle>-0     [027] d.h. 765532.620187: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
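
To make the MSR_P6_EVNTSEL0 values in the trace easier to read, here is a
minimal Python sketch that splits such a value into its fields. It assumes
the architectural IA32_PERFEVTSELx bit layout (event select in bits 0-7,
unit mask in bits 8-15, USR bit 16, OS bit 17, INT bit 20, EN bit 22). The
helper name decode_evntsel is hypothetical and is not part of perf or the
kernel.

```python
def decode_evntsel(value):
    """Split an event-select MSR value into its main fields.

    Bit positions are an assumption based on the architectural
    IA32_PERFEVTSELx layout; this is a reading aid, not kernel code.
    """
    return {
        "event": value & 0xFF,            # event select, bits 0-7
        "umask": (value >> 8) & 0xFF,     # unit mask, bits 8-15
        "usr": bool(value & (1 << 16)),   # count in user mode
        "os": bool(value & (1 << 17)),    # count in kernel mode
        "int": bool(value & (1 << 20)),   # interrupt on overflow
        "en": bool(value & (1 << 22)),    # counter enable
    }


if __name__ == "__main__":
    # The two values written to MSR_P6_EVNTSEL0 in the trace.
    for raw in (0x5300C4, 0x1300C4):
        print(hex(raw), decode_evntsel(raw))
```

Under that layout the two values differ only in the EN bit, so writing
1300c4 stops the counter before it is read with rdpmc, and writing 5300c4
starts it again.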


Case 2:
User A does system-wide monitoring for 1 second.
      #perf stat -e branches -a -- sleep 1
At the same time, User B does cgroup monitoring.
      #perf stat -e cycles -G cgroup

User A expects to collect everything from 765580.196521 to
765581.198150, but it does not.

Because of the cgroup context switches, the system-wide event for User A
stops counting during [765580.213882, 765580.213884],
[765580.213913, 765580.213915], ..., [765580.774304, 765580.774307].

I think this breaks User A's use case.

Furthermore, switching the !cgroup system-wide event also brings extra
overhead, which is unnecessary.

            <...>-121292 [027] d.h. 765580.196514: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
            <...>-121292 [027] d.h. 765580.196519: write_msr: MSR_IA32_PMC0(4c1), value 800000000001
            <...>-121292 [027] d.h. 765580.196520: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
            <...>-121292 [027] d.h. 765580.196521: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
           <idle>-0     [027] d... 765580.213878: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
           <idle>-0     [027] d... 765580.213880: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
           <idle>-0     [027] d... 765580.213880: rdpmc: 0, value 800000357bc1
           <idle>-0     [027] d... 765580.213882: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
      simics-poll-25601 [027] d... 765580.213884: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
      simics-poll-25601 [027] d... 765580.213888: write_msr: MSR_CORE_PERF_FIXED_CTR1(30a), value 800015820cbe
      simics-poll-25601 [027] d... 765580.213889: read_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value 0
      simics-poll-25601 [027] d... 765580.213890: write_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value b0
      simics-poll-25601 [027] d... 765580.213890: write_msr: MSR_IA32_PMC0(4c1), value 800000357bc1
      simics-poll-25601 [027] d... 765580.213891: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
      simics-poll-25601 [027] d... 765580.213892: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
      simics-poll-25601 [027] d... 765580.213910: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
      simics-poll-25601 [027] d... 765580.213911: read_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value b0
      simics-poll-25601 [027] d... 765580.213911: write_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value 0
      simics-poll-25601 [027] d... 765580.213911: rdpmc: 40000001, value 80001582b676
      simics-poll-25601 [027] d... 765580.213912: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
      simics-poll-25601 [027] d... 765580.213913: rdpmc: 0, value 800000358491
      simics-poll-25601 [027] d... 765580.213913: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
           <idle>-0     [027] d... 765580.213915: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
           <idle>-0     [027] d... 765580.213916: write_msr: MSR_IA32_PMC0(4c1), value 800000358491
           <idle>-0     [027] d... 765580.213916: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
           <idle>-0     [027] d... 765580.213917: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f

... ...

      simics-poll-25601 [027] d... 765580.774301: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
      simics-poll-25601 [027] d... 765580.774302: read_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value b0
      simics-poll-25601 [027] d... 765580.774302: write_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value 0
      simics-poll-25601 [027] d... 765580.774302: rdpmc: 40000001, value 8000165e927b
      simics-poll-25601 [027] d... 765580.774303: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
      simics-poll-25601 [027] d... 765580.774303: rdpmc: 0, value 8000059298ce
      simics-poll-25601 [027] d... 765580.774304: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
            <...>-135379 [027] d... 765580.774307: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
            <...>-135379 [027] d... 765580.774308: write_msr: MSR_IA32_PMC0(4c1), value 8000059298ce
            <...>-135379 [027] d... 765580.774309: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
            <...>-135379 [027] d... 765580.774309: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
            <...>-147127 [027] d.h. 765581.198150: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
            <...>-147127 [027] d.h. 765581.198153: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
            <...>-147127 [027] d.h. 765581.198153: rdpmc: 0, value 80000a573368
            <...>-147127 [027] d.h. 765581.198155: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
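
The stop/start pattern in a trace like the one above can be extracted
mechanically. The following Python sketch is one way to do it, under a
simplifying assumption of mine: PMC0 counts only while bit 0 of
MSR_CORE_PERF_GLOBAL_CTRL and the EN bit (bit 22) of MSR_P6_EVNTSEL0 are
both set. The function name pmc0_stopped_windows and the SAMPLE excerpt
are illustrative only. Because it measures from the first disabling write
to the last enabling write, the windows it reports are somewhat wider than
the bracketed intervals quoted earlier, which cover only the gap between a
sched-out and the following sched-in.

```python
import re

# Matches write_msr records even when the mail client wrapped the line
# between "write_msr:" and the MSR name.
WRITE_RE = re.compile(
    r"(\d+\.\d+): write_msr:\s+(\w+)\([0-9a-f]+\), value ([0-9a-f]+)"
)

GLOBAL_CTRL = "MSR_CORE_PERF_GLOBAL_CTRL"
EVNTSEL0 = "MSR_P6_EVNTSEL0"
EN_BIT = 1 << 22  # assumed enable bit of the event-select MSR


def pmc0_stopped_windows(trace_text):
    """Return (stop, restart) timestamp pairs during which PMC0 is idle."""
    global_on = False
    evntsel_on = False
    counting = False
    stop_ts = None
    windows = []
    for ts, msr, value in WRITE_RE.findall(trace_text):
        v = int(value, 16)
        if msr == GLOBAL_CTRL:
            global_on = bool(v & 1)        # bit 0 gates PMC0
        elif msr == EVNTSEL0:
            evntsel_on = bool(v & EN_BIT)
        else:
            continue                       # other MSRs do not gate PMC0
        now = global_on and evntsel_on
        if counting and not now:
            stop_ts = ts                   # counter just stopped
        elif now and not counting and stop_ts is not None:
            windows.append((stop_ts, ts))  # counter just restarted
            stop_ts = None
        counting = now
    return windows


# Excerpt of the Case 2 trace, reduced to the records that gate PMC0.
SAMPLE = """
765580.196520: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
765580.196521: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
765580.213878: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
765580.213880: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
765580.213882: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
765580.213884: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
765580.213891: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
765580.213892: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
"""

if __name__ == "__main__":
    for stop, restart in pmc0_stopped_windows(SAMPLE):
        print("PMC0 stopped from", stop, "to", restart)
```

Timestamps are kept as strings so they can be matched back to the trace
without floating-point rounding.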


Thanks,
Kan






