Subject: Re: [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
From: Yang Jihong
To: Adrian Hunter, Namhyung Kim
Date: Tue, 29 Aug 2023 09:40:42 +0800
References: <20230826032608.107261-1-yangjihong1@huawei.com> <20230826032608.107261-5-yangjihong1@huawei.com> <0bf5f881-f264-bf2e-431e-444f51c97ee0@intel.com>
In-Reply-To: <0bf5f881-f264-bf2e-431e-444f51c97ee0@intel.com>
X-Mailing-List: linux-perf-users@vger.kernel.org

Hello,

On 2023/8/28 19:25, Adrian Hunter wrote:
> On 28/08/23 06:03, Yang Jihong wrote:
>> Hello,
>>
>> On 2023/8/26 22:59, Namhyung Kim wrote:
>>> Hello,
>>>
>>> On Fri, Aug 25, 2023 at 8:29 PM Yang Jihong wrote:
>>>>
>>>> User space tasks can migrate between CPUs, we need to track side-band
>>>> events for all CPUs.
>>>>
>>>> The specific scenarios are as follows:
>>>>
>>>>           CPU0                                 CPU1
>>>>    perf record -C 0 start
>>>>                                taskA starts to be created and executed
>>>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>>>                                     events only deliver to CPU1
>>>>                                ......
>>>>                                  |
>>>>                            migrate to CPU0
>>>>                                  |
>>>>    Running on CPU0    <----------/
>>>>    ...
>>>>
>>>>    perf record -C 0 stop
>>>>
>>>> Now perf samples the PC of taskA. However, perf does not record the
>>>> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
>>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>>
>>>> The solution is to record sideband events for all CPUs when tracing
>>>> selected CPUs. Because this modifies the default behavior, add related
>>>> comments to the perf record man page.
>>>>
>>>> The sys_perf_event_open invoked is as follows:
>>>>
>>>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>>>
>>>>    Opening: cpu-clock
>>>>    ------------------------------------------------------------
>>>>    perf_event_attr:
>>>>      type                             1 (PERF_TYPE_SOFTWARE)
>>>>      size                             136
>>>>      config                           0 (PERF_COUNT_SW_CPU_CLOCK)
>>>>      { sample_period, sample_freq }   4000
>>>>      sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
>>>>      read_format                      ID|LOST
>>>>      disabled                         1
>>>>      inherit                          1
>>>>      freq                             1
>>>>      sample_id_all                    1
>>>>      exclude_guest                    1
>>>>    ------------------------------------------------------------
>>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>>>    Opening: dummy:u
>>>>    ------------------------------------------------------------
>>>>    perf_event_attr:
>>>>      type                             1 (PERF_TYPE_SOFTWARE)
>>>>      size                             136
>>>>      config                           0x9 (PERF_COUNT_SW_DUMMY)
>>>>      { sample_period, sample_freq }   1
>>>>      sample_type                      IP|TID|TIME|CPU|IDENTIFIER
>>>>      read_format                      ID|LOST
>>>>      inherit                          1
>>>>      exclude_kernel                   1
>>>>      exclude_hv                       1
>>>>      mmap                             1
>>>>      comm                             1
>>>>      task                             1
>>>>      sample_id_all                    1
>>>>      exclude_guest                    1
>>>>      mmap2                            1
>>>>      comm_exec                        1
>>>>      ksymbol                          1
>>>>      bpf_event                        1
>>>>    ------------------------------------------------------------
>>>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>>>
>>>>
>>>> Signed-off-by: Yang Jihong
>>>> Acked-by: Adrian Hunter
>>>> ---
>>>>   tools/perf/Documentation/perf-record.txt |  3 ++
>>>>   tools/perf/builtin-record.c              | 44 +++++++++++++++++++++++-
>>>>   2 files changed, 46 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>>>> index d5217be012d7..1889f66addf2 100644
>>>> --- a/tools/perf/Documentation/perf-record.txt
>>>> +++ b/tools/perf/Documentation/perf-record.txt
>>>> @@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
>>>>   In per-thread mode with inheritance mode on (default), samples are captured only when
>>>>   the thread executes on the designated CPUs. Default is to monitor all CPUs.
>>>>
>>>> +User space tasks can migrate between CPUs, so when tracing selected CPUs,
>>>> +a dummy event is created to track sideband for all CPUs.
>>>> +
>>>>   -B::
>>>>   --no-buildid::
>>>>   Do not save the build ids of binaries in the perf.data files. This skips
>>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>>> index 83bd1f117191..21c571018148 100644
>>>> --- a/tools/perf/builtin-record.c
>>>> +++ b/tools/perf/builtin-record.c
>>>> @@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
>>>>          return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>>>   }
>>>>
>>>> +static bool record__tracking_system_wide(struct record *rec)
>>>> +{
>>>> +       struct record_opts *opts = &rec->opts;
>>>> +       struct evlist *evlist = rec->evlist;
>>>> +       struct evsel *evsel;
>>>> +
>>>> +       /*
>>>> +        * If all (non-dummy) evsel have exclude_user,
>>>> +        * system_wide is not needed.
>>>
>>> Maybe I missed some earlier discussion but why is it not
>>> needed when exclude_user is set?  I think it still needs
>>> FORK or COMM at least..
>>>
>>
>> This is Adrian's suggestion earlier, I think it's probably because if exclude_user is set, MMAP information is not needed, that's my guess.
>>
>> However, as you said, even if exclude_user is set, at least FORK and COMM are required.
>>
>> Therefore, the conditions here need to be changed to:
>> "system_wide is need as long as there is a non-dummy event."
>>
>> @Adrian, is this change okay?
>
> If you wish. I think we use FORK to get mappings, so not sure it
> would be needed. There is PID, TID so COMM is not essential.
> There are FORK, COMM etc from the CPUs being traced of course.
>

I think COMM is also necessary. Without COMM, perf report cannot display the comm of a process that migrated from another core.

The test result is as follows:

# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 33 of event 'cpu-clock:khppp'
# Event count (approx.): 8250000
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  .......................................
#
    15.15%  test3            [kernel.kallsyms]  [k] finish_task_switch
     9.09%  kworker/1:0-eve  [kernel.kallsyms]  [k] _raw_spin_unlock_irq
     3.03%  :934             [kernel.kallsyms]  [k] blk_done_softirq
     3.03%  :934             [kernel.kallsyms]  [k] finish_task_switch

Thanks,
Yang
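
To illustrate why a command can show up as ":934" in the report above, here is a minimal sketch of the lookup, not perf's actual implementation. It assumes a toy model in which a tool builds a tid-to-comm table from the COMM events it recorded and falls back to ":<tid>" when no entry exists. The `Event` class, the `resolve_comms` function, and the comm name "taskA" are hypothetical names chosen for illustration; the tid 934 is borrowed from the report only as an example value.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str       # "COMM" (sideband) or "SAMPLE"
    cpu: int        # CPU on which the event was delivered
    tid: int
    comm: str = ""  # only meaningful for COMM events


def resolve_comms(events, sample_cpus, sideband_cpus):
    """Return the command label for each recorded sample, in order.

    sample_cpus:   CPUs whose samples are recorded (the -C selection).
    sideband_cpus: CPUs whose COMM events are recorded.
    """
    comm_by_tid = {}
    labels = []
    for ev in events:
        if ev.kind == "COMM" and ev.cpu in sideband_cpus:
            comm_by_tid[ev.tid] = ev.comm
        elif ev.kind == "SAMPLE" and ev.cpu in sample_cpus:
            # Hypothetical fallback: without a COMM record, only the tid is known.
            labels.append(comm_by_tid.get(ev.tid, f":{ev.tid}"))
    return labels


# The task is created on CPU1, then migrates and is sampled on CPU0.
events = [
    Event("COMM", cpu=1, tid=934, comm="taskA"),
    Event("SAMPLE", cpu=0, tid=934),
]

# Sideband recorded only on the traced CPU: the COMM event is missed.
only_traced = resolve_comms(events, sample_cpus={0}, sideband_cpus={0})
# Sideband recorded on all CPUs: the COMM event is seen.
all_cpus = resolve_comms(events, sample_cpus={0}, sideband_cpus={0, 1})

print(only_traced)
print(all_cpus)
```

In this model the first call cannot name the task because its COMM event was delivered on a CPU that was not tracked, while the second call can. That is the behavior the system-wide dummy event is meant to provide.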