Re: [PATCH v3 03/46] perf stat: Introduce skippable evsels

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Ian Rogers <irogers@google.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ahmad Yasin <ahmad.yasin@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Andi Kleen <ak@linux.intel.com>,
	Perry Taylor <perry.taylor@intel.com>,
	Samantha Alt <samantha.alt@intel.com>,
	Caleb Biggers <caleb.biggers@intel.com>,
	Weilin Wang <weilin.wang@intel.com>,
	Edward Baker <edward.baker@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Florian Fischer <florian.fischer@muhq.space>,
	Rob Herring <robh@kernel.org>,
	Zhengjun Xing <zhengjun.xing@linux.intel.com>,
	John Garry <john.g.garry@oracle.com>,
	Kajol Jain <kjain@linux.ibm.com>,
	Sumanth Korikkar <sumanthk@linux.ibm.com>,
	Thomas Richter <tmricht@linux.ibm.com>,
	Tiezhu Yang <yangtiezhu@loongson.cn>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Leo Yan <leo.yan@linaro.org>,
	Yang Jihong <yangjihong1@huawei.com>,
	James Clark <james.clark@arm.com>,
	Suzuki Poulouse <suzuki.poulose@arm.com>,
	Kang Minchul <tegongkang@gmail.com>,
	Athira Rajeev <atrajeev@linux.vnet.ibm.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 03/46] perf stat: Introduce skippable evsels
Date: Mon, 1 May 2023 10:56:11 -0400	[thread overview]
Message-ID: <d6784858-2a5f-7920-f1ac-d7ec9ed89605@linux.intel.com> (raw)
In-Reply-To: <20230429053506.1962559-4-irogers@google.com>



On 2023-04-29 1:34 a.m., Ian Rogers wrote:
> Perf stat with no arguments will use default events and metrics. These
> events may fail to open even with kernel and hypervisor disabled. When
> these fail then the permissions error appears even though they were
> implicitly selected. This is particularly a problem with the automatic
> selection of the TopdownL1 metric group on certain architectures like
> Skylake:
> 
> '''
> $ perf stat true
> Error:
> Access to performance monitoring and observability operations is limited.
> Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> access to performance monitoring and observability operations for processes
> without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> More information can be found at 'Perf events and tool security' document:
> https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> perf_event_paranoid setting is 2:
>   -1: Allow use of (almost) all events by all users
>       Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>> = 0: Disallow raw and ftrace function tracepoint access
>> = 1: Disallow CPU event access
>> = 2: Disallow kernel profiling
> To make the adjusted perf_event_paranoid setting permanent preserve it
> in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> '''
> 
> This patch adds skippable evsels that when they fail to open won't
> cause termination and will appear as "<not supported>" in output. The
> TopdownL1 events, from the metric group, are marked as skippable. This
> turns the failure above to:
> 
> '''
> $ perf stat perf bench internals synthesize
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
>   Average synthesis took: 49.287 usec (+- 0.083 usec)
>   Average num. events: 3.000 (+- 0.000)
>   Average time per event 16.429 usec
>   Average data synthesis took: 49.641 usec (+- 0.085 usec)
>   Average num. events: 11.000 (+- 0.000)
>   Average time per event 4.513 usec
> 
>  Performance counter stats for 'perf bench internals synthesize':
> 
>           1,222.38 msec task-clock:u                     #    0.993 CPUs utilized
>                  0      context-switches:u               #    0.000 /sec
>                  0      cpu-migrations:u                 #    0.000 /sec
>                162      page-faults:u                    #  132.529 /sec
>        774,445,184      cycles:u                         #    0.634 GHz                         (49.61%)
>      1,640,969,811      instructions:u                   #    2.12  insn per cycle              (59.67%)
>        302,052,148      branches:u                       #  247.102 M/sec                       (59.69%)
>          1,807,718      branch-misses:u                  #    0.60% of all branches             (59.68%)
>          5,218,927      CPU_CLK_UNHALTED.REF_XCLK:u      #    4.269 M/sec
>                                                   #     17.3 %  tma_frontend_bound
>                                                   #     56.4 %  tma_retiring
>                                                   #      nan %  tma_backend_bound
>                                                   #      nan %  tma_bad_speculation      (60.01%)
>        536,580,469      IDQ_UOPS_NOT_DELIVERED.CORE:u    #  438.965 M/sec                       (60.33%)
>    <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
>          5,223,936      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u #    4.274 M/sec                       (40.31%)
>        774,127,250      CPU_CLK_UNHALTED.THREAD:u        #  633.297 M/sec                       (50.34%)
>      1,746,579,518      UOPS_RETIRED.RETIRE_SLOTS:u      #    1.429 G/sec                       (50.12%)
>      1,940,625,702      UOPS_ISSUED.ANY:u                #    1.588 G/sec                       (49.70%)
> 
>        1.231055525 seconds time elapsed
> 
>        0.258327000 seconds user
>        0.965749000 seconds sys


Which branch is this patch series based on?

I still cannot get the same output as the examples.

I'm using the latest perf-tools-next (The latest commit ID is
5d27a645f609 ("perf tracepoint: Fix memory leak in is_valid_tracepoint()")).
I only applied patch 2 and patch 3, since the patch 1 is already merged.

It's a single socket Cascade Lake. with kernel 5.19-8.
$ uname -r
5.19.8-100.fc35.x86_64

As you can see, all the topdown related events are displayed twice.

With root permission,

$ sudo ./perf stat perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 91.487 usec (+- 0.050 usec)
  Average num. events: 47.000 (+- 0.000)
  Average time per event 1.947 usec
  Average data synthesis took: 97.720 usec (+- 0.059 usec)
  Average num. events: 245.000 (+- 0.000)
  Average time per event 0.399 usec

 Performance counter stats for 'perf bench internals synthesize':

          2,077.81 msec task-clock                       #    0.998 CPUs
utilized
               466      context-switches                 #  224.274 /sec
                 4      cpu-migrations                   #    1.925 /sec
               775      page-faults                      #  372.988 /sec
     9,561,957,326      cycles                           #    4.602 GHz
                       (31.17%)
    24,466,854,021      instructions                     #    2.56  insn
per cycle              (37.42%)
     5,547,892,196      branches                         #    2.670
G/sec                       (37.48%)
        37,880,526      branch-misses                    #    0.68% of
all branches             (37.52%)
        49,576,109      CPU_CLK_UNHALTED.REF_XCLK        #   23.860 M/sec
                                                  #     59.9 %  tma_retiring
                                                  #      4.6 %
tma_bad_speculation      (37.47%)
       228,406,003      INT_MISC.RECOVERY_CYCLES_ANY     #  109.926
M/sec                       (37.52%)
        49,591,815      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE #   23.867
M/sec                       (24.99%)
     9,553,472,893      CPU_CLK_UNHALTED.THREAD          #    4.598
G/sec                       (31.25%)
    22,893,372,651      UOPS_RETIRED.RETIRE_SLOTS        #   11.018
G/sec                       (31.23%)
    24,180,375,299      UOPS_ISSUED.ANY                  #   11.637
G/sec                       (31.25%)
        49,562,300      CPU_CLK_UNHALTED.REF_XCLK        #   23.853 M/sec
                                                  #     28.1 %
tma_frontend_bound
                                                  #      7.2 %
tma_backend_bound        (31.24%)
    10,735,205,084      IDQ_UOPS_NOT_DELIVERED.CORE      #    5.167
G/sec                       (31.30%)
       228,798,426      INT_MISC.RECOVERY_CYCLES_ANY     #  110.115
M/sec                       (25.04%)
        49,559,962      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE #   23.852
M/sec                       (25.00%)
     9,538,354,333      CPU_CLK_UNHALTED.THREAD          #    4.591
G/sec                       (31.29%)
    24,207,967,071      UOPS_ISSUED.ANY                  #   11.651
G/sec                       (31.24%)

       2.082670856 seconds time elapsed

       0.812763000 seconds user
       1.252387000 seconds sys


With non-root, nothing is counted for the topdownL1 events.

$ ./perf stat perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 91.852 usec (+- 0.139 usec)
  Average num. events: 47.000 (+- 0.000)
  Average time per event 1.954 usec
  Average data synthesis took: 96.230 usec (+- 0.046 usec)
  Average num. events: 245.000 (+- 0.000)
  Average time per event 0.393 usec

 Performance counter stats for 'perf bench internals synthesize':

          2,051.95 msec task-clock:u                     #    0.997 CPUs
utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               765      page-faults:u                    #  372.816 /sec
     3,601,662,523      cycles:u                         #    1.755 GHz
                       (16.72%)
     9,241,811,003      instructions:u                   #    2.57  insn
per cycle              (33.43%)
     2,238,848,485      branches:u                       #    1.091
G/sec                       (50.06%)
        19,966,181      branch-misses:u                  #    0.89% of
all branches             (66.77%)
     <not counted>      CPU_CLK_UNHALTED.REF_XCLK:u
   <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
     <not counted>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
     <not counted>      CPU_CLK_UNHALTED.THREAD:u
     <not counted>      UOPS_RETIRED.RETIRE_SLOTS:u
     <not counted>      UOPS_ISSUED.ANY:u
     <not counted>      CPU_CLK_UNHALTED.REF_XCLK:u
     <not counted>      IDQ_UOPS_NOT_DELIVERED.CORE:u
   <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
     <not counted>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
     <not counted>      CPU_CLK_UNHALTED.THREAD:u
     <not counted>      UOPS_ISSUED.ANY:u

       2.057691297 seconds time elapsed

       0.766640000 seconds user
       1.275170000 seconds sys


Thanks,
Kan

next prev parent reply	other threads:[~2023-05-01 14:56 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-29  5:34 [PATCH v3 00/46] Fix perf on Intel hybrid CPUs Ian Rogers
2023-04-29  5:34 ` [PATCH v3 01/46] perf stat: Disable TopdownL1 on hybrid Ian Rogers
2023-04-29  5:34 ` [PATCH v3 02/46] perf metric: Change divide by zero and !support events behavior Ian Rogers
2023-04-29  5:34 ` [PATCH v3 03/46] perf stat: Introduce skippable evsels Ian Rogers
2023-05-01 14:56   ` Liang, Kan [this message]
2023-05-01 15:29     ` Ian Rogers
2023-05-01 20:25       ` Liang, Kan
2023-05-01 20:48         ` Ian Rogers
2023-05-01 23:34           ` Liang, Kan
2023-04-29  5:34 ` [PATCH v3 05/46] perf parse-events: Don't reorder ungrouped events by pmu Ian Rogers
2023-04-29  5:34 ` [PATCH v3 06/46] perf vendor events intel: Add alderlake metric constraints Ian Rogers
2023-04-29  5:34 ` [PATCH v3 07/46] perf vendor events intel: Add icelake " Ian Rogers
2023-04-29  5:34 ` [PATCH v3 08/46] perf vendor events intel: Add icelakex " Ian Rogers
2023-04-29  5:34 ` [PATCH v3 09/46] perf vendor events intel: Add sapphirerapids " Ian Rogers
2023-04-29  5:34 ` [PATCH v3 10/46] perf vendor events intel: Add tigerlake " Ian Rogers
2023-04-29  5:34 ` [PATCH v3 11/46] perf stat: Avoid segv on counter->name Ian Rogers
2023-04-29  5:34 ` [PATCH v3 12/46] perf test: Test more sysfs events Ian Rogers
2023-05-02 10:27   ` Ravi Bangoria
2023-05-02 15:16     ` Ian Rogers
2023-05-02 15:29       ` Ian Rogers
2023-04-29  5:34 ` [PATCH v3 13/46] perf test: Use valid for PMU tests Ian Rogers
2023-04-29  5:34 ` [PATCH v3 14/46] perf test: Mask config then test Ian Rogers
2023-05-02 10:44   ` Ravi Bangoria
2023-05-02 16:19     ` Ian Rogers
2023-04-29  5:34 ` [PATCH v3 15/46] perf test: Test more with config_cache Ian Rogers
2023-04-29  5:34 ` [PATCH v3 16/46] perf test: Roundtrip name, don't assume 1 event per name Ian Rogers
2023-04-29  5:34 ` [PATCH v3 17/46] perf parse-events: Set attr.type to PMU type early Ian Rogers
2023-04-29  5:34 ` [PATCH v3 18/46] perf parse-events: Set pmu_name whenever a pmu is given Ian Rogers
2023-04-29  5:34 ` [PATCH v3 19/46] perf print-events: Avoid unnecessary strlist Ian Rogers
2023-04-29  5:34 ` [PATCH v3 20/46] perf parse-events: Avoid scanning PMUs before parsing Ian Rogers
2023-04-29  5:34 ` [PATCH v3 21/46] perf evsel: Modify group pmu name for software events Ian Rogers
2023-04-29  5:34 ` [PATCH v3 22/46] perf test: Move x86 hybrid tests to arch/x86 Ian Rogers
2023-04-29  5:34 ` [PATCH v3 23/46] perf test x86 hybrid: Update test expectations Ian Rogers
2023-04-29  5:34 ` [PATCH v3 24/46] perf test x86 hybrid: Add hybrid extended type checks Ian Rogers
2023-04-29  5:34 ` [PATCH v3 25/46] perf parse-events: Support PMUs for legacy cache events Ian Rogers
2023-04-29  5:34 ` [PATCH v3 26/46] perf parse-events: Wildcard " Ian Rogers
2023-04-29  5:34 ` [PATCH v3 27/46] perf print-events: Print legacy cache events for each PMU Ian Rogers
2023-05-02 10:48   ` Ravi Bangoria
2023-05-02 17:40     ` Ian Rogers
2023-04-29  5:34 ` [PATCH v3 28/46] perf parse-events: Support wildcards on raw events Ian Rogers
2023-04-29  5:34 ` [PATCH v3 29/46] perf parse-events: Remove now unused hybrid logic Ian Rogers
2023-04-29  5:34 ` [PATCH v3 30/46] perf parse-events: Minor type safety cleanup Ian Rogers
2023-04-29  5:34 ` [PATCH v3 31/46] perf parse-events: Add pmu filter Ian Rogers
2023-04-29  5:34 ` [PATCH v3 32/46] perf stat: Make cputype filter generic Ian Rogers
2023-05-02 10:51   ` Ravi Bangoria
2023-05-02 20:09     ` Ian Rogers
2023-05-02 20:16     ` Ian Rogers
2023-04-29  5:34 ` [PATCH v3 33/46] perf test: Add cputype testing to perf stat Ian Rogers
2023-04-29  5:34 ` [PATCH v3 34/46] perf test: Fix parse-events tests for >1 core PMU Ian Rogers
2023-04-29  5:34 ` [PATCH v3 35/46] perf parse-events: Support hardware events as terms Ian Rogers
2023-05-02 10:55   ` Ravi Bangoria
2023-05-02 17:57     ` Ian Rogers
2023-04-29  5:34 ` [PATCH v3 36/46] perf parse-events: Avoid error when assigning a term Ian Rogers
2023-04-29  5:34 ` [PATCH v3 37/46] perf parse-events: Avoid error when assigning a legacy cache term Ian Rogers
2023-04-29  5:34 ` [PATCH v3 38/46] perf parse-events: Don't auto merge hybrid wildcard events Ian Rogers
2023-04-29  5:34 ` [PATCH v3 39/46] perf parse-events: Don't reorder atom cpu events Ian Rogers
2023-04-29  5:35 ` [PATCH v3 40/46] perf metrics: Be PMU specific for referenced metrics Ian Rogers
2023-04-29  5:35 ` [PATCH v3 41/46] perf stat: Command line PMU metric filtering Ian Rogers
2023-04-29  5:35 ` [PATCH v3 42/46] perf vendor events intel: Correct alderlake metrics Ian Rogers
2023-04-29  5:35 ` [PATCH v3 43/46] perf jevents: Don't rewrite metrics across PMUs Ian Rogers
2023-04-29  5:35 ` [PATCH v3 44/46] perf metrics: Be PMU specific in event match Ian Rogers
2023-04-29  5:35 ` [PATCH v3 45/46] perf stat: Don't disable TopdownL1 metric on hybrid Ian Rogers
2023-04-29  5:35 ` [PATCH v3 46/46] perf parse-events: Reduce scope of is_event_supported Ian Rogers
2023-05-01 20:34 ` [PATCH v3 00/46] Fix perf on Intel hybrid CPUs Liang, Kan
2023-05-01 20:51   ` Ian Rogers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d6784858-2a5f-7920-f1ac-d7ec9ed89605@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ahmad.yasin@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=atrajeev@linux.vnet.ibm.com \
    --cc=caleb.biggers@intel.com \
    --cc=edward.baker@intel.com \
    --cc=eranian@google.com \
    --cc=florian.fischer@muhq.space \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=john.g.garry@oracle.com \
    --cc=jolsa@kernel.org \
    --cc=kjain@linux.ibm.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=perry.taylor@intel.com \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=robh@kernel.org \
    --cc=samantha.alt@intel.com \
    --cc=sumanthk@linux.ibm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tegongkang@gmail.com \
    --cc=tmricht@linux.ibm.com \
    --cc=weilin.wang@intel.com \
    --cc=yangjihong1@huawei.com \
    --cc=yangtiezhu@loongson.cn \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).